
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8353–8376 of 9,588 models

    Document Question Answering

    Lynxlave/layoutlmv2-base-uncased_finetuned_docvqa

    53
    transformers
    Unconditional Image Generation

    google/ncsnpp-ffhq-1024

    53
    12
    diffusers
    Unconditional Image Generation

    irisdri/sd-class-butterflies-32

    53
    diffusers
    Tabular Classification

    M-Ahmad-Abid/Diabetes_model

    53
    transformers
    Image Feature Extraction

    timm/beit3_base_patch16_224.pt

    53
    timm
    Image Feature Extraction

    lb-sage/VL3-SigLIP-NaViT

    53
    1
    transformers
    Image Feature Extraction

    canvit/canvitb16-add-vpe-pretrain-g128px-s512px-in21k-dv3b16-2026-02-02-mlx

    53
    mlx
    Depth Estimation

    Jens-Duttke/DepthPro-ONNX-HighPerf

    53
    1
    onnxruntime
    Video Classification

    Natali12/videomae-base-finetuned-opportunity-locomotion

    53
    transformers
    Video Classification

    Hibernates/Hibernates-MEA-R2-V0

    53
    2
    transformers
    Video Classification

    Jeyseb/videomae-base-finetuned-rwf2000-subset

    53
    transformers
    Text To Audio

    truong-xuan-linh/speecht5-vietnamese-voiceclone-lsvsc

    53
    1
    transformers
    Text To Audio

    LeeAeron/Ace-Step1.5

    53
    transformers
    Zero Shot Classification

    typeform/squeezebert-mnli

    53
    4
    transformers
    Depth Estimation

    facebook/dpt-dinov2-large-kitti

    52
    4
    transformers
    Voice Activity Detection

    KIFF/pyannote-speaker-diarization-endpoint

    52
    4
    pyannote-audio
    Tabular Classification

    emergentphysicslab/waveguard-anomaly-detector

    52
    1
    waveguard
    Image Feature Extraction

    r3gm/controlnet-openpose-twins-sdxl-1.0-fp16

    52
    diffusers
    Image Feature Extraction

    timm/vit_so400m_patch14_siglip_gap_384.webli

    52
    timm
    Depth Estimation

    coarse-corpse1/DepthAnything-Mine

    52
    Video Classification

    gullalc/videomae-base-finetuned-kinetics-movieshots-scale

    52
    transformers
    Video Classification

    Ptisni/videomae-base-finetuned-ucf101-subset

    52
    transformers
    Video Classification

    Ptisni/videomae-base-finetuned-kinetics-finetuned-ucf101-subset

    52
    transformers
    Text To Audio

    Herry2015/Ace-Step1.5

    52
    transformers
    Page 349 of 400