    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7393–7416 of 9,588 models

    Zero Shot Image Classification

    wisdomik/QuiltNet-B-16-PMB

    157
    5
    open_clip
    Text To Video

    alibaba-pai/Wan2.1-Fun-V1.1-1.3B-Control

    157
    12
    videox_fun
    Audio Classification

    dariowsz/wav2vec2-base-finetuned-gtzan

    157
    transformers
    Audio Classification

    chzzk/audio_cls_hhd

    157
    transformers
    Image To Text

    acul3/LightOnOCR-2-1B-ExecuTorch

    157
    1
    executorch
    Object Detection

    AmineSam/irail-crowd-counting-yolov8n

    157
    ultralytics
    Document Question Answering

    AntonioTH/Layout-finetuned-fr-model-107instances107-150epochs-5e-05lr-GPU

    156
    transformers
    Object Detection

    dariacuna/rtdetr-v2-r50-finetune-6

    156
    transformers
    Image Segmentation

    smp-hub/segformer-b5-1024x1024-city-160k

    156
    segmentation-models-pytorch
    Image Segmentation

    DiTo97/binarization-segformer-b3

    156
    1
    transformers
    Image To Text

    Xenova/trocr-base-handwritten

    156
    4
    transformers.js
    Document Question Answering

    AntonioTH/Layout-finetuned-fr-model-50instances20-100epochs-5e-05lr-GPU

    155
    transformers
    Object Detection

    Anurag277/detr-resnet-50_finetuned_cppe5

    155
    transformers
    Object Detection

    dariacuna/rtdetr-v2-r50-finetune-15

    155
    transformers
    Image Feature Extraction

    PIA-SPACE-LAB/dinov3-vit7b16-pretrain-lvd1689m

    155
    transformers
    Text To Video

    htdong/Wan-Alpha-v2.0

    155
    9
    diffusers
    Audio Classification

    Hatman/audio-emotion-detection

    155
    15
    transformers
    Audio Classification

    pollner/distilhubert-finetuned-ravdess

    155
    2
    transformers
    Audio Classification

    0xb1/wav2vec2-base-finetuned-speech_commands-v0.02

    155
    transformers
    Object Detection

    JohnieWalkerLV/osm-bunker-detector

    155
    ultralytics
    Depth Estimation

    facebook/sapiens-depth-1b-torchscript

    154
    1
    sapiens
    Text To Audio

    Anujgr8/speecht5_code_switch_intra

    154
    transformers
    Unconditional Image Generation

    nroggendorff/cats

    154
    diffusers
    Object Detection

    nsugianto/tblstructrecog_finetuned_detresnet_v2_s1_311s

    154
    transformers
    Page 309 of 400