    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7,585-7,608 of 9,588 models

    Audio Classification

    xbgoose/wavlm-base-speech-emotion-recognition-russian-dusha-finetuned

    130
    1
    transformers
    Audio Classification

    Beijuka/voice-gender-classifier

    130
    Audio Classification

    Discidius/Speech-Emotion-Classification

    130
    transformers
    Video Classification

    pavitemple/videomae-base-finetuned-Accident-Video-subset

    129
    2
    transformers
    Object Detection

    keremberke/yolov5s-blood-cell

    129
    2
    yolov5
    Object Detection

    keremberke/yolov5n-nfl

    129
    2
    yolov5
    Zero Shot Image Classification

    redlessone/DermLIP_PanDerm-base-w-PubMed-256

    129
    11
    transformers
    Zero Shot Image Classification

    apple/TiC-CLIP-bestpool-sequential

    129
    1
    tic-clip
    Image Feature Extraction

    timm/eva02_large_patch14_224.mim_m38m

    129
    timm
    Text To Video

    rhymes-ai/Allegro

    129
    264
    diffusers
    Video Classification

    XenXeon/videomae-base-finetuned-slrbd001

    128
    transformers
    Video Classification

    Shawon16/videoMAE_base_wlasl_100_40ep_coR_p10

    128
    transformers
    Object Detection

    keremberke/yolov5s-football

    128
    3
    yolov5
    Object Detection

    Kelex83/finetuned-detr-resnet-50-dc5-fashionpedia

    128
    transformers
    Image Segmentation

    tobiasc/segformer-b0-finetuned-segments-sidewalk

    128
    1
    transformers
    Image Feature Extraction

    gwkrsrch/siglip2-so400m-patch16-384

    128
    transformers
    Image Feature Extraction

    apple/aimv2-huge-patch14-224

    128
    13
    transformers
    Audio Classification

    ntua-slp/CultureMERT-95M

    128
    transformers
    Object Detection

    mlx-community/YOLO26m-OptiQ-6bit

    128
    mlx
    Zero Shot Image Classification

    FreddyFazbear0209/CLIP_for_visual_recognition

    128
    transformers
    Video Classification

    XenXeon/videomae-base-finetuned-slrbd002

    127
    transformers
    Visual Question Answering

    DAMO-NLP-SG/VideoLLaMA3-2B-Image

    127
    8
    transformers
    Visual Question Answering

    jihadzakki/blip1-medvqa

    127
    2
    transformers
    Document Question Answering

    rubentito/layoutlmv3-base-mpdocvqa

    127
    10
    transformers
    Page 317 of 400