
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7801–7824 of 9,588 models

    Audio Classification

    alefiury/wav2vec2-xls-r-300m-pt-br-spontaneous-speech-emotion-recognition

    110
    8
    transformers
    Zero Shot Image Classification

    Green-Sky/FaRL-Base-Patch16-LAIONFace20M-ep64

    110
    transformers
    Text To Audio

    Matthijs/mms-tts-eng

    110
    5
    transformers
    Visual Question Answering

    google/matcha-chart2text-statista

    109
    10
    transformers
    Object Detection

    nsugianto/detr-resnet50_finetuned_detrresnet50_lsdocelementdetv1type7_v2_1669s

    109
    transformers
    Object Detection

    WoIrd/detr-fashionpedia

    109
    transformers
    Image Segmentation

    qualcomm/MobileSam

    109
    7
    pytorch
    Text To Video

    lightx2v/Hy1.5-Distill-Models

    109
    29
    diffusers
    Audio Classification

    LaurenGurgiolo/Music_by_Emotion

    109
    transformers
    Text To Audio

    jongwooko/Flex-Omni-7B

    109
    2
    transformers
    Zero Shot Image Classification

    rollenso/siglip-synthetic-hq-retrieval-v1

    109
    Audio Classification

    Josh9281/fine-finetuned-gtzan-finetuned-gtzan

    109
    transformers
    Audio Classification

    tiantiaf/voxlect-mandarin-cantonese-dialect-whisper-large-v3

    109
    1
    transformers
    Object Detection

    keremberke/yolov5s-forklift

    108
    1
    yolov5
    Image Segmentation

    shi-labs/oneformer_coco_dinat_large

    108
    8
    transformers
    Image Segmentation

    Yuto2007/segformer_cuoio

    108
    segmentation-models-pytorch
    Audio Classification

    Xenova/mms-lid-1024

    108
    transformers.js
    Audio Classification

    afloven/messymashupclassifier

    108
    Object Detection

    Hibou-Foundation/rtdetr-drone-detection

    108
    transformers
    Zero Shot Image Classification

    timm/vit_gigantic_patch14_clip_224.metaclip2_worldwide

    108
    1
    open_clip
    Video Classification

    nateraw/videomae-base-finetuned-ucf101-subset

    107
    1
    transformers
    Audio To Audio

    SPRINGLab/EZ-VC

    107
    5
    f5-tts
    Visual Question Answering

    CHELSEA234/llava-v1.5-7b-M2F2-Det

    107
    1
    Object Detection

    lolodel/detr-fashionpedia

    107
    transformers
    Page 326 of 400