
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7873-7896 of 9,588 models

    Text To Video

    AaronHuangWei/Wan2.1-T2V-14B-INT8FakeQuant_pertensor

    103
    diffusers
    Audio Classification

    facebook/mms-lid-512

    103
    2
    transformers
    Audio Classification

    zhudi2825/MuQ-Eval

    103
    pytorch
    Audio Classification

    Kibalama/Speech_Commands_Model

    103
    transformers
    Audio Classification

    JanLilan/distilhubert_finetuned-distilhubert

    103
    transformers
    Text To Audio

    sil-ai/bcc_latn-bcclatn-audio-aligned-speecht5

    103
    transformers
    Audio Classification

    Xenova/mms-lid-126

    103
    transformers.js
    Zero Shot Image Classification

    facebook/metaclip-b32-fullcc2.5b

    103
    9
    transformers
    Object Detection

    HichTala/DiffusionDet

    102
    transformers
    Object Detection

    LovrOP/detraaa_finetuned_cppe5

    102
    1
    transformers
    Object Detection

    Charliesgt/pollencounter_detr_resnet101-dc5

    102
    transformers
    Object Detection

    Charliesgt/pollen_detr_resnet50

    102
    transformers
    Object Detection

    HichTala/diffusiondet-dota

    102
    transformers
    Object Detection

    Hsueh1001/yolo

    102
    transformers
    Image Segmentation

    stevenbucaille/rf-detr-seg-nano

    102
    transformers
    Image Segmentation

    basakozsoy/segformer-b0-finetuned-segments-sidewalk-2

    102
    transformers
    Zero Shot Image Classification

    Xenova/clip-vit-large-patch14-336

    102
    transformers.js
    Audio Classification

    aufklarer/Qwen3-ForcedAligner-0.6B-CoreML-INT8

    102
    2
    Audio Classification

    aliciayrvn/distilhubert-finetuned-gtzan

    102
    transformers
    Audio Classification

    hyunseop/dishubert-finetuned-gtzan

    102
    transformers
    Zero Shot Classification

    knowledgator/comprehend_it-multilingual-t5-base

    102
    27
    transformers
    Object Detection

    mermermer/figpanel-yolov12

    102
    1
    ultralytics
    Audio Classification

    Milana/model-classifier-vctk-edacc

    102
    transformers
    Audio Classification

    DBD-research-group/Wav2Vec2-Base-BirdSet-XCL

    102
    transformers
    Page 329 / 400