    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8281–8304 of 9,588 models

    Video Classification

    archit11/videomae-base-finetuned-fight-nofight-subset

    61
    transformers
    Video Classification

    Graziela/videomae-base-finetuned-ucf101-subset

    61
    transformers
    Text To Audio

    Marvis-AI/marvis-tts-250m-v0.2-MLX-6bit

    61
    3
    transformers
    Video Classification

    SVECTOR-CORPORATION/FAL

    61
    9
    Depth Estimation

    facebook/dpt-dinov2-giant-kitti

    60
    transformers
    Visual Question Answering

    flyingfishinwater/Qwen3.5-2B-MedVL-MLX-4bit

    60
    Unconditional Image Generation

    V3nator/mar_test2

    60
    diffusers
    Unconditional Image Generation

    Filip5050/sd-diffusers-butterflies-32px

    60
    diffusers
    Image Feature Extraction

    timm/vit_large_patch16_siglip_gap_512.v2_webli

    60
    timm
    Video Classification

    Abdullah1/videomae-base-finetuned-kinetics-finetuned-dcsass-shoplifting-subset

    60
    transformers
    Text To Audio

    rhymeswithlion/MIDI-LLM_Llama-3.2-1B-Q4_K_M-GGUF

    60
    1
    transformers
    Text To Audio

    rnjema101/waxal-ibo

    60
    transformers
    Text To Audio

    KGSAGAR/speecht5_finetuned_voxpopuli_es

    60
    transformers
    Zero Shot Classification

    ilos-vigil/bigbird-small-indonesian-nli

    60
    4
    transformers
    Depth Estimation

    Xenova/dpt-large

    59
    transformers.js
    Voice Activity Detection

    mlx-community/diar_sortformer_4spk-v1-fp16

    59
    mlx-audio
    Document Question Answering

    YuukiAsuna/VieTable-donut-docvqa-demo

    59
    1
    transformers
    Image Feature Extraction

    apple/aimv2-3B-patch14-224

    59
    4
    transformers
    Image Feature Extraction

    nvidia/PS3-1.5K-SigLIP2

    59
    2
    Video Classification

    KiraFenvy/videomae-base-finetuned-ucf101-subset

    59
    transformers
    Video Classification

    JinliBot7/videomae-base-finetuned-ucf101-subset

    59
    transformers
    Video Classification

    JIGNESHS110/videomae-base-finetuned-ucf101-subset

    59
    transformers
    Text To Audio

    mirza234/speecht5_finetuned_emirhan_tr

    59
    transformers
    Image Feature Extraction

    MiniMaxAI/VTP-Base-f16d64

    58
    20
    transformers
    Page 346 of 400