    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8161–8184 of 9,588 models

    Text To Video

    benjamin-paine/vidxtend

    76
    3
    diffusers
    Text To Audio

    Marvis-AI/marvis-tts-250m-v0.1-MLX-fp16

    76
    5
    transformers
    Visual Question Answering

    nectec/Pathumma-llm-vision-1.0.0

    75
    11
    Unconditional Image Generation

    zhuzhu18/sd-class-butterflies-32

    75
    diffusers
    Image Feature Extraction

    timm/samvit_huge_patch16.sa1b

    75
    1
    timm
    Image Feature Extraction

    timm/vit_large_patch14_clip_224.dfn2b

    75
    timm
    Image Feature Extraction

    timm/aimv2_large_patch14_448.apple_pt

    75
    timm
    Video Classification

    Khalil112/videomae-base-finetuned-ucf101-subset

    75
    transformers
    Text To Video

    lightx2v/Wan2.1-T2V-1.3B-longcat-step1500

    75
    7
    diffusers
    Text To Video

    chestnutlzj/Spark-Wan-4Steps

    75
    diffusers
    Voice Activity Detection

    mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32

    74
    1
    mlx-audio
    Visual Question Answering

    prapaa/eastrus-vl-qwen3-8b-gguf

    74
    llama.cpp
    Visual Question Answering

    prapaa/eastrus-vl-qwen3-2b-gguf

    74
    llama.cpp
    Unconditional Image Generation

    adakoda/sd-class-butterflies-64

    74
    diffusers
    Image Feature Extraction

    birder-project/rope_vit_reg4_b14_capi

    74
    birder
    Video Classification

    Babaili/videomae-base-finetuned-ucf101-subset

    74
    transformers
    Text To Video

    wan-community/Wan2.1-T2V-1.3B

    74
    diffusers
    Voice Activity Detection

    aufklarer/FireRedVAD-CoreML

    73
    Text To Audio

    waxal-benchmarking/mms-tts-luo-3mry5

    73
    transformers
    Zero Shot Classification

    sjrhuschlee/flan-t5-base-mnli

    73
    2
    transformers
    Depth Estimation

    depth-anything/prompt-depth-anything-vits-transparent-hf

    72
    Video Classification

    mitegvg/videomae-tiny-92-kinetics-binary-finetuned-xd-violence

    72
    transformers
    Video Classification

    StreamFormer/streamformer-timesformer

    72
    4
    Image Feature Extraction

    lukeingawesome/TILA

    72
    pytorch