    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8185–8208 of 9,588 models

    Text To Audio

    LeeAeron/acestep-v15-sft

    72
    transformers
    Text To Audio

    JBZhang2342/speecht5_tts

    72
    transformers
    Text To Video

    alibaba-pai/Wan2.1-Fun-V1.1-14B-InP

    72
    7
    videox_fun
    Depth Estimation

    qualcomm/Depth-Anything

    71
    1
    pytorch
    Visual Question Answering

    openbmb/OmniLMM-12B

    71
    72
    transformers
    Document Question Answering

    AyushPremjith/layoutlmv2-base-uncased_finetuned_docvqa

    71
    transformers
    Image Feature Extraction

    timm/vit_base_patch16_siglip_gap_224.v2_webli

    71
    timm
    Video Classification

    interlive/STRIDE-2B

    71
    transformers
    Text To Audio

    suhaibrashid17/MMS_TTS_Urdu_3

    71
    1
    transformers
    Zero Shot Classification

    digitalepidemiologylab/covid-twitter-bert-v2-mnli

    71
    1
    transformers
    Visual Question Answering

    garlandchou/V-Reflection

    70
    5
    Visual Question Answering

    AXERA-TECH/SmolVLM2-500M-Video-Instruct-python

    70
    2
    Unconditional Image Generation

    google/ncsnpp-ffhq-256

    70
    4
    diffusers
    Unconditional Image Generation

    Adiii143/sd-class-butterflies-32

    70
    diffusers
    Image Feature Extraction

    birder-project/hieradet_d_small_dino-v2

    70
    birder
    Image Feature Extraction

    aimagelab/CoDE

    70
    2
    transformers
    Depth Estimation

    Intel/dpt-beit-large-384

    70
    transformers
    Video Classification

    KingTechnician/videomae-small-finetuned-kinetics-finetuned-xd-violence-multilabel

    70
    transformers
    Video Classification

    CondadosAI/xclip_base_patch32

    70
    transformers
    Depth Estimation

    Xenova/glpn-kitti

    69
    2
    transformers.js
    Unconditional Image Generation

    nappa114514/Qwen-Image-Edit-2511-monochrome-charachange

    69
    4
    diffusers
    Table Question Answering

    microsoft/tapex-large-finetuned-wikisql

    69
    18
    transformers
    Tabular Regression

    SQuADDS/transmon-cross-hamiltonian-inverse

    69
    1
    keras
    Image Feature Extraction

    timm/aimv2_large_patch14_336.apple_pt_dist

    69
    timm
    Page 342 of 400