    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8,233-8,256 of 9,588 models

    Zero Shot Classification

    knowledgator/gliclass-small-v1.0-init

    66
    5
    transformers
    Zero Shot Classification

    HugC/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned

    66
    Visual Question Answering

    byh711/FLODA-deepfake

    66
    peft
    Video Classification

    Nikeytas/videomae-crime-detector-fixed-format

    65
    Unconditional Image Generation

    ceyda/butterfly_cropped_uniq1K_512

    65
    5
    transformers
    Image Feature Extraction

    birder-project/rope_vit_reg4_b14_capi-imagenet21k

    65
    birder
    Image Feature Extraction

    birder-project/hiera_abswin_base_mim

    65
    birder
    Text To Audio

    jadechoghari/openmusic

    65
    73
    diffusers
    Text To Audio

    rnjema-unima/waxal-mms-tts-lug

    65
    transformers
    Text To Audio

    mariammohamed00/speecht5_finetuned

    65
    1
    transformers
    Text To Audio

    N093/final_tts

    65
    transformers
    Zero Shot Classification

    deepnight-research/zsc-text

    65
    transformers
    Image Feature Extraction

    timm/vit_so400m_patch16_siglip_gap_256.v2_webli

    65
    timm
    Document Question Answering

    pardeepSF/layoutlm-vqa

    64
    1
    transformers
    Image Feature Extraction

    m42-health/CXformer-small

    64
    2
    transformers
    Video Classification

    adenhaus/videomae-small-finetuned-kinetics-finetuned-judo

    64
    transformers
    Video Classification

    JaehwiJeon/videomae-base-finetuned-ucf101-subset

    64
    transformers
    Video Classification

    Ayeshara/videomae-base-finetuned-ucf101-subset

    64
    transformers
    Zero Shot Classification

    AXERA-TECH/siglip2-base-patch16-224

    64
    Text To Audio

    olawale-ahmed/pidgin_speecht5_tts_anonxx_pidgin_dataset

    64
    transformers
    Depth Estimation

    simon123905/test0325

    63
    transformers
    Visual Question Answering

    BUAADreamer/Yi-VL-6B-hf

    63
    2
    transformers
    Document Question Answering

    vkrnsn/layoutlmv2-base-uncased_finetuned_docvqa

    63
    transformers
    Unconditional Image Generation

    teohyc/Covid-XRay-Diffusion-Model

    63
    diffusers