    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8137–8160 of 9,588 models

    Visual Question Answering

    microsoft/git-large-textvqa

    79
    6
    transformers
    Visual Question Answering

    lhzzzzzy/HiSpatial-3B

    79
    Tabular Classification

    cubis/mbta-track-predictor

    79
    keras
    Image Feature Extraction

    facebook/webssl-mae1b-full2b-224

    79
    transformers
    Video Classification

    Dijaaa/UCF-output

    79
    transformers
    Video Classification

    VINAY-UMRETHE/SigMamba-V1-Small

    79
    transformers
    Text To Audio

    KhaledLakhdher/khaledlakhdher_TTS

    79
    transformers
    Text To Video

    AlekseyCalvin/SOTS_Art_Wan1.3B_LoRA_rank256_bySilverAgePoets

    79
    1
    diffusers
    Video Classification

    qualcomm/ResNet-Mixed-Convolution

    78
    1
    pytorch
    Visual Question Answering

    HPAI-BSC/Aloe-Vision-72B-AR

    78
    Tabular Regression

    Dijo-404/mhd-nanofluid-ev-thermal-surrogate

    78
    pytorch
    Video Classification

    pavitemple/finetuned-Accident-MultipleLabels-Video-subset-v2-new

    78
    transformers
    Video Classification

    Dijaaa/videomae-base-finetuned-ucf-crime2

    78
    transformers
    Video Classification

    Babaili/videomae-base-finetuned-cropped-shooting-and-layup-dataset

    78
    transformers
    Text To Video

    showlab/show-1-base

    78
    15
    diffusers
    Text To Video

    BarleyFarmer/creami-lora

    78
    diffusers
    Video Classification

    MHRDYN7/videoprism-base-f16r288-finetuned-ucf101

    77
    transformers
    Zero Shot Classification

    protectai/fmops-distilbert-prompt-injection-onnx

    77
    transformers
    Visual Question Answering

    jzsues/llava-qwen1.5-4b-chat

    77
    transformers
    Image Feature Extraction

    timm/vit_base_patch16_siglip_gap_256.v2_webli

    77
    1
    timm
    Visual Question Answering

    google/pix2struct-screen2words-base

    76
    25
    transformers
    Depth Estimation

    hf-tiny-model-private/tiny-random-GLPNForDepthEstimation

    76
    transformers
    Video Classification

    Luke537/videomae-base-finetuned-ucf101-subset

    76
    transformers
    Text To Audio

    AXERA-TECH/kokoro.axera

    76
    340 / 400