    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 8113-8136 of 9,588 models

    Object Detection

    negi3961/factory-defect-guard

    83
    ultralytics
    Text To Video

    nagayama0706/video_generation_model

    83
    8
    transformers
    Video Classification

    MCG-NJU/videomae-base-short-finetuned-kinetics

    82
    3
    transformers
    Visual Question Answering

    HPAI-BSC/Aloe-Vision-7B-AR

    82
    1
    Table Question Answering

    microsoft/tapex-large-sql-execution

    82
    18
    transformers
    Object Detection

    Punn1403/detr_finetuned_bccd

    82
    transformers
    Image Feature Extraction

    inclusionAI/MingTok-Vision

    82
    32
    transformers
    Video Classification

    Dinh/videomae-small-finetuned-kinetics-finetuned-action

    82
    2
    transformers
    Video Classification

    Dijaaa/videomae-base-finetuned-kinetics-finetuned-ucf-crime-subset

    82
    transformers
    Video Classification

    LouisDT/videomae-base-finetuned

    82
    transformers
    Zero Shot Classification

    Keetawan/clip-vit-large-patch14-plant-disease-finetuned

    82
    2
    Object Detection

    unity/inference-engine-yolo

    82
    27
    unity-sentis
    Depth Estimation

    facebook/dpt-dinov2-base-kitti

    81
    2
    transformers
    Voice Activity Detection

    mlx-community/diar_sortformer_4spk-v1-fp32

    81
    mlx-audio
    Visual Question Answering

    AXERA-TECH/InternVL3-2B

    81
    2
    Image Feature Extraction

    gingyin/TTPLanet_SDXL_Controlnet_Tile_Realistic

    81
    diffusers
    Video Classification

    Dijaaa/output_dir

    81
    transformers
    Text To Video

    sharonSD/Wan2.1-T2V-14B

    81
    diffusers
    Visual Question Answering

    Cylingo/Xinyuan-VL-2B

    80
    7
    transformers
    Text To Video

    JCTN/AnimateDiff-Lightning

    80
    6
    diffusers
    Text To Audio

    sil-ai/nya-NYJBIBN-audio-speecht5

    80
    transformers
    Voice Activity Detection

    cstr/marblenet-vad-GGUF

    80
    ggml
    Visual Question Answering

    internlm/internlm-xcomposer2-vl-1_8b

    79
    18
    transformers
    Visual Question Answering

    google/pix2struct-ocrvqa-base

    79
    5
    transformers
    Page 339 of 400