    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7201–7224 of 9,588 models

    Image Segmentation

    canvit/probe-ade20k-40k-s512-c8-in21k

    175
    canvit-pytorch
    Image To Text

    Ertugrul/Qwen2-VL-7B-Captioner-Relaxed

    175
    64
    transformers
    Image To Text

    CelesteImperia/Qwen2-VL-2B-Instruct-Platinum-GGUF

    175
    gguf
    Image To Text

    PaddlePaddle/PP-DocBee2-3B

    175
    PaddleOCR
    Audio Classification

    Bisher/wav2vec2_ASV_deepfake_audio_detection

    175
    1
    transformers
    Question Answering

    nphearum/Qwen3.5-4B-khmer-delta

    175
    adapter-transformers
    Question Answering

    ModelTC/bert-base-squad

    175
    transformers
    Object Detection

    0llheaven/Conditional-detr-finetuned-V5

    174
    transformers
    Image Segmentation

    stevenbucaille/rf-detr-seg-small

    174
    transformers
    Image To Text

    TainU/RePlan-Qwen2.5-VL-7B

    174
    13
    transformers
    Image To Text

    noctrex/Pixtral-12B-Captioner-Relaxed-GGUF

    174
    Text To Video

    a-r-r-o-w/LTX-Video-0.9.1-diffusers

    174
    8
    diffusers
    Audio Classification

    saurabhati/DASS_small_AudioSet_50.1

    174
    transformers
    Zero Shot Image Classification

    vesteinn/clip-nabirds

    174
    transformers
    Image To Text

    PaddlePaddle/PP-Chart2Table_safetensors

    174
    1
    PaddleOCR
    Object Detection

    mshamrai/yolov8x-visdrone

    173
    15
    ultralytics
    Object Detection

    ARG-NCTU/detr-resnet-50-finetuned-federated-3-clients-intern_annotated_ball_dataset

    173
    transformers
    Image Segmentation

    Dnq2025/mask2former-finetuned-ER-Mito-LD5

    173
    transformers
    Image Segmentation

    wanglab/MedSAM2

    173
    38
    torch
    Zero Shot Image Classification

    visheratin/nllb-siglip-mrl-base

    173
    11
    open_clip
    Image Feature Extraction

    timm/convnextv2_femto.fcmae

    173
    timm
    Audio To Audio

    HiDolen/Mini-BS-RoFormer-V2-46.8M

    172
    4
    transformers
    Visual Question Answering

    google/pix2struct-chartqa-base

    172
    10
    transformers
    Image Segmentation

    Dnq2025/mask2former-finetuned-ER-Mito-LD3

    172
    transformers
    Page 301 of 400