
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7,225–7,248 of 9,588 models

    Image To Text

    Graf-J/captcha-crnn-finetuned

    172
    1
    transformers
    Image To Text

    mradermacher/Perseus-Doc-vl-0712-GGUF

    172
    1
    transformers
    Image To Text

    sbintuitions/sarashina2-vision-14b

    172
    12
    transformers
    Image Feature Extraction

    r3gm/controlnet-openpose-sdxl-1.0-fp16

    172
    diffusers
    Audio Classification

    012shin/KAIROS-ast-fake-audio-detection

    172
    transformers
    Question Answering

    WangCa/Qwen2.5-7B-Medicine

    172
    3
    Question Answering

    wieheistdu/distilbert-base-uncased-finetuned-emrQA-msquad

    172
    transformers
    Audio To Audio

    Mungert/LFM2.5-Audio-1.5B-GGUF

    171
    liquid-audio
    Visual Question Answering

    Jesteban247/brats_medgemma-GGUF

    171
    transformers
    Visual Question Answering

    erax-ai/EraX-VL-2B-V1.5

    171
    10
    transformers
    Visual Question Answering

    Dmjdxb/deplot

    171
    Object Detection

    keremberke/yolov5m-football

    171
    4
    yolov5
    Object Detection

    omlab/VLM-FO1_Qwen2.5-VL-3B-v01

    171
    13
    Image Segmentation

    Dnq2025/mask2former-finetuned-ER-Mito-LD6

    171
    transformers
    Image Segmentation

    qualcomm/MediaPipe-Selfie-Segmentation

    171
    9
    pytorch
    Image To Text

    mradermacher/DREX-062225-exp-GGUF

    171
    2
    transformers
    Image Feature Extraction

    Snarcy/OmniRad-small

    171
    1
    timm
    Audio Classification

    DL-Project/hatespeech_wav2vec2

    171
    transformers
    Video Classification

    marcm07/VideoMAE-AUTSL-16frames

    171
    transformers
    Question Answering

    ModelTC/bert-base-squad2

    171
    transformers
    Depth Estimation

    Xenova/depth-anything-base-hf

    170
    transformers.js
    Audio To Audio

    JorisCos/ConvTasNet_Libri3Mix_sepnoisy_8k

    170
    2
    asteroid
    Image Segmentation

    Dojofdd/DANet-Fracture-Segmentation

    170
    Image Segmentation

    facebook/sapiens-seg-0.6b-torchscript

    170
    sapiens