    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 6649-6672 of 9,588 models

    Visual Question Answering

    DAMO-NLP-SG/VideoLLaMA2-7B

    314
    42
    transformers
    Image To Text

    RepublicOfKorokke/GLM-OCR-oQ8-fp16

    314
    mlx
    Robotics

    Zhoues/RoboRefer-8B-SFT

    313
    1
    transformers
    Image Segmentation

    onnx-community/BiRefNet-ONNX

    313
    13
    transformers.js
    Audio Classification

    syamaner/coffee-first-crack-detection

    312
    transformers
    Text To Video

    Searchium-ai/clip4clip-webvid150k

    311
    44
    transformers
    Image To Text

    RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf

    311
    transformers
    Text To Video

    Shaunnotshwn/SkyReels-V2-T2V-14B-540P-GGUF

    311
    gguf
    Audio To Audio

    lucadellalib/focalcodec_50hz_2k_causal

    310
    torch
    Reinforcement Learning

    edbeeching/decision-transformer-gym-hopper-expert

    310
    19
    transformers
    Object Detection

    ustc-community/dfine-large-obj365

    310
    2
    transformers
    Zero Shot Image Classification

    Bingsu/clip-vit-large-patch14-ko

    310
    17
    transformers
    Text To Audio

    AEmotionStudio/stable-audio-open-models

    309
    1
    diffusers
    Audio To Audio

    ktvoice/Codec

    309
    Audio Classification

    DBD-research-group/Bird-MAE-Base

    309
    transformers
    Audio Classification

    abhishtagatya/wavlm-base-960h-itw-deepfake

    309
    transformers
    Image Segmentation

    apple/coreml-detr-semantic-segmentation

    308
    32
    coreml
    Image To Text

    Z3NN001/gemma-4-21b-a4b-it-REAP-mlx-bfloat16

    308
    mlx
    Visual Question Answering

    OpenMed/Qwen3.5-2B-MedVL

    307
    6
    Zero Shot Image Classification

    timm/eva_giant_patch14_clip_224.laion400m_s11b_b41k

    307
    1
    open_clip
    Image Feature Extraction

    tiiuae/siglino-30M

    307
    6
    transformers
    Image To Text

    xingxm/HiVG-3B-Base

    307
    5
    transformers
    Video Classification

    google/videoprism-large-f8r288

    306
    20
    videoprism
    Image Feature Extraction

    gaunernst/vit_small_patch8_gap_112.cosface_ms1mv3

    306
    2
    timm
    Page 278 of 400