    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 6193–6216 of 9,588 models

    Text To Audio

    forkjoin-ai/vibevoice-1.5b

    549
    llama-cpp
    Image To Text

    fhswf/TrOCR_german_handwritten

    549
    13
    transformers
    Visual Question Answering

    Swicked86/phi4-mm-gguf

    547
    3
    gguf
    Object Detection

    Yagofue/yolo_finetuned_raccoon

    546
    transformers
    Image To Text

    EZCon/GLM-OCR-8bit-mlx

    546
    1
    mlx
    Reinforcement Learning

    mradermacher/PRIMO-COT-SFT-7B-GGUF

    545
    1
    transformers
    Image To Text

    livadies/gemma-4-31B-Ghetto-NF4

    545
    5
    transformers
    Image To Text

    thwri/CogFlorence-2.1-Large

    545
    28
    transformers
    Audio Classification

    SeaBenSea/hubert-large-turkish-speech-emotion-recognition

    545
    3
    transformers
    Question Answering

    shay681/HeBERT_finetuned_Legal_Clauses

    544
    Question Answering

    uclanlp/visualbert-vqa

    544
    4
    transformers
    Text To Audio

    scragnog/ace-step-1.5-gguf-merge-models

    544
    2
    gguf
    Image Segmentation

    EPFL-ECEO/segformer-b5-finetuned-coralscapes-1024-1024

    544
    transformers
    Image Segmentation

    shehan97/mobilevitv2-1.0-voc-deeplabv3

    543
    transformers
    Audio Classification

    jananiramaseshan/ast-music-genre-classifier

    543
    transformers
    Object Detection

    jozhang97/deta-swin-large

    542
    19
    transformers
    Zero Shot Image Classification

    imageomics/biocap

    540
    open_clip
    Image Feature Extraction

    facebook/dinov3-vit7b16-pretrain-sat493m

    540
    39
    transformers
    Object Detection

    sch-ai/detr-hotspot

    539
    transformers
    Audio Classification

    tiantiaf/wavlm-large-msp-podcast-emotion-dim

    538
    5
    Text To Video

    BAAI/URSA-1.7B-FSQ320

    537
    9
    diffusers
    Audio To Audio

    nvidia/bigvgan_24khz_100band

    537
    4
    PyTorch
    Image To Text

    PaddlePaddle/RT-DETR-H_layout_3cls

    537
    PaddleOCR
    Image Segmentation

    yolo12138/segformer-b2-human-parse-24

    536
    12
    transformers
    Page 259 of 400