    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,002 models available

    Showing 481-504 of 9,002 models

    Feature Extraction

    unslothai/vram-16

    552K
    transformers
    Audio Classification

    speechbrain/emotion-recognition-wav2vec2-IEMOCAP

    552K
    184
    speechbrain
    Token Classification

    Davlan/xlm-roberta-large-ner-hrl

    551K
    13
    transformers
    Audio To Audio

    nvidia/personaplex-7b-v1

    546K
    2,468
    moshi
    Text Generation

    Qwen/Qwen3-1.7B-Base

    543K
    72
    transformers
    Text Generation

    moonshotai/Kimi-K2-Instruct-0905

    542K
    704
    transformers
    Image Text To Text

    HuggingFaceTB/SmolVLM-256M-Instruct

    542K
    355
    transformers
    Image Feature Extraction

    timm/vit_small_patch14_reg4_dinov2.lvd142m

    540K
    7
    timm
    Feature Extraction

    indobenchmark/indobert-base-p1

    536K
    46
    transformers
    Image Classification

    timm/vit_base_patch16_224.augreg2_in21k_ft_in1k

    535K
    13
    timm
    Image Text To Text

    unsloth/Qwen3.5-27B-GGUF

    534K
    488
    Text Generation

    mlx-community/Kimi-K2.5

    531K
    34
    mlx
    Text Generation

    google/gemma-3-1b-it

    531K
    935
    transformers
    Image Text To Text

    allenai/Molmo2-8B

    530K
    171
    transformers
    Image Classification

    microsoft/resnet-18

    528K
    65
    transformers
    Feature Extraction

    kyutai/mimi

    528K
    299
    transformers
    Text To Image

    CompVis/stable-diffusion-v1-4

    525K
    7,003
    diffusers
    Image Text To Text

    deepseek-ai/deepseek-vl2-tiny

    525K
    247
    transformers
    Text Generation

    Qwen/Qwen3-8B-Base

    524K
    99
    transformers
    Text Generation

    farbodtavakkoli/OTel-LLM-270M-IT

    523K
    Image Classification

    timm/test_resnet.r160_in1k

    523K
    timm
    Audio Classification

    MIT/ast-finetuned-audioset-10-10-0.4593

    521K
    352
    transformers
    Zero Shot Image Classification

    facebook/PE-Core-L14-336

    521K
    52
    perception-encoder
    Text Generation

    bartowski/Qwen2.5-Coder-7B-Instruct-GGUF

    519K
    46
    transformers
    Page 21 of 376