    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,002 models available

    Showing 985-1008 of 9,002 models

    Image To Text | microsoft/trocr-large-handwritten | 204K | 160 | transformers
    Image Text To Text | datalab-to/chandra-ocr-2 | 204K | 292 | transformers
    Text Generation | humarin/chatgpt_paraphraser_on_T5_base | 204K | 193 | transformers
    Zero Shot Image Classification | google/siglip2-so400m-patch16-384 | 204K | 4 | transformers
    Image Text To Text | Qwen/Qwen3-VL-235B-A22B-Thinking | 203K | 390 | transformers
    Text Generation | douyamv/Gemma-4-31B-JANG_4M-CRACK-GGUF | 203K | 155
    Text Generation | lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | 202K | 15 | transformers
    Depth Estimation | depth-anything/Depth-Anything-V2-Large-hf | 202K | 31 | transformers
    Image Text To Text | huihui-ai/Huihui-Qwen3.5-27B-abliterated | 202K | 120 | transformers
    Token Classification | kontur-ai/sbert_punc_case_ru | 202K | 39 | transformers
    Text Generation | Qwen/Qwen2.5-Coder-1.5B-Instruct | 202K | 118 | transformers
    Text To Image | Qwen/Qwen-Image | 201K | 2,474 | diffusers
    Image To Image | lightx2v/Qwen-Image-Edit-2511-Lightning | 201K | 426 | diffusers
    Image Text To Text | lmstudio-community/GLM-4.6V-Flash-MLX-8bit | 201K | 1 | transformers
    Text Generation | nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese | 201K | 135 | transformers
    Image Text To Text | nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 | 201K | 82 | transformers
    Any To Any | google/gemma-4-E2B | 201K | 233 | transformers
    Text To Image | Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled | 201K | 15 | diffusers
    Text Classification | facebook/roberta-hate-speech-dynabench-r4-target | 200K | 98 | transformers
    Image Text To Text | google/gemma-4-26B-A4B | 200K | 240 | transformers
    Image Text To Text | lmstudio-community/GLM-4.6V-Flash-MLX-6bit | 200K | transformers
    Image Segmentation | nvidia/segformer-b1-finetuned-ade-512-512 | 199K | 15 | transformers
    Text To Speech | microsoft/VibeVoice-1.5B | 199K | 2,338 | transformers
    Text To Speech | FunAudioLLM/Fun-CosyVoice3-0.5B-2512 | 199K | 525
    Page 42 of 376