
    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    400 models available

    Showing 49-72 of 400 models

    lightonai/LightOnOCR-2-1B (transformers): 841K downloads, 667 likes
    pytorch/gemma-3-27b-it-AWQ-INT4 (transformers): 830K downloads, 7 likes
    Qwen/Qwen3-VL-32B-Instruct-FP8 (transformers): 824K downloads, 44 likes
    mlx-community/gemma-3-4b-it-qat-4bit (transformers): 815K downloads, 8 likes
    cyankiwi/gemma-4-31B-it-AWQ-4bit (transformers): 706K downloads, 30 likes
    Qwen/Qwen3.5-35B-A3B-GPTQ-Int4 (transformers): 671K downloads, 79 likes
    nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 (transformers): 663K downloads, 50 likes
    nanonets/Nanonets-OCR2-3B (transformers): 660K downloads, 500 likes
    llava-hf/llava-v1.6-mistral-7b-hf (transformers): 632K downloads, 306 likes
    unsloth/Qwen3.5-4B-GGUF (transformers): 630K downloads, 223 likes
    microsoft/Florence-2-base (transformers): 624K downloads, 365 likes
    Qwen/Qwen3.5-122B-A10B-FP8 (transformers): 579K downloads, 93 likes
    unsloth/Qwen3.5-27B-GGUF: 557K downloads, 489 likes
    codgician/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4 (transformers): 551K downloads, 9 likes
    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2: 550K downloads, 114 likes
    trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration (transformers): 543K downloads
    Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled: 539K downloads, 2,779 likes
    allenai/Molmo2-8B (transformers): 530K downloads, 170 likes
    deepseek-ai/deepseek-vl2-tiny (transformers): 521K downloads, 247 likes
    HuggingFaceTB/SmolVLM-256M-Instruct (transformers): 519K downloads, 355 likes
    nvidia/NVIDIA-Nemotron-Parse-v1.1 (transformers): 517K downloads, 166 likes
    unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit (vllm): 512K downloads, 10 likes
    Qwen/Qwen3.5-397B-A17B (transformers): 501K downloads, 1,473 likes
    OpenGVLab/InternVL2-1B (transformers): 486K downloads, 80 likes