    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    400 models available

    Showing 121-144 of 400 models

    Image Text To Text

    dengcao/GLM-4.1V-9B-Thinking-AWQ

    267K
    1
    transformers
    Image Text To Text

    unsloth/Qwen3.5-9B

    265K
    15
    transformers
    Image Text To Text

    allenai/olmOCR-2-7B-1025-FP8

    258K
    227
    transformers
    Image Text To Text

    Qwen/Qwen3.6-27B

    258K
    789
    transformers
    Image Text To Text

    unsloth/Qwen3.5-122B-A10B-GGUF

    258K
    250
    transformers
    Image Text To Text

    moonshotai/Kimi-VL-A3B-Instruct

    252K
    258
    transformers
    Image Text To Text

    bartowski/Qwen_Qwen3.5-0.8B-GGUF

    251K
    13
    Image Text To Text

    trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration

    243K
    transformers
    Image Text To Text

    Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

    242K
    313
    Image Text To Text

    google/medgemma-4b-it

    237K
    949
    transformers
    Image Text To Text

    Qwen/Qwen2.5-VL-7B-Instruct-AWQ

    236K
    103
    transformers
    Image Text To Text

    llava-hf/llava-interleave-qwen-0.5b-hf

    225K
    36
    transformers
    Image Text To Text

    cyankiwi/Qwen3.5-4B-AWQ-4bit

    224K
    14
    transformers
    Image Text To Text

    OpenGVLab/InternVL3-1B-hf

    215K
    10
    transformers
    Image Text To Text

    OpenGVLab/InternVL3_5-1B-Instruct

    213K
    7
    transformers
    Image Text To Text

    Qwen/Qwen3-VL-235B-A22B-Thinking

    212K
    389
    transformers
    Image Text To Text

    kakaocorp/kanana-1.5-v-3b-instruct

    212K
    53
    transformers
    Image Text To Text

    bartowski/google_gemma-4-E4B-it-GGUF

    210K
    49
    Image Text To Text

    allenai/olmOCR-2-7B-1025

    209K
    143
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-4bit

    208K
    2
    transformers
    Image Text To Text

    cyankiwi/Qwen3.5-9B-AWQ-4bit

    203K
    24
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-8bit

    202K
    1
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-6bit

    201K
    transformers
    Image Text To Text

    PaddlePaddle/PaddleOCR-VL-1.5

    200K
    591
    PaddleOCR