
    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines; plug any model into your extractors.

    400 models available

    Showing 241–264 of 400 models

    Model                                                     Downloads  Likes  Library
    MiniMaxAI/MiniMax-VL-01                                   103K       282    -
    datalab-to/chandra                                        100K       520    transformers
    Qwen/Qwen2.5-VL-32B-Instruct                              97K        482    transformers
    google/medgemma-1.5-4b-it                                 96K        590    transformers
    lkhl/VideoLLaMA3-2B-Image-HF                              95K        -      transformers
    Qwen/Qwen3-VL-2B-Instruct-FP8                             95K        38     transformers
    mlx-community/gemma-3-27b-it-qat-4bit                     93K        22     transformers
    m-ric/Aria_hf_2                                           93K        -      transformers
    lovedheart/Qwen3.5-4B-FP8                                 93K        3      transformers
    unsloth/gemma-4-26B-A4B-it                                93K        19     -
    OpenGVLab/InternVL3-14B                                   93K        79     transformers
    HuggingFaceTB/SmolVLM-500M-Instruct                       92K        192    transformers
    Qwen/Qwen2.5-VL-3B-Instruct-AWQ                           92K        62     transformers
    stepfun-ai/GOT-OCR2_0                                     91K        1,533  -
    cyankiwi/Qwen3-VL-8B-Instruct-AWQ-4bit                    91K        14     -
    trl-internal-testing/tiny-Gemma4ForConditionalGeneration  91K        -      transformers
    microsoft/kosmos-2.5                                      90K        270    transformers
    huihui-ai/Huihui-Qwen3.5-9B-abliterated                   89K        100    transformers
    optimum-intel-internal-testing/tiny-random-internvl2      89K        -      -
    nvidia/Cosmos-Reason2-2B                                  89K        68     cosmos
    baidu/ERNIE-4.5-VL-28B-A3B-PT                             89K        101    transformers
    nohurry/gemma-4-26B-A4B-it-heretic-GUFF                   88K        64     -
    Qwen/Qwen3-VL-2B-Thinking                                 87K        111    transformers
    XCurOS/XCurOS-1.2-8B-VLBF16-Instruct                      87K        2      transformers