    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    400 models available

    Showing 265–288 of 400 models

    Image Text To Text

    AIDC-AI/Ovis2.5-2B

    87K
    200
    transformers
    Image Text To Text

    meta-llama/Llama-Guard-4-12B

    86K
    91
    transformers
    Image Text To Text

    unsloth/gemma-3-4b-it-GGUF

    86K
    187
    transformers
    Image Text To Text

    YannQi/R-4B

    85K
    181
    transformers
    Image Text To Text

    optimum-intel-internal-testing/tiny-random-llava

    85K
    transformers
    Image Text To Text

    Open-Bee/Bee-8B-RL

    84K
    78
    transformers
    Image Text To Text

    unsloth/Qwen2.5-VL-7B-Instruct-GGUF

    84K
    158
    transformers
    Image Text To Text

    unsloth/gemma-3-27b-it-GGUF

    84K
    199
    transformers
    Image Text To Text

    QuantTrio/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ

    83K
    12
    transformers
    Image Text To Text

    fancyfeast/llama-joycaption-beta-one-hf-llava

    82K
    340
    transformers
    Image Text To Text

    unsloth/gemma-4-E2B-it-unsloth-bnb-4bit

    82K
    6
    Image Text To Text

    unsloth/Qwen3.5-397B-A17B-GGUF

    81K
    240
    transformers
    Image Text To Text

    LiquidAI/LFM2-VL-450M-GGUF

    80K
    42
    Image Text To Text

    adept/fuyu-8b

    79K
    1,019
    transformers
    Image Text To Text

    DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-MAX-NEOCODE-Imatrix-GGUF

    79K
    85
    transformers
    Image Text To Text

    ibm-granite/granite-docling-258M

    79K
    1,160
    transformers
    Image Text To Text

    microsoft/Phi-4-reasoning-vision-15B

    78K
    168
    Image Text To Text

    aifeifei798/Gemma-4-31B-Cognitive-Unshackled

    78K
    22
    transformers
    Image Text To Text

    microsoft/udop-large

    77K
    124
    transformers
    Image Text To Text

    bartowski/Qwen_Qwen3.5-9B-GGUF

    77K
    59
    Image Text To Text

    nvidia/NVLM-D-72B

    76K
    775
    transformers
    Image Text To Text

    Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF

    76K
    82
    gguf
    Image Text To Text

    llmfan46/gemma-4-26B-A4B-it-ultra-uncensored-heretic-GGUF

    76K
    51
    transformers
    Image Text To Text

    OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

    76K
    9
    transformers
    Page 12 of 17