
    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines, and plug any model into your extractors.

    400 models available

    Showing 385-400 of 400 models

    Image Text To Text

    meta-llama/Llama-4-Maverick-17B-128E-Instruct

    40K
    479
    transformers
    Image Text To Text

    mlx-community/Qwen3.5-35B-A3B-4bit

    40K
    36
    transformers
    Image Text To Text

    mlx-community/gemma-4-31b-8bit

    39K
    20
    mlx
    Image Text To Text

    OpenGVLab/InternVL2_5-4B-AWQ

    39K
    7
    transformers
    Image Text To Text

    Qwen/Qwen3-VL-8B-Instruct-GGUF

    39K
    84
    transformers
    Image Text To Text

    OpenGVLab/InternVL3-78B

    39K
    234
    transformers
    Image Text To Text

    AIDC-AI/Ovis2.6-30B-A3B

    38K
    143
    Image Text To Text

    llava-hf/llama3-llava-next-8b-hf

    38K
    51
    transformers
    Image Text To Text

    docling-project/SmolDocling-256M-preview

    38K
    1,614
    transformers
    Image Text To Text

    OpenGVLab/InternVL3-8B-hf

    38K
    9
    transformers
    Image Text To Text

    apolo13x/Qwen3.5-35B-A3B-NVFP4

    38K
    15
    transformers
    Image Text To Text

    RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w8a8

    37K
    2
    transformers
    Image Text To Text

    optimum-intel-internal-testing/tiny-random-llava-next-mistral

    37K
    transformers
    Image Text To Text

    Jackrong/Qwopus3.5-4B-v3-GGUF

    37K
    41
    Image Text To Text

    unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit

    37K
    19
    Image Text To Text

    cyankiwi/gemma-4-31B-it-AWQ-8bit

    36K
    14
    transformers