    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    400 models available

    Showing 289-312 of 400 models

    Image Text To Text

    Qwen/Qwen3-VL-4B-Instruct-FP8

    75K
    57
    transformers
    Image Text To Text

    OpenGVLab/InternVL3_5-1B

    75K
    27
    transformers
    Image Text To Text

    mlx-community/gemma-4-31b-it-4bit

    75K
    34
    mlx
    Image Text To Text

    Skywork/Skywork-R1V-38B

    74K
    128
    transformers
    Image Text To Text

    unsloth/Qwen3.5-35B-A3B

    74K
    14
    transformers
    Image Text To Text

    nvidia/Cosmos-Reason1-7B

    74K
    240
    transformers
    Image Text To Text

    Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

    73K
    117
    Image Text To Text

    mlx-community/Qwen3.5-9B-MLX-4bit

    72K
    102
    mlx
    Image Text To Text

    allenai/Molmo2-O-7B

    71K
    21
    transformers
    Image Text To Text

    rhymes-ai/Aria

    71K
    637
    transformers
    Image Text To Text

    cyankiwi/Qwen3.6-27B-AWQ-INT4

    70K
    30
    transformers
    Image Text To Text

    QuantTrio/Qwen3.5-122B-A10B-AWQ

    69K
    26
    transformers
    Image Text To Text

    Qwen/Qwen3-VL-32B-Thinking

    69K
    87
    transformers
    Image Text To Text

    RedHatAI/gemma-4-31B-it-NVFP4

    68K
    31
    transformers
    Image Text To Text

    trl-internal-testing/tiny-Qwen3VLForConditionalGeneration

    67K
    transformers
    Image Text To Text

    AIDC-AI/Ovis2-4B

    67K
    62
    transformers
    Image Text To Text

    nvidia/Eagle2.5-8B

    66K
    38
    transformers
    Image Text To Text

    unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit

    66K
    51
    transformers
    Image Text To Text

    lmstudio-community/Qwen3.5-397B-A17B-MLX-8bit

    66K
    1
    transformers
    Image Text To Text

    LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF

    66K
    63
    Image Text To Text

    openvla/openvla-7b-finetuned-libero-object

    65K
    1
    transformers
    Image Text To Text

    Salesforce/blip2-flan-t5-xl

    64K
    92
    transformers
    Image Text To Text

    OpenGVLab/InternVL2_5-2B

    64K
    33
    transformers
    Image Text To Text

    ciocan/gemma-4-E4B-it-W4A16

    63K
    2
    transformers
    Page 13 of 17