    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines, and plug any model into your extractors.

    485 models available

    Showing 124 of 485 models

    Image Text To Text

    Qwen/Qwen3-VL-2B-Instruct

    149.0M
    405
    transformers
    Image Text To Text

    google/gemma-4-31B-it

    9.9M
    2,672
    transformers
    Image Text To Text

    google/gemma-4-26B-A4B-it

    8.4M
    961
    transformers
    Image Text To Text

    Qwen/Qwen3.5-9B

    8.1M
    1,447
    transformers
    Image Text To Text

    Qwen/Qwen3.5-4B

    7.6M
    543
    transformers
    Image Text To Text

    Qwen/Qwen2.5-VL-7B-Instruct

    6.9M
    1,536
    transformers
    Image Text To Text

    Qwen/Qwen3-VL-8B-Instruct

    6.1M
    907
    transformers
    Image Text To Text

    Qwen/Qwen3.6-27B-FP8

    5.6M
    212
    transformers
    Image Text To Text

    Qwen/Qwen3.6-35B-A3B

    5.5M
    1,803
    transformers
    Image Text To Text

    Qwen/Qwen3.6-35B-A3B-FP8

    4.8M
    216
    transformers
    Image Text To Text

    Qwen/Qwen2.5-VL-3B-Instruct

    4.2M
    646
    transformers
    Image Text To Text

    Qwen/Qwen2-VL-2B-Instruct

    4.1M
    501
    transformers
    Image Text To Text

    cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit

    3.6M
    68
    transformers
    Image Text To Text

    Qwen/Qwen3.6-27B

    3.4M
    1,315
    transformers
    Image Text To Text

    Qwen/Qwen3.5-27B

    3.3M
    972
    transformers
    Image Text To Text

    unsloth/gemma-4-26B-A4B-it-GGUF

    3.3M
    742
    Image Text To Text

    llava-hf/llava-1.5-7b-hf

    3.2M
    361
    transformers
    Image Text To Text

    Qwen/Qwen3.5-35B-A3B

    3.2M
    1,427
    transformers
    Image Text To Text

    Qwen/Qwen2-VL-7B-Instruct

    3.2M
    1,275
    transformers
    Image Text To Text

    deepseek-ai/DeepSeek-OCR

    3.0M
    3,235
    transformers
    Image Text To Text

    Qwen/Qwen3-VL-4B-Instruct

    2.9M
    388
    transformers
    Image Text To Text

    vikhyatk/moondream2

    2.8M
    1,413
    transformers
    Image Text To Text

    Qwen/Qwen3.5-0.8B

    2.8M
    536
    transformers
    Image Text To Text

    google/gemma-3-12b-it

    2.8M
    718
    transformers