    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    400 models available

    Showing 217–240 of 400 models

    Image Text To Text

    lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit

    114K
    4
    mlx
    Image Text To Text

    HuggingFaceM4/idefics2-8b

    114K
    621
    transformers
    Image Text To Text

    trl-internal-testing/tiny-LlavaNextForConditionalGeneration

    114K
    transformers
    Image Text To Text

    QuantTrio/Qwen3.6-35B-A3B-AWQ

    112K
    13
    transformers
    Image Text To Text

    facebook/chameleon-7b

    111K
    200
    transformers
    Image Text To Text

    Qwen/Qwen2-VL-2B-Instruct-AWQ

    111K
    24
    Image Text To Text

    HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive

    110K
    148
    Image Text To Text

    mlx-community/gemma-3-12b-it-qat-4bit

    110K
    18
    transformers
    Image Text To Text

    lmstudio-community/gemma-4-26B-A4B-it-MLX-6bit

    109K
    transformers
    Image Text To Text

    moonshotai/Kimi-VL-A3B-Thinking

    109K
    447
    transformers
    Image Text To Text

    unsloth/gemma-4-E4B-it-unsloth-bnb-4bit

    108K
    15
    Image Text To Text

    rednote-hilab/dots.mocr

    108K
    109
    dots_mocr
    Image Text To Text

    lmstudio-community/Qwen3-VL-8B-Instruct-MLX-8bit

    108K
    4
    mlx
    Image Text To Text

    trl-internal-testing/tiny-Qwen2VLForConditionalGeneration

    107K
    transformers
    Image Text To Text

    lmstudio-community/Qwen3-VL-8B-Instruct-MLX-5bit

    106K
    mlx
    Image Text To Text

    unsloth/Qwen3.5-0.8B

    106K
    11
    transformers
    Image Text To Text

    lmstudio-community/Qwen3-VL-8B-Instruct-MLX-6bit

    106K
    mlx
    Image Text To Text

    mlx-community/gemma-4-26b-a4b-it-4bit

    106K
    44
    mlx
    Image Text To Text

    llmfan46/gemma-4-31B-it-uncensored-heretic-GGUF

    105K
    50
    transformers
    Image Text To Text

    CohereLabs/aya-vision-8b

    105K
    321
    transformers
    Image Text To Text

    Qwen/Qwen2.5-VL-72B-Instruct

    105K
    609
    transformers
    Image Text To Text

    cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit

    104K
    31
    transformers
    Image Text To Text

    OpenGVLab/InternVL3_5-30B-A3B

    104K
    42
    transformers
    Image Text To Text

    meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

    103K
    163
    transformers
    Page 10 of 17