
    Image Text To Text Models

    Browse AI models for multimodal decomposition and recomposition pipelines. Any model can be plugged into your extractors.

    550 models available

    Showing 265-288 of 550 models

    Image Text To Text

    mlx-community/Qwen3.6-35B-A3B-4bit

    121K
    62
    mlx
    Image Text To Text

    trl-internal-testing/tiny-LlavaForConditionalGeneration

    121K
    transformers
    Image Text To Text

    lkhl/VideoLLaMA3-2B-Image-HF

    121K
    transformers
    Image Text To Text

    fancyfeast/llama-joycaption-beta-one-hf-llava

    120K
    358
    transformers
    Image Text To Text

    unsloth/Qwen3.5-122B-A10B-GGUF

    120K
    271
    transformers
    Image Text To Text

    Skywork/Skywork-R1V-38B

    119K
    128
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-4bit

    119K
    2
    transformers
    Image Text To Text

    google/gemma-4-26B-A4B

    119K
    297
    transformers
    Image Text To Text

    microsoft/kosmos-2.5

    119K
    271
    transformers
    Image Text To Text

    CohereLabs/command-a-vision-07-2025

    118K
    89
    transformers
    Image Text To Text

    lmstudio-community/Qwen3.6-35B-A3B-MLX-4bit

    118K
    transformers
    Image Text To Text

    rhymes-ai/Aria

    117K
    638
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-8bit

    117K
    1
    transformers
    Image Text To Text

    LiquidAI/LFM2-VL-450M

    117K
    148
    transformers
    Image Text To Text

    mlx-community/gemma-3-4b-it-qat-4bit

    116K
    9
    transformers
    Image Text To Text

    lmstudio-community/GLM-4.6V-Flash-MLX-6bit

    116K
    transformers
    Image Text To Text

    nvidia/LocateAnything-3B

    116K
    1,541
    transformers
    Image Text To Text

    xlangai/OpenCUA-7B

    115K
    29
    transformers
    Image Text To Text

    AIDC-AI/Ovis2-1B

    114K
    97
    transformers
    Image Text To Text

    microsoft/Florence-2-base-ft

    114K
    141
    transformers
    Image Text To Text

    huihui-ai/Qwen2.5-VL-7B-Instruct-abliterated

    114K
    50
    transformers
    Image Text To Text

    huihui-ai/Huihui-Qwen3.5-4B-Claude-4.6-Opus-abliterated

    114K
    76
    transformers
    Image Text To Text

    rdtand/Qwen3.6-35B-A3B-PrismaQuant-4.75bit-vllm

    114K
    29
    vllm
    Image Text To Text

    TIGER-Lab/Mantis-8B-siglip-llama3

    113K
    33
    transformers
    Page 12 of 23