
    Visual Question Answering Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    202 models available

    Showing 73-96 of 202 models

    Visual Question Answering

    google/pix2struct-screen2words-base

    91
    25
    transformers
    Visual Question Answering

    mPLUG/mPLUG-Owl3-1B-241014

    87
    2
    Visual Question Answering

    Cylingo/Xinyuan-VL-2B

    86
    7
    transformers
    Visual Question Answering

    google/pix2struct-widget-captioning-large

    83
    20
    transformers
    Visual Question Answering

    openbmb/OmniLMM-12B

    82
    72
    transformers
    Visual Question Answering

    HPAI-BSC/Aloe-Vision-7B-AR

    82
    1
    Visual Question Answering

    nectec/Pathumma-llm-vision-1.0.0

    78
    11
    Visual Question Answering

    microsoft/git-large-textvqa

    76
    6
    transformers
    Visual Question Answering

    internlm/internlm-xcomposer2-vl-1_8b

    75
    18
    transformers
    Visual Question Answering

    garlandchou/V-Reflection

    75
    5
    Visual Question Answering

    lhzzzzzy/HiSpatial-3B

    75
    Visual Question Answering

    HPAI-BSC/Aloe-Vision-72B-AR

    75
    Visual Question Answering

    AXERA-TECH/InternVL3-2B

    74
    2
    Visual Question Answering

    prapaa/eastrus-vl-qwen3-2b-gguf

    74
    llama.cpp
    Visual Question Answering

    erax-ai/EraX-VL-7B-V1.5

    73
    9
    transformers
    Visual Question Answering

    byh711/FLODA-deepfake

    72
    peft
    Visual Question Answering

    AXERA-TECH/SmolVLM2-500M-Video-Instruct-python

    69
    2
    Visual Question Answering

    AXERA-TECH/Janus-Pro-1B

    66
    2
    Visual Question Answering

    Duckq/blip2-opt-2.7b-emotion-llm

    64
    transformers
    Visual Question Answering

    mradermacher/MemOCR-7B-i1-GGUF

    64
    1
    transformers
    Visual Question Answering

    RhapsodyAI/minicpm-guidance

    63
    7
    transformers
    Visual Question Answering

    RhapsodyAI/qwen_vl_guidance

    61
    4
    transformers
    Visual Question Answering

    mradermacher/NayanaVQA-GGUF

    60
    transformers
    Visual Question Answering

    OpenMed/Qwen2.5-3B-MedVL

    60
    1