    Visual Question Answering Models

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    202 models available

    Showing 49-72 of 202 models

    Visual Question Answering

    Dmjdxb/deplot

    152
    Visual Question Answering

    microsoft/git-large-vqav2

    141
    19
    transformers
    Visual Question Answering

    gaianet/MiniCPM-V-4_5-GGUF

    140
    4
    Visual Question Answering

    0xDing/yuren-baichuan-7b

    139
    27
    transformers
    Visual Question Answering

    introvoyz041/OpenMed-SynthVision-MedVL-AIO-GGUF

    136
    transformers
    Visual Question Answering

    jzsues/llava-qwen1.5-4b-chat

    122
    transformers
    Visual Question Answering

    Bingsu/temp_vilt_vqa

    117
    transformers
    Visual Question Answering

    google/matcha-chart2text-statista

    112
    10
    transformers
    Visual Question Answering

    JosephDefonse/Med3DVLM-PMCT

    112
    Visual Question Answering

    jihadzakki/blip1-medvqa

    111
    2
    transformers
    Visual Question Answering

    google/pix2struct-widget-captioning-base

    106
    6
    transformers
    Visual Question Answering

    BlackB/blip2-pokemon-pokemon

    106
    transformers
    Visual Question Answering

    mPLUG/mPLUG-Owl3-2B-241014

    104
    6
    Visual Question Answering

    OpenDataArena/MMFineReason-8B

    104
    10
    Visual Question Answering

    qihoo360/360VL-8B

    102
    13
    transformers
    Visual Question Answering

    prapaa/eastrus-vl-qwen3-8b

    102
    Visual Question Answering

    gaianet/MiniCPM-V-4-GGUF

    101
    Visual Question Answering

    OpenFace-CQUPT/Human_LLaVA

    99
    43
    transformers
    Visual Question Answering

    gaianet/MiniCPM-V-2_6-GGUF

    99
    Visual Question Answering

    CHELSEA234/llava-v1.5-7b-M2F2-Det

    98
    1
    Visual Question Answering

    Atul8827/vilt_finetuned_200

    96
    transformers
    Visual Question Answering

    google/pix2struct-ocrvqa-base

    95
    5
    transformers
    Visual Question Answering

    BhashaAI/ViLaH

    95
    1
    transformers
    Visual Question Answering

    TIGER-Lab/VL-Rethinker-72B

    94
    5
    transformers