    Visual Question Answering, transformers, apache-2.0

    VideoLLaMA3-2B-Image

    by DAMO-NLP-SG

    Identifier
    Model ID
    DAMO-NLP-SG/VideoLLaMA3-2B-Image

    Tags

    transformers, safetensors, videollama3_qwen2, text-generation, multi-modal, large-language-model, video-language-model, visual-question-answering, custom_code, en, dataset:lmms-lab/LLaVA-OneVision-Data, dataset:allenai/pixmo-docs, dataset:HuggingFaceM4/Docmatix, dataset:lmms-lab/LLaVA-Video-178K, dataset:ShareGPT4Video/ShareGPT4Video, arxiv:2501.13106, arxiv:2406.07476, arxiv:2306.02858, base_model:Qwen/Qwen2.5-1.5B-Instruct, base_model:finetune:Qwen/Qwen2.5-1.5B-Instruct, license:apache-2.0, region:us

    Use VideoLLaMA3-2B-Image on Mixpeek

    Build multimodal processing pipelines with this model and others. Extract features, run inference, and set up retrieval, all through the Mixpeek pipeline builder.
