    HF | Visual Embeddings | Apache-2.0

    nemotron-colembed-vl-8b-v2

    by nvidia

    State-of-the-art late-interaction visual document retrieval

    20.7K downloads/month
    8B params
    Identifiers
    Model ID
    nvidia/nemotron-colembed-vl-8b-v2
    Feature URI
    mixpeek://image_extractor@v1/nvidia_nemotron_colembed_vl_8b_v2

    Overview

    Nemotron ColEmbed VL is an 8B-parameter ColBERT-style multi-vector embedding model built on Qwen3-VL-8B-Instruct. It produces per-token embeddings for both queries and documents, enabling fine-grained matching between query terms and document regions. This late-interaction approach is particularly powerful for visual document retrieval, where different parts of a document page (headers, tables, figures) need to match different parts of a query.

    The model ranks #1 on ViDoRe V3, the visual document retrieval benchmark, with an NDCG@5 score of 63.54, surpassing ColPali and ColQwen variants.

    Architecture

    The model applies a ColBERT-style architecture on top of Qwen3-VL-8B-Instruct. It produces multi-vector representations (one vector per token) rather than a single-vector embedding. Matching uses MaxSim: for each query token, find the maximum similarity to any document token, then sum these maxima across all query tokens.
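
    The MaxSim rule described above can be sketched in a few lines. This is a minimal illustration, not the model's or Mixpeek's actual scoring code: it assumes dot-product similarity over plain number arrays, and the names `TokenVec`, `dot`, and `maxSim` are hypothetical helpers introduced here.

```typescript
// Hypothetical helper type: one embedding vector per token.
type TokenVec = number[];

// Dot product of two equal-length vectors (assumed similarity measure).
function dot(a: TokenVec, b: TokenVec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token, take the best-matching document token,
// then sum those maxima over all query tokens.
function maxSim(query: TokenVec[], doc: TokenVec[]): number {
  if (doc.length === 0) throw new Error("document has no token vectors");
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}

// Toy 2-dimensional vectors; real embeddings are much higher-dimensional.
const queryTokens: TokenVec[] = [[1, 0], [0, 1]];
const pageA: TokenVec[] = [[0.9, 0.1], [0.2, 0.8]];
const pageB: TokenVec[] = [[0.5, 0.5], [0.4, 0.3]];

const scoreA = maxSim(queryTokens, pageA); // 0.9 + 0.8
const scoreB = maxSim(queryTokens, pageB); // 0.5 + 0.5
```

    Here `pageA` outranks `pageB` because each query token finds a strong match on a different document token, which is the behavior a single pooled vector would tend to blur.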

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "doc-collection",
      source: { url: "https://example.com/report.pdf" },
      feature_extractors: [{
        feature: "visual_embeddings",
        model: "nvidia/nemotron-colembed-vl-8b-v2"
      }]
    });

    Capabilities

    • Multi-vector (ColBERT-style) embeddings for fine-grained matching
    • #1 on ViDoRe V3 visual document retrieval benchmark
    • Handles mixed-content documents: text, tables, charts, figures
    • Supports both text queries and image queries
    • Per-token matching enables localization of relevant document regions
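
    To illustrate how per-token matching can point at relevant regions, the sketch below records, for each query token, which document token matched best. It assumes dot-product similarity and that each document token corresponds to some page region; how token indices map to image patches depends on the model's preprocessing and is not covered here. `Embedding`, `TokenMatch`, and `bestMatches` are hypothetical names.

```typescript
// Hypothetical types for illustration only.
type Embedding = number[];

interface TokenMatch {
  queryToken: number; // index of the query token
  docToken: number;   // index of the best-matching document token
  similarity: number; // dot-product similarity of that pair
}

// For each query token, find the index of the most similar document token.
function bestMatches(query: Embedding[], doc: Embedding[]): TokenMatch[] {
  if (doc.length === 0) throw new Error("document has no token vectors");
  return query.map((q, queryToken) => {
    let docToken = 0;
    let similarity = -Infinity;
    doc.forEach((d, j) => {
      // Inline dot product between the query token and document token j.
      const s = q.reduce((acc, x, i) => acc + x * d[i], 0);
      if (s > similarity) {
        similarity = s;
        docToken = j;
      }
    });
    return { queryToken, docToken, similarity };
  });
}

// Toy example: two query tokens against three document tokens.
const matches = bestMatches(
  [[1, 0], [0, 1]],
  [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]],
);
// matches[0] points at document token 1, matches[1] at document token 0.
```

    Given a mapping from document token indices to page coordinates, the `docToken` values could be used to highlight the regions that contributed most to a match.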

    Use Cases on Mixpeek

    • Visual document search: find specific pages in PDF libraries using natural language
    • Invoice and form extraction: locate specific fields across document layouts
    • Technical documentation retrieval: match queries to diagrams, code blocks, and text simultaneously
    • Legal document discovery: find relevant clauses across diverse document formats

    Benchmarks

    Dataset      Metric    Score    Source
    ViDoRe V3    NDCG@5    63.54    https://huggingface.co/nvidia/nemotron-colembed-vl-8b-v2
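
    For readers unfamiliar with the metric, NDCG@5 rewards rankings that place relevant pages near the top of the first five results. The sketch below uses the common linear-gain formulation and assumes the ranked list contains every relevant document; the benchmark's exact evaluation code may differ. `dcgAtK` and `ndcgAtK` are hypothetical names.

```typescript
// DCG@k with linear gain: sum of rel_i / log2(i + 1) over ranks i = 1..k.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG of the actual ranking divided by the DCG of the ideal ranking.
// Assumption: `ranked` holds the relevance labels of all candidate documents.
function ndcgAtK(ranked: number[], k: number): number {
  const ideal = [...ranked].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(ranked, k) / idealDcg;
}

// Relevance labels in the order the retriever returned the pages.
const perfect = ndcgAtK([1, 1, 0, 0, 0], 5);  // relevant pages ranked first
const shifted = ndcgAtK([0, 1, 0, 0, 0], 5);  // the relevant page is ranked second
```

    The value lies between 0 and 1; a figure such as 63.54 is presumably the same quantity averaged over queries and expressed as a percentage.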

    Specification

    Framework: HF
    Organization: nvidia
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 8B
    License: Apache-2.0
    Downloads/mo: 20.7K

    Research Paper

    Nemotron ColEmbed VL

    arxiv.org

    Build a pipeline with nemotron-colembed-vl-8b-v2

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
