nemotron-colembed-vl-8b-v2

by nvidia

State-of-the-art late-interaction visual document retrieval

12K downloads/month

48 likes

8.8B params

Identifiers

Model ID

nvidia/nemotron-colembed-vl-8b-v2

Feature URI

mixpeek://image_extractor@v1/nvidia_nemotron_colembed_vl_8b_v2

Overview

Nemotron ColEmbed VL is an 8B-parameter ColBERT-style multi-vector embedding model built on Qwen3-VL-8B-Instruct. It produces per-token embeddings for both queries and documents, enabling fine-grained matching between query terms and document regions. This late-interaction approach is particularly powerful for visual document retrieval, where different parts of a document page (headers, tables, figures) need to match different parts of a query.

The model ranks #1 on ViDoRe V3, a visual document retrieval benchmark, with a score of 63.54, surpassing ColPali and ColQwen variants.

Architecture

The model uses a ColBERT-style architecture on top of Qwen3-VL-8B-Instruct. It produces multi-vector representations (one vector per token) rather than a single embedding per input. Scoring uses MaxSim: for each query token, take the maximum similarity to any document token, then sum those maxima across all query tokens.
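
The MaxSim scoring described above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: the names `Vector`, `dot`, and `maxSim` are hypothetical, the vectors are tiny hand-written arrays rather than real model output, and the dot product is assumed as the similarity function (it equals cosine similarity when token vectors are L2-normalized).

```typescript
// Hypothetical types and helpers for illustration only.
type Vector = number[];

// Dot product of two equal-length vectors (assumed similarity function).
function dot(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token, take the best-matching document token,
// then sum those best scores over all query tokens.
function maxSim(queryTokens: Vector[], docTokens: Vector[]): number {
  if (docTokens.length === 0) throw new Error("document has no token vectors");
  let score = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) {
      best = Math.max(best, dot(q, d));
    }
    score += best;
  }
  return score;
}

// Toy 2-dimensional token vectors (made up for this example).
const query: Vector[] = [[1, 0], [0, 1]];
const docA: Vector[] = [[1, 0], [0, 1], [0.5, 0.5]];
const docB: Vector[] = [[0.5, 0.5]];

console.log(maxSim(query, docA)); // 2: each query token has an exact match
console.log(maxSim(query, docB)); // 1: each query token only partially matches
```

Because each query token picks its own best document token, a page with one region matching each part of the query (docA) outscores a page that matches every part only weakly (docB).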

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "visual_embeddings",
    version: "v1",
    parameters: { model_id: "nvidia/nemotron-colembed-vl-8b-v2" },
  },
});