omni-embed-nemotron-3b
by nvidia
Unified embedding model for text, image, audio, and video retrieval in a single vector space
nvidia/omni-embed-nemotron-3b
mixpeek://image_extractor@v1/nvidia_omni_embed_nemotron_3b_v1
Overview
Omni-Embed Nemotron is NVIDIA's omnimodal embedding model that encodes text, images, audio, and video into a shared 2048-dimensional vector space. Built on the Thinker component of Qwen2.5-Omni-3B, it processes each modality independently and projects into a single retrieval-ready embedding.
On Mixpeek, Omni-Embed Nemotron enables cross-modal search: a text query can retrieve matching video clips, audio segments, document pages, or images from a single index. One model replaces four separate embedding pipelines.
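To make the shared vector space concrete, here is a minimal sketch of ranking items of different modalities against one text query by cosine similarity. The 4-dimensional vectors, the item IDs, and the `IndexedItem` and `rankBySimilarity` names are illustrative assumptions, not model output or part of the Mixpeek SDK; real embeddings from this model would have 2048 dimensions.

```typescript
// Toy illustration of cross-modal ranking in a shared embedding space.
// All vectors below are made-up 4-dimensional stand-ins (assumption).

type Modality = "text" | "image" | "audio" | "video";

interface IndexedItem {
  id: string;
  modality: Modality;
  embedding: number[];
}

interface ScoredItem {
  id: string;
  modality: Modality;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the topK highest scores.
function rankBySimilarity(query: number[], items: IndexedItem[], topK: number): ScoredItem[] {
  return items
    .map((item) => ({
      id: item.id,
      modality: item.modality,
      score: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// One index holding items from three modalities (hypothetical data).
const mixedIndex: IndexedItem[] = [
  { id: "clip-1", modality: "video", embedding: [0.9, 0.1, 0.0, 0.1] },
  { id: "page-7", modality: "image", embedding: [0.1, 0.8, 0.2, 0.0] },
  { id: "track-3", modality: "audio", embedding: [0.0, 0.1, 0.9, 0.2] },
];

// Stand-in for the embedding of a text query.
const textQueryEmbedding = [1.0, 0.0, 0.0, 0.0];
const topResults = rankBySimilarity(textQueryEmbedding, mixedIndex, 2);
console.log(topResults.map((r) => `${r.modality}:${r.id}`)); // [ 'video:clip-1', 'image:page-7' ]
```

Because every modality lands in the same space, the ranking code never branches on modality; the `modality` field is carried along only for display.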
Architecture
Transformer-based encoder derived from Qwen2.5-Omni-3B (Thinker only, no Talker). 2048-dim output embeddings. 32K max context tokens. Modality-separated encoding with independent audio and video processing paths. 4.7B parameters.
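For capacity planning, the 2048-dimensional output translates directly into a per-vector storage cost. The sketch below assumes uncompressed float32 storage (4 bytes per dimension), which is an assumption rather than something this page states, and it ignores index overhead, metadata, and any quantization a vector store may apply. The function names are illustrative.

```typescript
// Rough storage estimate for a flat index of dense embeddings.
// Assumption: each dimension is stored as an uncompressed float32.

const EMBEDDING_DIM = 2048; // output dimensionality stated in the architecture summary
const BYTES_PER_FLOAT32 = 4;

function bytesPerVector(dim: number = EMBEDDING_DIM): number {
  return dim * BYTES_PER_FLOAT32;
}

function indexSizeGiB(numVectors: number, dim: number = EMBEDDING_DIM): number {
  return (numVectors * bytesPerVector(dim)) / 1024 ** 3;
}

console.log(bytesPerVector()); // 8192 bytes (8 KiB) per vector
console.log(indexSizeGiB(1_000_000).toFixed(2)); // "7.63" GiB for one million vectors
```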
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "media-library",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      feature: "multimodal_embedding",
      model: "nvidia/omni-embed-nemotron-3b"
    }
  ]
});
Capabilities
- Unified text, image, audio, and video embeddings in one model
- 2048-dimensional dense vectors for cross-modal retrieval
- 32K token context window
- State-of-the-art video retrieval among embedding models (0.706 average nDCG@10 on LPM and FineVideo, as reported by NVIDIA)
- Competitive visual document retrieval (85.7 nDCG@5 on ViDoRe V1)
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| ViDoRe V1 (visual doc) | nDCG@5 | 85.7% | NVIDIA, 2025 — Model Card |
| MTEB text retrieval (10 tasks) | nDCG@10 avg | 0.606 | NVIDIA, 2025 — Model Card |
| Video retrieval (LPM + FineVideo) | nDCG@10 avg | 0.706 | NVIDIA, 2025 — Model Card |
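The scores above are nDCG@k values, which reward placing relevant results near the top of a ranked list. As a minimal sketch of how the metric works, the code below uses the linear-gain form, DCG@k = Σ rel_i / log2(i + 1) for ranks i = 1..k, normalized by the DCG of the ideal ordering. Whether the cited benchmarks use linear or exponential gain is not stated here, so treat this variant as an assumption; the function names and sample labels are illustrative.

```typescript
// nDCG@k with linear gain.
// Simplification: the ideal ordering is built from the same list of judged
// relevances, so relevant items that were never retrieved are not counted.

function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    // Array index i is zero-based, so rank = i + 1 and the discount is log2(rank + 1).
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}

// Relevance labels of results in ranked order (1 = relevant, 0 = not).
const rankedRelevances = [1, 0, 1, 0, 0];
console.log(ndcgAtK(rankedRelevances, 5).toFixed(3)); // "0.920"
```

A perfect ranking scores 1, and the score drops as relevant items slide down the list: here the second relevant result sits at rank 3 instead of rank 2, which costs about 0.08.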
Research Paper
Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model
arxiv.org
Build a pipeline with omni-embed-nemotron-3b
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.