Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
by nvidia
Omnimodal VLM that processes text, images, video, and audio with only 3B active parameters
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
mixpeek://image_extractor@v1/nvidia_nemotron3_nano_omni_v1
Overview
Nemotron-3-Nano-Omni is NVIDIA's Mixture-of-Experts model that unifies vision, audio, and language understanding in a single architecture. With 31B total parameters but only about 3B active per token, it delivers omnimodal perception at a fraction of the compute cost of dense models, with up to 9x the throughput of comparable open alternatives.
On Mixpeek, Nemotron-3-Nano-Omni serves as a universal perception backbone: a single model call extracts understanding from video (up to 2 minutes), audio (up to 1 hour), images, and text. This eliminates the need for separate caption, transcription, and analysis models in complex pipelines.
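Because the limits quoted above apply per call (2 minutes of video, 1 hour of audio), longer media has to be split before it is sent. The sketch below shows one way to plan fixed-length segments. `planSegments` and `MAX_SECONDS` are hypothetical names used for illustration, not part of the Mixpeek SDK, and this page does not say whether Mixpeek segments long inputs automatically.

```typescript
// Per-call duration limits quoted in the overview, in seconds.
const MAX_SECONDS = { video: 2 * 60, audio: 60 * 60 } as const;

type MediaKind = keyof typeof MAX_SECONDS;

interface Segment {
  startSec: number;
  endSec: number;
}

// Split a media file of `durationSec` into back-to-back segments that each
// fit within the limit for its kind. Hypothetical helper, not an SDK call.
function planSegments(kind: MediaKind, durationSec: number): Segment[] {
  if (!Number.isFinite(durationSec) || durationSec <= 0) {
    throw new RangeError("durationSec must be a positive, finite number");
  }
  const limit = MAX_SECONDS[kind];
  const segments: Segment[] = [];
  for (let start = 0; start < durationSec; start += limit) {
    segments.push({ startSec: start, endSec: Math.min(start + limit, durationSec) });
  }
  return segments;
}

// A 5-minute video is planned as three segments: 0-120 s, 120-240 s, 240-300 s.
const plan = planSegments("video", 300);
console.log(plan);
```

Fixed-length cuts are the simplest choice; a real pipeline might prefer to cut at scene or silence boundaries so that a segment does not end mid-sentence.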
Architecture
Mamba2-Transformer hybrid MoE that combines a C-RADIOv4-H vision encoder, a Parakeet-TDT-0.6B audio encoder, and an MoE language decoder. 31B total parameters, about 3B active per token. 256K-token context window. Processes up to 2 minutes of video or 1 hour of audio.
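To make those figures concrete, here is the arithmetic as a short sketch. It assumes 2 bytes per parameter for BF16 weights and uses the approximate 31B / 3B counts quoted above. It counts weights only and ignores activations, KV cache, and runtime overhead.

```typescript
// Approximate counts from the architecture summary above.
const TOTAL_PARAMS = 31e9;
const ACTIVE_PARAMS = 3e9;
const BYTES_PER_BF16_PARAM = 2; // BF16 stores each value in 16 bits

// Share of the parameters that take part in computing a single token.
const activeFraction = ACTIVE_PARAMS / TOTAL_PARAMS;

// Storage for the weights alone, in decimal gigabytes.
const weightGigabytes = (TOTAL_PARAMS * BYTES_PER_BF16_PARAM) / 1e9;

console.log(`active per token: ${(activeFraction * 100).toFixed(1)}%`); // 9.7%
console.log(`BF16 weights: about ${weightGigabytes} GB`); // 62 GB
```

The two numbers answer different questions: the active fraction drives compute per token, while the full parameter count drives how much memory the weights occupy, since all experts generally need to be loaded even though only a few run for any given token.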
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "media-library",
  source: { url: "https://example.com/meeting-recording.mp4" },
  feature_extractors: [
    {
      feature: "scene_caption",
      model: "nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16"
    }
  ]
});
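When ingesting many files, it can help to build the request bodies separately from the call itself. The sketch below types only the fields that appear in the snippet above (`collection_id`, `source.url`, `feature_extractors`). The `IngestRequest` interface and `buildIngestRequests` helper are hypothetical; the real SDK types may differ or include more fields.

```typescript
const MODEL_ID = "nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16";

// Mirrors the shape of the ingest call shown above.
// Hypothetical type, not taken from the SDK.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

// Build one request body per URL, all using the same extractor.
function buildIngestRequests(
  collectionId: string,
  urls: string[],
  feature: string = "scene_caption",
): IngestRequest[] {
  return urls.map((url) => {
    // Cheap sanity check so obviously wrong inputs fail before any network call.
    if (!/^https?:\/\//.test(url)) {
      throw new TypeError(`not an http(s) URL: ${url}`);
    }
    return {
      collection_id: collectionId,
      source: { url },
      feature_extractors: [{ feature, model: MODEL_ID }],
    };
  });
}

const requests = buildIngestRequests("media-library", [
  "https://example.com/meeting-recording.mp4",
  "https://example.com/keynote.mp4",
]);

// Each body could then be passed to mx.collections.ingest(...) as in the
// snippet above; that call is left out so this sketch runs without the SDK.
console.log(JSON.stringify(requests[0], null, 2));
```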
Capabilities
- Unified vision + audio + text understanding in one model
- Only 3B active parameters per token (MoE efficiency)
- 256K context window for long audio and document processing
- Strong OCR and document understanding (67.04 on OCRBenchV2)
- Video + audio QA (74.52 on DailyOmni)
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Video MME | Accuracy | 72.2% | NVIDIA, 2026 (arXiv:2604.24954) |
| DailyOmni (video+audio QA) | Accuracy | 74.52% | NVIDIA, 2026 (arXiv:2604.24954) |
| OCRBenchV2 (EN) | Accuracy | 67.04 | NVIDIA, 2026 (arXiv:2604.24954) |
Performance
Specification
Research Paper
Nemotron-3-Nano-Omni Technical Report
arxiv.org
Build a pipeline with Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.