nemotron-ocr-v2
by NVIDIA
Multilingual OCR that runs 28x faster than PaddleOCR, with production-grade throughput for RAG pipelines
nvidia/nemotron-ocr-v2
mixpeek://image_extractor@v1/nvidia_nemotron_ocr_v2
Overview
Nemotron OCR v2 is NVIDIA's high-throughput OCR model designed for production RAG pipelines. At 34.7 pages per second on an A100, it processes documents 28x faster than PaddleOCR. It supports English, Chinese, Japanese, Korean, and Russian in a single architecture, so no language-detection step is required.
The model pairs a RegNetX backbone for visual feature extraction with a Transformer decoder for text generation. On Mixpeek, it powers bulk document ingestion where throughput is the bottleneck, converting millions of pages into searchable text fast enough to keep up with real-time document feeds.
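To put the throughput figures above in concrete terms, here is a back-of-envelope estimate of wall-clock OCR time for a corpus. It is a sketch under stated assumptions: the published 34.7 pages/sec (A100) rate holds for your documents, PaddleOCR's rate is derived by dividing by the quoted 28x speedup, and throughput scales linearly with GPU count. `estimateHours` is a hypothetical helper; real throughput will depend on page content, resolution, and batching.

```typescript
// Published figures from this page.
const NEMOTRON_PAGES_PER_SEC = 34.7; // pages/sec on one A100
const PADDLE_SPEEDUP = 28; // "28x faster than PaddleOCR"

// Hypothetical helper: hours needed to OCR `pages` pages.
// ASSUMPTION: throughput scales linearly with the number of GPUs.
function estimateHours(pages: number, pagesPerSec: number, gpus = 1): number {
  if (pages < 0 || pagesPerSec <= 0 || gpus < 1) {
    throw new RangeError("need pages >= 0, pagesPerSec > 0, gpus >= 1");
  }
  return pages / (pagesPerSec * gpus) / 3600;
}

const pages = 1_000_000;
const nemotronHours = estimateHours(pages, NEMOTRON_PAGES_PER_SEC);
// Implied PaddleOCR rate: 34.7 / 28 ≈ 1.24 pages/sec.
const paddleHours = estimateHours(pages, NEMOTRON_PAGES_PER_SEC / PADDLE_SPEEDUP);

console.log(`Nemotron OCR v2: ${nemotronHours.toFixed(1)} h`); // Nemotron OCR v2: 8.0 h
console.log(`PaddleOCR (implied): ${paddleHours.toFixed(1)} h`); // PaddleOCR (implied): 224.1 h
```

Under these assumptions, a million pages take about 8 hours on one GPU versus roughly 224 hours at the implied PaddleOCR rate. `estimateHours(1_000_000, NEMOTRON_PAGES_PER_SEC, 4)` returns about 2 hours, but only if four GPUs scale perfectly, which real deployments may not.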
Architecture
RegNetX visual backbone with Transformer text decoder. Unified architecture handles 5 languages (en, zh, ja, ko, ru) without language detection. Optimized for batch inference with TensorRT acceleration.
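Because one model covers all five languages, the practical question for a mixed-language corpus is only whether every language it contains is on that list. The sketch below checks this up front. `uncoveredLanguages` is a hypothetical helper, and it assumes the corpus languages are already known from existing metadata; it is not a language detector and is not part of the Mixpeek SDK.

```typescript
// Languages listed for Nemotron OCR v2 on this page (ISO 639-1 codes).
const NEMOTRON_OCR_V2_LANGS: ReadonlySet<string> = new Set(["en", "zh", "ja", "ko", "ru"]);

// Hypothetical helper: given the languages a corpus is known to contain,
// return the ones this model does not list. An empty result means a single
// model can OCR the whole corpus with no per-document detection or routing.
function uncoveredLanguages(corpusLangs: readonly string[]): string[] {
  const missing = new Set<string>();
  for (const tag of corpusLangs) {
    // Reduce BCP 47 tags such as "zh-Hant" or "DE-at" to the primary subtag.
    const primary = tag.toLowerCase().split("-")[0];
    if (!NEMOTRON_OCR_V2_LANGS.has(primary)) missing.add(primary);
  }
  return Array.from(missing);
}

console.log(uncoveredLanguages(["en", "zh-Hant", "ja"])); // []
console.log(uncoveredLanguages(["en", "de", "ko", "DE-at"])); // [ 'de' ]
```

A non-empty result flags the languages that would need a different OCR extractor. Everything else can go through this one model unchanged.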
Mixpeek SDK Integration
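```typescript
import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a PDF into a collection and run OCR on it with Nemotron OCR v2.
// Top-level await requires an ES module (or wrap this in an async function).
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/multilingual-report.pdf" },
  feature_extractors: [
    {
      name: "ocr",
      version: "v1",
      params: {
        model_id: "nvidia/nemotron-ocr-v2", // selects this model for the OCR extractor
      },
    },
  ],
});
```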
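For bulk ingestion, the same request can be built once per document. The sketch below assumes the payload shape shown in the ingest call above (one `source.url` per request) and only constructs the objects, so it runs without the SDK or network access. `OcrIngestRequest` and `buildOcrIngestRequests` are hypothetical names, not part of the `mixpeek` package.

```typescript
// Payload shape copied from the ingest call on this page.
interface OcrIngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: {
    name: string;
    version: string;
    params: { model_id: string };
  }[];
}

// Hypothetical helper: one OCR ingest request per document URL,
// all targeting nvidia/nemotron-ocr-v2 in the same collection.
function buildOcrIngestRequests(
  collectionId: string,
  urls: readonly string[],
): OcrIngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      { name: "ocr", version: "v1", params: { model_id: "nvidia/nemotron-ocr-v2" } },
    ],
  }));
}

const requests = buildOcrIngestRequests("my-collection", [
  "https://example.com/report-en.pdf",
  "https://example.com/report-ja.pdf",
]);
console.log(requests.length); // 2
```

Each object could then be passed to `mx.collections.ingest`, for example in a `for...of` loop with `await`; whether Mixpeek also offers a dedicated batch endpoint is not covered on this page.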
Capabilities
- 34.7 pages/sec throughput (28x faster than PaddleOCR)
- 5-language support without language detection
- Production-optimized for batch inference
- TensorRT acceleration support
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Internal multi-language benchmark | Throughput | 34.7 pages/sec | NVIDIA, 2026 (Model Card) |
Research Paper
Nemotron OCR v2 (arxiv.org)
Build a pipeline with nemotron-ocr-v2
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.