    PyTorch · OCR · NVIDIA Open Model License

    nemotron-ocr-v2

    by nvidia

    28x faster multilingual OCR than PaddleOCR, with production-grade throughput for RAG pipelines

    2.9K downloads/month
    Parameters: N/A
    Identifiers
    Model ID: nvidia/nemotron-ocr-v2
    Feature URI: mixpeek://image_extractor@v1/nvidia_nemotron_ocr_v2

    Overview

    Nemotron OCR v2 is NVIDIA's high-throughput OCR model for production RAG pipelines. At 34.7 pages per second on an A100, it processes documents 28x faster than PaddleOCR, and a single architecture covers English, Chinese, Japanese, Korean, and Russian, so no language detection step is required.

    The model pairs a RegNetX backbone for visual feature extraction with a Transformer decoder for text generation. On Mixpeek, it powers bulk document ingestion where throughput is the bottleneck: converting millions of scanned pages into searchable text fast enough to keep up with real-time document feeds.
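    As a rough check on what these numbers mean at archive scale, the sketch below converts throughput into single-GPU wall-clock time for one million pages. It assumes throughput holds steady at the quoted 34.7 pages/sec for the whole run, and it derives the PaddleOCR rate only from the stated 28x ratio; `hoursToProcess` is a hypothetical helper, not part of any SDK.

```typescript
// Back-of-the-envelope ingestion time from the throughput figures on this card.
// Assumption: throughput stays constant for the whole run (no I/O or queueing stalls).
const NEMOTRON_PAGES_PER_SEC = 34.7; // A100, quoted on the model card
const SPEEDUP_VS_PADDLEOCR = 28;     // "28x faster than PaddleOCR"

// Implied PaddleOCR throughput on the same setup: 34.7 / 28 ≈ 1.24 pages/sec.
const PADDLE_PAGES_PER_SEC = NEMOTRON_PAGES_PER_SEC / SPEEDUP_VS_PADDLEOCR;

/** Hours needed to OCR `pages` pages at a steady `pagesPerSec` on one GPU. */
function hoursToProcess(pages: number, pagesPerSec: number): number {
  if (pagesPerSec <= 0) throw new RangeError("pagesPerSec must be positive");
  return pages / pagesPerSec / 3600;
}

const pages = 1_000_000;
console.log(hoursToProcess(pages, NEMOTRON_PAGES_PER_SEC).toFixed(1)); // prints 8.0
console.log(hoursToProcess(pages, PADDLE_PAGES_PER_SEC).toFixed(1));   // prints 224.1
```

    Under these assumptions, a million-page archive fits in roughly one working day on a single A100, versus more than nine days at the implied PaddleOCR rate.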

    Architecture

    RegNetX visual backbone with Transformer text decoder. Unified architecture handles 5 languages (en, zh, ja, ko, ru) without language detection. Optimized for batch inference with TensorRT acceleration.
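    The model card does not describe how variable-resolution pages are batched. One common approach, sketched here purely as an illustration, is to group pages into fixed-size batches (32, matching the batch size in the Performance figures) and pad each batch to its largest page; sorting by page area first reduces wasted padding. `PageImage`, `makeBatches`, and `paddingWaste` are hypothetical names, and pad-to-max is an assumption rather than Nemotron OCR v2's documented preprocessing.

```typescript
// Hypothetical sketch: group variable-resolution pages into fixed-size batches
// and compute the padded size each batch would need.
// Assumption (not from the model card): pages are padded to the largest
// width/height in their batch. Sorting by area first is an optional heuristic.

interface PageImage {
  id: string;
  width: number;  // pixels
  height: number; // pixels
}

interface Batch {
  pages: PageImage[];
  paddedWidth: number;
  paddedHeight: number;
}

/** Split pages into batches of at most `batchSize`, optionally sorted by area to reduce padding. */
function makeBatches(pages: PageImage[], batchSize = 32, sortBySize = true): Batch[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const ordered = sortBySize
    ? [...pages].sort((a, b) => a.width * a.height - b.width * b.height)
    : pages;
  const batches: Batch[] = [];
  for (let i = 0; i < ordered.length; i += batchSize) {
    const chunk = ordered.slice(i, i + batchSize); // never empty inside this loop
    batches.push({
      pages: chunk,
      paddedWidth: Math.max(...chunk.map((p) => p.width)),
      paddedHeight: Math.max(...chunk.map((p) => p.height)),
    });
  }
  return batches;
}

/** Fraction of padded pixels that carry no page content (0 means no waste). */
function paddingWaste(batches: Batch[]): number {
  let real = 0;
  let padded = 0;
  for (const b of batches) {
    padded += b.paddedWidth * b.paddedHeight * b.pages.length;
    for (const p of b.pages) real += p.width * p.height;
  }
  return padded === 0 ? 0 : 1 - real / padded;
}
```

    For example, with 64 pages alternating between 100x100 and 200x200 pixels, unsorted batches of 32 spend 37.5% of their pixels on padding, while area-sorted batches spend none.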

    Mixpeek SDK Integration

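    // Mixpeek JS SDK: ingest a PDF and run OCR with Nemotron OCR v2.
    // Top-level await requires an ES module context (e.g. "type": "module").
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" }); // replace with your Mixpeek API key

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/multilingual-report.pdf" },
      feature_extractors: [{
        name: "ocr",
        version: "v1",
        params: {
          model_id: "nvidia/nemotron-ocr-v2" // selects this model for the OCR extractor
        }
      }]
    });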
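    For bulk ingestion, a caller will usually submit many documents while limiting how many requests are in flight. This sketch reuses the request shape from the example above; the `IngestRequest` and `IngestClient` types are inferred from that single example and are not the official Mixpeek SDK typings, and `ocrRequest` and `ingestAll` are hypothetical helpers. The client is passed in, so any object with that shape (the SDK client, if it matches, or a test stub) can be used.

```typescript
// Hypothetical helper: ingest many documents using the request shape shown in
// the SDK example, with a cap on concurrent requests.
// These types are inferred from that one example, NOT the official SDK typings.

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: {
    name: string;
    version: string;
    params: { model_id: string };
  }[];
}

interface IngestClient {
  collections: { ingest(req: IngestRequest): Promise<unknown> };
}

/** Build an OCR ingest request for one document URL. */
function ocrRequest(collectionId: string, url: string): IngestRequest {
  return {
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      { name: "ocr", version: "v1", params: { model_id: "nvidia/nemotron-ocr-v2" } },
    ],
  };
}

/** Ingest `urls` with at most `concurrency` requests in flight; results keep input order. */
async function ingestAll(
  client: IngestClient,
  collectionId: string,
  urls: string[],
  concurrency = 4,
): Promise<PromiseSettledResult<unknown>[]> {
  const results: PromiseSettledResult<unknown>[] = new Array(urls.length);
  let next = 0;
  // Each worker claims the next unprocessed URL until none remain.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      try {
        const value = await client.collections.ingest(ocrRequest(collectionId, urls[i]));
        results[i] = { status: "fulfilled", value };
      } catch (reason) {
        results[i] = { status: "rejected", reason }; // record the failure, keep going
      }
    }
  }
  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

    Failures are recorded per URL instead of aborting the run, which suits large archives where a few unreadable files are expected.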

    Capabilities

    • 34.7 pages/sec throughput (28x faster than PaddleOCR)
    • 5-language support without language detection
    • Production-optimized for batch inference
    • TensorRT acceleration support

    Use Cases on Mixpeek

    High-volume document ingestion for enterprise search
    Real-time OCR on streaming document feeds
    Multilingual document processing without language routing
    Cost-efficient batch processing of large document archives

    Benchmarks

    Dataset | Metric | Score | Source
    Internal multi-language benchmark | Throughput | 34.7 pages/sec | NVIDIA, 2026 (Model Card)

    Performance

    Input size: Variable-resolution document pages
    GPU latency: ~29 ms/page (A100, batch 32)
    GPU throughput: ~34.7 pages/sec (A100)
    GPU memory: ~4 GB
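    The latency and throughput rows describe the same measurement if the ~29 ms figure is read as amortized time per page within a batch of 32 (1000 / 34.7 ≈ 28.8 ms). Under that assumption, the sketch below also estimates how long a full batch takes and how many A100s a live feed would need, assuming linear scaling across GPUs and ignoring I/O and preprocessing; `gpusNeeded` is a hypothetical helper.

```typescript
// Worked arithmetic relating the Performance figures.
// Assumption: "~29 ms / page" is amortized batch time divided by batch size.
const THROUGHPUT_PAGES_PER_SEC = 34.7; // A100, from the card
const BATCH_SIZE = 32;                 // from the card

const msPerPage = 1000 / THROUGHPUT_PAGES_PER_SEC; // ≈ 28.8 ms, matches "~29 ms / page"
const msPerBatch = msPerPage * BATCH_SIZE;         // ≈ 922 ms for one full batch

/** GPUs needed (rounded up) to keep pace with a feed of `feedPagesPerSec`, assuming linear scaling. */
function gpusNeeded(feedPagesPerSec: number, perGpu = THROUGHPUT_PAGES_PER_SEC): number {
  return Math.ceil(feedPagesPerSec / perGpu);
}

console.log(msPerPage.toFixed(1));   // prints 28.8
console.log(Math.round(msPerBatch)); // prints 922
console.log(gpusNeeded(100));        // prints 3
```

    One consequence: in a real-time feed, a page may wait for most of a full batch (roughly 0.9 s) before its text is ready, even though the amortized cost is about 29 ms. Smaller batches typically trade some throughput for lower per-page delay.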

    Specification

    Framework: PyTorch
    Organization: nvidia
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: N/A
    License: NVIDIA Open Model License
    Downloads/mo: 2.9K
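    The Output row lists text plus bounding boxes. To index a page as plain text, those regions have to be put into reading order first. This sketch assumes a hypothetical `OcrRegion` shape with `bbox` as `[x0, y0, x1, y1]` in pixels from the top-left corner; that is not the documented output schema, and the line-grouping rule is a naive heuristic rather than anything the model provides.

```typescript
// Hypothetical sketch: assemble "text + bbox" OCR regions into page text.
// Assumed region shape (not the documented schema): bbox = [x0, y0, x1, y1]
// in pixels, origin at the top-left corner.

interface OcrRegion {
  text: string;
  bbox: [number, number, number, number];
}

/**
 * Naive reading order: regions whose vertical centers lie within
 * `lineTolerance` pixels of a line's first region join that line. Lines are
 * ordered top-to-bottom, regions within a line left-to-right.
 */
function toPageText(regions: OcrRegion[], lineTolerance = 10, separator = " "): string {
  const centerY = (r: OcrRegion) => (r.bbox[1] + r.bbox[3]) / 2;
  const sorted = [...regions].sort((a, b) => centerY(a) - centerY(b));
  const lines: OcrRegion[][] = [];
  for (const r of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(centerY(r) - centerY(current[0])) <= lineTolerance) {
      current.push(r); // same visual line
    } else {
      lines.push([r]); // start a new line
    }
  }
  return lines
    .map((line) =>
      line
        .sort((a, b) => a.bbox[0] - b.bbox[0]) // left-to-right within the line
        .map((r) => r.text)
        .join(separator),
    )
    .join("\n");
}
```

    This heuristic handles single-column pages; multi-column layouts would interleave columns and need column detection first. The default space separator suits English and Russian, while an empty separator is often a better fit for Chinese or Japanese text.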

    Research Paper

    Nemotron OCR v2 (arxiv.org)

    Build a pipeline with nemotron-ocr-v2

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
