    HF · OCR · Apache 2.0

    DeepSeek-OCR-2

    by deepseek-ai

    3B OCR model with semantic visual reasoning for complex document understanding

    1.6M downloads/month
    3B params
    Identifiers
    Model ID: deepseek-ai/DeepSeek-OCR-2
    Feature URI: mixpeek://image_extractor@v1/deepseek_ocr2_3b_v1

    Overview

    DeepSeek-OCR-2 is a 3B-parameter vision-language model that approaches OCR through semantic reasoning rather than a fixed top-to-bottom scan. Its DeepEncoder V2 uses a Causal Visual Flow architecture that dynamically reorders image segments according to their content, compressing each high-resolution document page into 256 to 1,120 visual tokens while keeping text and layout fidelity near-lossless.
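The 256 to 1,120 token range can be reproduced with simple arithmetic under one assumption that this page does not state: a 256-token global view of the page plus up to six local crops at 144 tokens each (256 + 6 × 144 = 1,120). The sketch below encodes that assumption and also computes a text-to-vision compression ratio. `visualTokens` and `compressionRatio` are hypothetical helpers for illustration, not part of any Mixpeek or DeepSeek API.

```typescript
// ASSUMPTION (not confirmed by this page): one 256-token global view
// plus 0-6 local crops of 144 tokens each.
const GLOBAL_VIEW_TOKENS = 256;
const TOKENS_PER_LOCAL_CROP = 144;
const MAX_LOCAL_CROPS = 6;

// Visual tokens for a page encoded with `localCrops` local views.
function visualTokens(localCrops: number): number {
  if (!Number.isInteger(localCrops) || localCrops < 0 || localCrops > MAX_LOCAL_CROPS) {
    throw new RangeError(`localCrops must be an integer in [0, ${MAX_LOCAL_CROPS}]`);
  }
  return GLOBAL_VIEW_TOKENS + localCrops * TOKENS_PER_LOCAL_CROP;
}

// How many text tokens each visual token stands for on a given page.
function compressionRatio(textTokens: number, localCrops: number): number {
  return textTokens / visualTokens(localCrops);
}

console.log(visualTokens(0)); // 256
console.log(visualTokens(MAX_LOCAL_CROPS)); // 1120
console.log(compressionRatio(2560, 0)); // 10
```

Under this assumption, a dense page that needs more local crops costs more tokens, which is why the per-page budget is a range rather than a fixed number.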

    On Mixpeek, DeepSeek-OCR-2 is the state-of-the-art choice for document parsing. It outperforms larger models on complex layouts, tables, and mixed text-and-structure documents, and it supports 100+ languages. It excels where traditional OCR models struggle: multi-column layouts, nested tables, and documents with interspersed diagrams.

    Architecture

    DeepEncoder V2 uses a Causal Visual Flow architecture that replaces rigid top-to-bottom scanning with semantics-aware reordering of image segments. Its vision tokenizer follows the SAM design, with 80M parameters plus a convolutional layer. A 3B-parameter mixture-of-experts (MoE) decoder handles text, layout, and diagram understanding.
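To make the contrast with top-to-bottom scanning concrete, here is a toy sketch on a two-column page. It is not DeepEncoder V2, which learns its reordering inside the encoder; it only shows why raster order interleaves columns while a layout-aware order reads each column in turn. The `Segment` type and both ordering functions are hypothetical.

```typescript
// A hypothetical page segment: a text snippet with its top-left position.
interface Segment {
  text: string;
  x: number; // left edge, in pixels
  y: number; // top edge, in pixels
}

// Raster order: strictly top-to-bottom, then left-to-right.
function rasterOrder(segs: Segment[]): Segment[] {
  return [...segs].sort((a, b) => a.y - b.y || a.x - b.x);
}

// Layout-aware order for a two-column page: assign each segment to a column
// by its x position, then read the left column fully before the right one.
function columnOrder(segs: Segment[], pageWidth: number): Segment[] {
  const column = (s: Segment) => (s.x < pageWidth / 2 ? 0 : 1);
  return [...segs].sort((a, b) => column(a) - column(b) || a.y - b.y);
}

const page: Segment[] = [
  { text: "L1", x: 50, y: 100 },
  { text: "R1", x: 450, y: 100 },
  { text: "L2", x: 50, y: 200 },
  { text: "R2", x: 450, y: 200 },
];

console.log(rasterOrder(page).map((s) => s.text).join(" ")); // L1 R1 L2 R2
console.log(columnOrder(page, 800).map((s) => s.text).join(" ")); // L1 L2 R1 R2
```

A hand-written column rule like this breaks on nested tables or diagrams, which is the motivation for learning the order from content instead.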

    Mixpeek SDK Integration

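    import { Mixpeek } from "mixpeek";

    // Create a client; replace API_KEY with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a PDF into a collection and run OCR with DeepSeek-OCR-2.
    // Top-level await: run this as an ES module (or wrap it in an async function).
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/contract.pdf" },
      feature_extractors: [{
        name: "ocr",
        version: "v1",
        params: {
          model_id: "deepseek-ai/DeepSeek-OCR-2"
        }
      }]
    });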

    Capabilities

    • 91.09% overall score on the OmniDocBench v1.5 benchmark
    • Semantic visual reasoning instead of spatial scanning
    • 256-1,120 visual tokens per page, keeping per-page compute low
    • Support for 100+ languages in multilingual documents
    • Strong on complex layouts: tables, formulas, nested structures

    Use Cases on Mixpeek

    Enterprise document parsing for contracts, invoices, and financial reports with complex layouts
    Scientific paper analysis with formulas, tables, and diagrams
    Multilingual document digitization across global content archives

    Benchmarks

    Dataset | Metric | Score | Source
    OmniDocBench v1.5 | Overall score | 91.09% | DeepSeek-OCR-2 release, Jan 2026
    OmniDocBench v1.5 (formula) | Recognition score | 90.31% | DeepSeek-OCR-2 release, Jan 2026
    Reading order | Edit distance | 0.057 | DeepSeek-OCR-2 release, Jan 2026
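The reading-order score is an edit distance, so lower is better. As a hedged illustration, assuming the metric is a Levenshtein distance normalized by the longer sequence's length (a common convention; the OmniDocBench evaluation code is the authority on the exact definition), this sketch compares a predicted order of layout blocks against the ground truth. `normalizedEditDistance` is our own name, not a benchmark API.

```typescript
// Levenshtein distance between two sequences (insert/delete/substitute cost 1),
// divided by the length of the longer sequence so the result lies in [0, 1].
function normalizedEditDistance<T>(pred: readonly T[], truth: readonly T[]): number {
  const m = pred.length;
  const n = truth.length;
  if (m === 0 && n === 0) return 0;
  // prev[j]: distance between the first i-1 items of pred and the first j of truth.
  let prev: number[] = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= n; j++) {
      const cost = pred[i - 1] === truth[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[n] / Math.max(m, n);
}

// Ground-truth reading order of five layout blocks vs. a prediction
// that swaps the last two blocks (two substitutions).
const truth = [0, 1, 2, 3, 4];
const pred = [0, 1, 2, 4, 3];
console.log(normalizedEditDistance(pred, truth)); // 0.4
```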

    Performance

    Input size: variable resolution (256-1,120 visual tokens per page)
    GPU latency: ~30 ms/page (A100)
    GPU throughput: ~33 pages/sec (A100)
    GPU memory: ~6.5 GB

    3B parameters with an MoE decoder and highly efficient visual token compression.
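The latency and throughput figures are two views of the same number: at ~30 ms per page, one A100 processes about 1 / 0.030 ≈ 33 pages per second. Below is a back-of-the-envelope sketch for sizing a batch job, assuming the quoted throughput holds for your documents and that work splits evenly across GPUs. Neither is guaranteed, since the visual token count varies per page. The helper names are hypothetical.

```typescript
// Pages per second implied by a per-page latency in milliseconds,
// assuming pages are processed one after another with no extra overhead.
function pagesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Wall-clock hours to process `pages` on `gpus` devices, assuming
// perfectly even splitting and constant throughput per device.
function estimateHours(pages: number, throughputPerGpu: number, gpus = 1): number {
  return pages / (throughputPerGpu * gpus) / 3600;
}

const throughput = 33; // pages/sec on one A100, as quoted above
console.log(pagesPerSecond(30).toFixed(1)); // 33.3
console.log(estimateHours(1_000_000, throughput).toFixed(1)); // 8.4
console.log(estimateHours(1_000_000, throughput, 4).toFixed(1)); // 2.1
```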

    Specification

    Framework: HF
    Organization: deepseek-ai
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 3B
    License: Apache 2.0
    Downloads/mo: 1.6M
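The output is listed as text plus bounding boxes. The exact response schema is not documented on this page, so the sketch below assumes a hypothetical shape (`OcrSpan` with an `[x0, y0, x1, y1]` pixel box, top-left origin) and shows one common post-processing step: collecting the text that falls inside a region of interest, such as the total field on an invoice. The sample spans are made-up illustration data.

```typescript
// HYPOTHETICAL output shape; the real Mixpeek response may differ.
type BBox = [number, number, number, number]; // [x0, y0, x1, y1]

interface OcrSpan {
  text: string;
  bbox: BBox; // pixel coordinates, top-left origin assumed
}

// True when the center of `inner` lies inside `region` (edges inclusive).
function centerInside(inner: BBox, region: BBox): boolean {
  const cx = (inner[0] + inner[2]) / 2;
  const cy = (inner[1] + inner[3]) / 2;
  return cx >= region[0] && cx <= region[2] && cy >= region[1] && cy <= region[3];
}

// Join, in input order, the text of every span whose center is in `region`.
function textInRegion(spans: OcrSpan[], region: BBox): string {
  return spans
    .filter((s) => centerInside(s.bbox, region))
    .map((s) => s.text)
    .join(" ");
}

const spans: OcrSpan[] = [
  { text: "Invoice #1042", bbox: [40, 30, 260, 60] },
  { text: "Total:", bbox: [400, 700, 470, 720] },
  { text: "$1,250.00", bbox: [480, 700, 580, 720] },
];

console.log(textInRegion(spans, [380, 680, 600, 740])); // Total: $1,250.00
```

Testing the box center rather than full containment tolerates slightly oversized or clipped boxes at the region edge.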

    Research Paper

    DeepSeek-OCR-2 model card (arxiv.org)

    Build a pipeline with DeepSeek-OCR-2

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.