DeepSeek-OCR-2
by deepseek-ai
3B OCR model with semantic visual reasoning for complex document understanding
deepseek-ai/DeepSeek-OCR-2
mixpeek://image_extractor@v1/deepseek_ocr2_3b_v1
Overview
DeepSeek-OCR-2 is a 3B-parameter vision-language model that reimagines OCR through semantic reasoning rather than traditional top-to-bottom scanning. Its DeepEncoder V2 uses a Causal Visual Flow architecture that dynamically reorders image segments based on semantic understanding, compressing high-resolution documents into just 256-1,120 visual tokens while maintaining near-lossless text and layout fidelity.
On Mixpeek, DeepSeek-OCR-2 is the state-of-the-art choice for document parsing, outperforming larger models on complex layouts, tables, and mixed text-structure documents across 100+ languages. It excels where traditional OCR models struggle: multi-column layouts, nested tables, and documents with interspersed diagrams.
Architecture
DeepEncoder V2 uses a Causal Visual Flow architecture that replaces rigid top-to-bottom scanning with semantics-aware segment reordering. The vision tokenizer follows the SAM design, with 80M parameters plus a convolutional layer. A 3B-parameter mixture-of-experts decoder handles text, layout, and diagram understanding.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "deepseek-ai/DeepSeek-OCR-2" },
  },
});
Capabilities
- 91.09% on OmniDocBench v1.5 benchmark
- Semantic visual reasoning instead of spatial scanning
- 256-1,120 visual tokens per page (highly efficient)
- 100+ language support for multilingual documents
- Strong on complex layouts: tables, formulas, nested structures
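The per-page token range listed above translates directly into a rough budget for a whole document. The sketch below only does that arithmetic; `estimateVisualTokens` is a hypothetical helper written for illustration, not part of the Mixpeek SDK or the model's tooling, and it assumes every page falls within the quoted 256 to 1,120 range.

```typescript
// Per-page visual token range quoted for DeepSeek-OCR-2 in this page.
const MIN_TOKENS_PER_PAGE = 256;
const MAX_TOKENS_PER_PAGE = 1120;

interface TokenBudget {
  min: number;
  max: number;
}

// Hypothetical helper: lower and upper bounds on total visual tokens
// for a document, assuming each page stays inside the quoted range.
function estimateVisualTokens(pageCount: number): TokenBudget {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    min: pageCount * MIN_TOKENS_PER_PAGE,
    max: pageCount * MAX_TOKENS_PER_PAGE,
  };
}

const budget = estimateVisualTokens(40);
console.log(budget); // { min: 10240, max: 44800 }
```

For a 40-page report this gives somewhere between 10,240 and 44,800 visual tokens, which is a useful sanity check when sizing batches.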
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| OmniDocBench v1.5 | Overall Score | 91.09% | DeepSeek-OCR-2 release, Jan 2026 |
| OmniDocBench v1.5 (formula) | Recognition Score | 90.31% | DeepSeek-OCR-2 release, Jan 2026 |
| Reading Order | Edit Distance | 0.057 | DeepSeek-OCR-2 release, Jan 2026 |
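To make the reading-order row easier to interpret, here is a minimal sketch of a normalized edit distance between a predicted block order and a reference order. It assumes a plain Levenshtein distance divided by the longer sequence length; the benchmark's exact formulation may differ, and the block labels are invented for illustration.

```typescript
// Levenshtein distance between two sequences of block IDs
// (insertions, deletions, and substitutions each cost 1).
function editDistance(a: string[], b: string[]): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: divide by the longer sequence so the score is in [0, 1].
function normalizedEditDistance(predicted: string[], reference: string[]): number {
  const longest = Math.max(predicted.length, reference.length);
  return longest === 0 ? 0 : editDistance(predicted, reference) / longest;
}

// Illustrative two-column page where the model swaps two adjacent blocks.
const reference = ["title", "col1-p1", "col1-p2", "col2-p1", "table"];
const predicted = ["title", "col1-p1", "col2-p1", "col1-p2", "table"];
console.log(normalizedEditDistance(predicted, reference)); // 0.4
```

Lower is better: 0 means the predicted order matches the reference exactly, so a score near 0.057 indicates that only a small fraction of blocks are out of place.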
Performance
3B parameters with a mixture-of-experts (MoE) decoder and highly efficient visual token compression.
Research Paper
DeepSeek-OCR-2 model card
arxiv.org
Build a pipeline with DeepSeek-OCR-2
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.