DeepSeek-OCR-2

by deepseek-ai

3B OCR model with semantic visual reasoning for complex document understanding

3.1M downloads/month

1,052 likes

3.4B params


Identifiers

Model ID

deepseek-ai/DeepSeek-OCR-2

Feature URI

mixpeek://image_extractor@v1/deepseek_ocr2_3b_v1
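The feature URI above appears to pack several pieces of information into one string. As a minimal sketch, assuming the layout is `scheme://extractor@version/feature` (an inference from this single example, not a documented grammar), it could be split into its parts as follows. The `FeatureUri` type and `parseFeatureUri` helper are hypothetical names introduced for illustration and are not part of the Mixpeek SDK.

```typescript
// Hypothetical shape: field names are illustrative assumptions,
// not taken from any documented Mixpeek API.
interface FeatureUri {
  scheme: string;
  extractor: string;
  version: string;
  feature: string;
}

// Splits a URI of the assumed form scheme://extractor@version/feature.
function parseFeatureUri(uri: string): FeatureUri {
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^@\/]+)@([^\/]+)\/(.+)$/i.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  const [, scheme, extractor, version, feature] = match;
  return { scheme, extractor, version, feature };
}

const parsedUri = parseFeatureUri("mixpeek://image_extractor@v1/deepseek_ocr2_3b_v1");
console.log(parsedUri.extractor, parsedUri.version, parsedUri.feature);
```

If the real URI grammar differs, only the regular expression needs to change; the rest of the sketch just names the captured groups.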

Overview

DeepSeek-OCR-2 is a 3B-parameter vision-language model that reimagines OCR through semantic reasoning rather than traditional top-to-bottom scanning. Its DeepEncoder V2 uses a Causal Visual Flow architecture that dynamically reorders image segments based on semantic understanding, compressing high-resolution documents into just 256-1,120 visual tokens while maintaining near-lossless text and layout fidelity.

On Mixpeek, DeepSeek-OCR-2 is the state-of-the-art choice for document parsing, outperforming larger models on complex layouts, tables, and mixed text-structure documents across 100+ languages. It excels where traditional OCR models struggle: multi-column layouts, nested tables, and documents with interspersed diagrams.
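To make the 256-1,120 visual-token figure concrete, the sketch below estimates the visual-token budget for a multi-page document. It assumes the quoted range applies per page and scales linearly with page count; `visualTokenRange` is a hypothetical helper for back-of-the-envelope planning, not a function from the model or the Mixpeek SDK.

```typescript
// Per-page bounds taken from the range quoted in the overview.
const MIN_TOKENS_PER_PAGE = 256;
const MAX_TOKENS_PER_PAGE = 1120;

// Assumption: tokens scale linearly with the number of pages.
function visualTokenRange(pageCount: number): { min: number; max: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    min: pageCount * MIN_TOKENS_PER_PAGE,
    max: pageCount * MAX_TOKENS_PER_PAGE,
  };
}

console.log(visualTokenRange(40)); // { min: 10240, max: 44800 }
```

Under these assumptions, a 40-page report would occupy between 10,240 and 44,800 visual tokens, depending on how densely each page is encoded.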

Architecture

DeepEncoder V2 uses a Causal Visual Flow architecture, which replaces rigid top-to-bottom scanning with semantics-aware segment reordering. The vision tokenizer follows the SAM design, with 80M parameters plus a convolutional layer. A 3B-parameter mixture-of-experts decoder handles text, layout, and diagram understanding.

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "deepseek-ai/DeepSeek-OCR-2" },
  },
});

Capabilities

91.09% on OmniDocBench v1.5 benchmark
Semantic visual reasoning instead of spatial scanning
256-1,120 visual tokens per page (highly efficient)
100+ language support for multilingual documents
Strong on complex layouts: tables, formulas, nested structures

Use Cases on Mixpeek

Enterprise document parsing for contracts, invoices, and financial reports with complex layouts

Scientific paper analysis with formulas, tables, and diagrams

Multilingual document digitization across global content archives

Benchmarks

Dataset	Metric	Score	Source
OmniDocBench v1.5	Overall Score	91.09%	DeepSeek-OCR-2 release, Jan 2026
OmniDocBench v1.5 (formula)	Recognition Score	90.31%	DeepSeek-OCR-2 release, Jan 2026
Reading Order	Edit Distance	0.057	DeepSeek-OCR-2 release, Jan 2026
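For intuition about the reading-order row, the sketch below computes a normalized edit distance between a predicted and a reference sequence of layout blocks, where lower values mean the predicted order is closer to the reference. It assumes Levenshtein distance divided by the longer sequence length; the benchmark's exact normalization may differ, and the block labels are made up for illustration.

```typescript
// Classic Levenshtein distance over two sequences, using two rolling rows.
function levenshtein<T>(a: readonly T[], b: readonly T[]): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: normalize by the longer sequence so the result lies in [0, 1].
function normalizedEditDistance<T>(predicted: readonly T[], reference: readonly T[]): number {
  const longest = Math.max(predicted.length, reference.length);
  return longest === 0 ? 0 : levenshtein(predicted, reference) / longest;
}

// Illustrative block labels: the prediction swaps the two columns.
const referenceOrder = ["title", "col1", "col2", "table", "footer"];
const predictedOrder = ["title", "col2", "col1", "table", "footer"];
console.log(normalizedEditDistance(predictedOrder, referenceOrder)); // 0.4
```

A single swapped pair in a five-block page already yields 0.4 under this definition, which gives a sense of how small a score of 0.057 is.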