Unlimited-OCR
by baidu
One-shot long-horizon OCR for multi-page documents and PDFs
baidu/Unlimited-OCR
mixpeek://image_extractor@v1/baidu_unlimited_ocr_3b_v1
Overview
Unlimited-OCR is Baidu's 3B vision-language OCR model built for one-shot parsing of long documents. Rather than reading a page at a time, it ingests multi-page images and PDFs in a single pass with a 32K-token context, preserving layout, reading order, tables, and formatting instead of returning a flat bag of words. It extends the DeepSeek-OCR line of compressed-token OCR with multi-page parsing and n-gram repetition guards, and runs efficiently under vLLM or SGLang.
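The overview mentions n-gram repetition guards but does not specify Baidu's rule. The sketch below assumes a plain "no repeated n-gram" rule, the same idea as the `no_repeat_ngram_size` option in Hugging Face Transformers. Before sampling each token, the decoder finds every next token that would complete an n-gram already present in the output and masks it. The function names `bannedNextTokens` and `applyNgramGuard` are hypothetical illustrations, not part of any Unlimited-OCR API.

```typescript
// Hypothetical sketch of a generic "no repeated n-gram" guard.
// This is NOT Baidu's implementation; the source only says "custom n-gram
// repetition avoidance".

// Return the token ids that would complete an n-gram already present in
// `generated`, given the last n-1 tokens as the current prefix.
function bannedNextTokens(generated: number[], n: number): Set<number> {
  const banned = new Set<number>();
  if (n < 1 || generated.length < n) return banned;
  // The last n-1 tokens form the prefix that the next token would extend.
  const prefix = generated.slice(generated.length - (n - 1));
  // Scan every complete n-gram in the history.
  for (let i = 0; i + n <= generated.length; i++) {
    let matches = true;
    for (let j = 0; j < n - 1; j++) {
      if (generated[i + j] !== prefix[j]) {
        matches = false;
        break;
      }
    }
    // Same prefix seen before: its continuation would repeat that n-gram.
    if (matches) banned.add(generated[i + n - 1]);
  }
  return banned;
}

// Mask banned ids by setting their logits to -Infinity before sampling.
function applyNgramGuard(logits: number[], generated: number[], n: number): number[] {
  const banned = bannedNextTokens(generated, n);
  return logits.map((value, tokenId) => (banned.has(tokenId) ? -Infinity : value));
}
```

For example, `bannedNextTokens([5, 7, 9, 5, 7], 3)` returns a set containing only `9`. The trailing prefix `[5, 7]` was previously followed by `9`, so emitting `9` again would repeat the trigram. This linear scan is easy to follow. A production decoder would more likely keep a hash map from prefixes to continuations so that each step does not rescan the whole history.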
On Mixpeek, Unlimited-OCR powers the OCR extractor when the goal is faithful structured text from whole documents (contracts, reports, scanned decks). An agent can then search the recovered text, tables, and headings rather than raw pixels. Because the output preserves layout, downstream chunking and section-aware retrieval can follow the document's own headings and tables, which plain character-level OCR typically loses.
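Section-aware chunking on layout-preserving output can be sketched as follows. This assumes the recovered text is Markdown-style, with headings marked by leading `#` characters. That format is an assumption, so check the actual output before relying on it. `splitBySections` and `Section` are hypothetical names. The function splits the text at each heading so that every retrieval chunk carries the heading it belongs to.

```typescript
// Hypothetical sketch: assumes Markdown-style OCR output with "#" headings.

interface Section {
  heading: string; // heading text without the "#" marks ("" for a preamble)
  level: number;   // number of "#" characters (0 for a preamble)
  body: string;    // trimmed text between this heading and the next one
}

function splitBySections(markdown: string): Section[] {
  const sections: Section[] = [];
  let heading = "";
  let level = 0;
  let body: string[] = [];

  // Close the section being accumulated; skip an empty preamble.
  const flush = () => {
    const text = body.join("\n").trim();
    if (level > 0 || text.length > 0) {
      sections.push({ heading, level, body: text });
    }
  };

  for (const line of markdown.split(/\r?\n/)) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      // A new heading starts a new section.
      flush();
      level = m[1].length;
      heading = m[2].trim();
      body = [];
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```

For example, `splitBySections("Cover note\n# 1. Scope\nThe agreement covers...\n## 1.1 Terms\nNet 30.")` produces three sections: a level-0 preamble `"Cover note"`, `"1. Scope"` at level 1, and `"1.1 Terms"` at level 2. The sketch is deliberately minimal. It would misread `#` lines inside fenced code, and it does not merge short sections to reach a target chunk size.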
Architecture
A 3B-parameter vision-language transformer (BF16 weights) with a compressed vision-token design: document images are mapped to a small budget of vision tokens, and structured text is then decoded within a 32,768-token context. Two input modes trade resolution for cost: gundam (640 px) and base (1024 px). Custom n-gram repetition avoidance stabilizes long-form decoding. The model can be served with Hugging Face Transformers, vLLM, or SGLang.
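Some back-of-the-envelope arithmetic shows how the two input modes and the 32,768-token context interact. Only the context length and the 640/1024 px resolutions come from the description above. The 16 px patch size and the 16x vision-token compression factor are illustrative assumptions, not published Unlimited-OCR values. Each mode is also simplified to a single square image per page. `visionTokensPerPage` and `maxPagesPerPass` are hypothetical helpers.

```typescript
// Illustrative arithmetic only; PATCH_PX and COMPRESSION are assumptions.
const CONTEXT_TOKENS = 32_768; // context length stated in the model description
const PATCH_PX = 16;           // assumed vision-encoder patch size (pixels)
const COMPRESSION = 16;        // assumed vision-token compression factor

type Mode = "gundam" | "base";
const MODE_SIDE_PX: Record<Mode, number> = { gundam: 640, base: 1024 };

// Vision tokens for one page rendered as a square image at the mode's size.
function visionTokensPerPage(mode: Mode): number {
  const patchesPerSide = Math.ceil(MODE_SIDE_PX[mode] / PATCH_PX);
  return Math.ceil((patchesPerSide * patchesPerSide) / COMPRESSION);
}

// Pages that fit in one pass when each page also needs `outputTokensPerPage`
// tokens of decoded text (a caller-supplied estimate) plus a fixed prompt.
function maxPagesPerPass(mode: Mode, outputTokensPerPage: number, promptTokens = 0): number {
  const perPage = visionTokensPerPage(mode) + outputTokensPerPage;
  return Math.max(0, Math.floor((CONTEXT_TOKENS - promptTokens) / perPage));
}
```

Under these assumptions, a gundam page costs 40 × 40 / 16 = 100 vision tokens and a base page costs 64 × 64 / 16 = 256. With an estimated 800 output tokens per page, `maxPagesPerPass("gundam", 800)` gives 36 pages and `maxPagesPerPass("base", 800)` gives 31. In this sketch the decoded text, not the image resolution, dominates the budget for text-dense pages.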
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "baidu/Unlimited-OCR" },
  },
});
Capabilities
- One-shot multi-page document and PDF parsing
- Layout, reading-order, table, and formula preservation
- 32K-token long-context decoding
- Multilingual OCR under an MIT license
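Table preservation pays off when tables can be lifted out of the recovered text as rows and cells. The sketch below assumes tables are emitted as Markdown pipe tables. That is an assumption: if the model emits HTML `<table>` markup instead, an HTML parser is needed. `extractPipeTables`, `splitRow`, and `isDelimiterRow` are hypothetical helpers. The code simplifies the pipe-table syntax by not handling escaped pipes and by requiring body rows to start with `|`.

```typescript
// Hypothetical sketch: assumes tables arrive as Markdown pipe tables.

type Table = { header: string[]; rows: string[][] };

// Split one "| a | b |" row into trimmed cells (escaped pipes not handled).
function splitRow(line: string): string[] {
  let s = line.trim();
  if (s.startsWith("|")) s = s.slice(1);
  if (s.endsWith("|")) s = s.slice(0, -1);
  return s.split("|").map((cell) => cell.trim());
}

// A delimiter row looks like "| --- | :---: |"; requiring a pipe avoids
// mistaking a horizontal rule ("---") for a table delimiter.
function isDelimiterRow(line: string): boolean {
  if (!line.includes("|")) return false;
  return splitRow(line).every((cell) => /^:?-+:?$/.test(cell));
}

function extractPipeTables(text: string): Table[] {
  const lines = text.split(/\r?\n/);
  const tables: Table[] = [];
  for (let i = 0; i + 1 < lines.length; i++) {
    // A table starts with a pipe row immediately followed by a delimiter row.
    if (!lines[i].includes("|") || !isDelimiterRow(lines[i + 1])) continue;
    const header = splitRow(lines[i]);
    const rows: string[][] = [];
    let j = i + 2;
    // Collect body rows until the first line that is not a pipe row.
    while (j < lines.length && lines[j].trim().startsWith("|")) {
      rows.push(splitRow(lines[j]));
      j++;
    }
    tables.push({ header, rows });
    i = j - 1; // resume scanning after this table
  }
  return tables;
}
```

For example, running `extractPipeTables` on text containing `| Item | Qty |`, a `| --- | ---: |` delimiter, and two body rows returns one table with header `["Item", "Qty"]` and two rows of cells. These rows could then be indexed separately from the surrounding prose.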
Specification
Research Paper
Unlimited-OCR (Baidu), arxiv.org
Build a pipeline with Unlimited-OCR
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.