Unlimited-OCR

by baidu

One-shot long-horizon OCR for multi-page documents and PDFs

2.1M downloads/month

2,257 likes

3.3B params


Identifiers

Model ID

baidu/Unlimited-OCR

Feature URI

mixpeek://image_extractor@v1/baidu_unlimited_ocr_3b_v1
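The feature URI appears to combine a scheme, an extractor name, a version, and a model-specific feature slug in one string. The TypeScript sketch below splits it apart. It assumes that the scheme://extractor@version/feature layout of this single example also holds for other URIs. FeatureUri and parseFeatureUri are hypothetical helpers, not part of the Mixpeek SDK.

```typescript
// Hypothetical shape for the parts of a Mixpeek feature URI. The field names
// are illustrative labels, not an official Mixpeek type.
interface FeatureUri {
  scheme: string;    // e.g. "mixpeek"
  extractor: string; // e.g. "image_extractor"
  version: string;   // e.g. "v1"
  feature: string;   // e.g. "baidu_unlimited_ocr_3b_v1"
}

// Parses "scheme://extractor@version/feature" with one regular expression.
// Assumption: this layout, inferred from a single example URI, is general.
function parseFeatureUri(uri: string): FeatureUri {
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^@/]+)@([^/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Not a feature URI: ${uri}`);
  }
  const [, scheme, extractor, version, feature] = match;
  return { scheme, extractor, version, feature };
}
```

For the URI above, this yields scheme "mixpeek", extractor "image_extractor", version "v1", and feature "baidu_unlimited_ocr_3b_v1".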

Overview

Unlimited-OCR is Baidu's 3B vision-language OCR model built for one-shot parsing of long documents. Rather than reading a page at a time, it ingests multi-page images and PDFs in a single pass with a 32K-token context, preserving layout, reading order, tables, and formatting instead of returning a flat bag of words. It extends the DeepSeek-OCR line of compressed-token OCR with multi-page parsing and n-gram repetition guards, and runs efficiently under vLLM or SGLang.
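A single pass still has to fit within the 32K-token context, so very long inputs may need splitting. The TypeScript sketch below shows greedy page batching. It assumes that the generated text shares the window with the page tokens, as in a typical decoder-only VLM. The per-page token costs and the output reserve are supplied by the caller because the card does not publish per-page vision-token counts. packPages is a hypothetical helper, not something shipped with the model.

```typescript
// Context window stated on the model card.
const CONTEXT_TOKENS = 32_768;

// Greedily groups consecutive pages so that each group's input tokens, plus a
// reserve for generated text, fit in one context window. Per-page costs and
// the output reserve are caller-supplied assumptions, not published figures.
function packPages(
  pageTokenCosts: number[],
  outputReserve: number,
  contextTokens: number = CONTEXT_TOKENS,
): number[][] {
  const budget = contextTokens - outputReserve;
  if (budget <= 0) {
    throw new Error("Output reserve leaves no room for input pages");
  }
  const batches: number[][] = [];
  let current: number[] = [];
  let used = 0;
  for (let page = 0; page < pageTokenCosts.length; page++) {
    const cost = pageTokenCosts[page];
    if (cost > budget) {
      throw new Error(`Page ${page} alone exceeds the input budget`);
    }
    if (used + cost > budget) {
      // Close the current batch and start a new one with this page.
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(page);
    used += cost;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

For example, suppose three pages each cost a made-up 10,000 tokens and the output reserve is 8,768 tokens. That leaves a 24,000-token input budget, so packPages returns [[0, 1], [2]].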

On Mixpeek, Unlimited-OCR powers the OCR extractor when the goal is faithful structured text from whole documents (contracts, reports, scanned decks). An agent can then search the recovered text, tables, and headings rather than raw pixels. Because headings and tables survive intact, downstream chunking can follow section boundaries instead of arbitrary character windows. This makes section-aware retrieval far cleaner than it is with character-level OCR.
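One way to use layout-preserving output is to chunk on headings. The sketch below assumes the OCR text comes back as Markdown with ATX headings (lines starting with one to six "#" characters), which the card does not state. chunkBySection and Section are hypothetical names. Only heading lines start a new chunk, so tables and paragraphs stay inside the section that contains them.

```typescript
// One chunk of OCR output: a heading and the text beneath it.
interface Section {
  heading: string; // heading text without the leading "#" marks ("" for a preamble)
  body: string;    // text between this heading and the next one
}

// Splits Markdown-style OCR output at ATX headings (1 to 6 "#" followed by
// whitespace). Assumption: the model emits Markdown-like headings.
function chunkBySection(markdown: string): Section[] {
  const sections: Section[] = [];
  let heading = "";
  let lines: string[] = [];
  // Emits the section collected so far, skipping an empty preamble.
  const flush = () => {
    const body = lines.join("\n").trim();
    if (heading !== "" || body !== "") {
      sections.push({ heading, body });
    }
  };
  for (const line of markdown.split(/\r?\n/)) {
    const m = /^#{1,6}\s+(.*)$/.exec(line);
    if (m !== null) {
      flush();
      heading = m[1].trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return sections;
}
```

Consider the input "Intro line\n# Terms\nParty A agrees.\n## Payment\n| Item | Fee |". The function returns three sections: a preamble with an empty heading and body "Intro line", then "Terms", then "Payment" with the table row as its body.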

Architecture

3B-parameter vision-language transformer (BF16) with a compressed-vision-token design. Document images are mapped to a small budget of vision tokens, and structured text is then decoded within a 32,768-token context. Two input modes trade resolution for cost (gundam at 640px, base at 1024px). A custom n-gram repetition guard keeps long-form decoding from falling into repeated-text loops. The model can be served with Transformers, vLLM, or SGLang.
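The card does not spell out how its n-gram repetition guard works. To illustrate the general technique, here is a windowed no-repeat-n-gram check in TypeScript. It returns the token ids that would recreate an n-gram already present in recent output, so a decoder can mask them before sampling. bannedNextTokens, maskLogits, and the window parameter are hypothetical. Baidu's actual implementation may differ in window size, token whitelisting, or how bans are applied.

```typescript
// Returns token ids that would complete an n-gram already seen in recent
// output. Only n-grams starting within the last `window` tokens are checked.
// Generic technique sketch; not Baidu's implementation.
function bannedNextTokens(
  generated: readonly number[],
  n: number,
  window: number,
): Set<number> {
  const banned = new Set<number>();
  if (n < 1 || generated.length < n - 1) {
    return banned;
  }
  // The (n-1)-token prefix that the next token would extend.
  const prefix = generated.slice(generated.length - (n - 1));
  const start = Math.max(0, generated.length - window);
  // Scan complete earlier n-grams whose first n-1 tokens equal the prefix.
  for (let i = start; i + n - 1 < generated.length; i++) {
    let matches = true;
    for (let j = 0; j < n - 1; j++) {
      if (generated[i + j] !== prefix[j]) {
        matches = false;
        break;
      }
    }
    if (matches) {
      banned.add(generated[i + n - 1]);
    }
  }
  return banned;
}

// Sets banned logits to -Infinity so they can never be sampled.
function maskLogits(logits: Float32Array, banned: Set<number>): void {
  banned.forEach((id) => {
    if (id >= 0 && id < logits.length) {
      logits[id] = -Infinity;
    }
  });
}
```

With generated tokens [1, 2, 3, 1, 2], n = 3, and a wide window, token 3 is banned because 1 2 3 has already appeared. Shrinking the window to 2 tokens clears the ban. The window keeps the guard local, so phrases that legitimately repeat far apart in a long document (column headers, boilerplate clauses) are not blocked.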

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "baidu/Unlimited-OCR" },
  },
});