
    Unlimited-OCR

    by baidu

    One-shot long-horizon OCR for multi-page documents and PDFs

    758K downloads/month
    3B params
    Identifiers
    Model ID: baidu/Unlimited-OCR
    Feature URI: mixpeek://image_extractor@v1/baidu_unlimited_ocr_3b_v1

    Overview

    Unlimited-OCR is Baidu's 3B vision-language OCR model built for one-shot parsing of long documents. Rather than reading a page at a time, it ingests multi-page images and PDFs in a single pass with a 32K-token context, preserving layout, reading order, tables, and formatting instead of returning a flat bag of words. It extends the DeepSeek-OCR line of compressed-token OCR with multi-page parsing and n-gram repetition guards, and runs efficiently under vLLM or SGLang.
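    How many pages fit in one pass depends on what each page costs in vision tokens plus the text decoded for it. This TypeScript sketch estimates that budget. Its per-page cost is an assumption borrowed from the DeepSeek-OCR encoder design (16-pixel patches followed by 16x token compression). Unlimited-OCR's real numbers may differ. The 1,000 output tokens reserved per page are a hypothetical figure. The sketch also assumes vision tokens and decoded text share the 32K window, as they would in a decoder-only model.

```typescript
// Sketch: estimate how many page images fit in one 32,768-token pass.
// ASSUMPTION: per-page vision-token cost follows a DeepSeek-OCR-style encoder
// (16x16-pixel patches, then 16x token compression). Unlimited-OCR may differ.
const CONTEXT_TOKENS = 32768; // context length stated for Unlimited-OCR
const PATCH_PX = 16;          // assumed patch size in pixels
const COMPRESSION = 16;       // assumed token-compression factor

/** Vision tokens for a square page image of side `sidePx` (assumed encoder). */
function visionTokens(sidePx: number): number {
  const patchesPerSide = Math.ceil(sidePx / PATCH_PX);
  return Math.ceil((patchesPerSide * patchesPerSide) / COMPRESSION);
}

/**
 * Whole pages that fit when `outputTokensPerPage` decoded tokens are reserved per page.
 * ASSUMPTION: input vision tokens and decoded text share the same context window.
 */
function pagesPerPass(sidePx: number, outputTokensPerPage: number): number {
  const perPage = visionTokens(sidePx) + outputTokensPerPage;
  return Math.floor(CONTEXT_TOKENS / perPage);
}

console.log(visionTokens(640), visionTokens(1024)); // 100 256
console.log(pagesPerPass(640, 1000), pagesPerPass(1024, 1000)); // 29 26
```

    Under these assumptions, a 640px input leaves room for a few more pages per pass than a 1024px input at the cost of resolution. These are the two input sizes listed under Architecture.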

    On Mixpeek, Unlimited-OCR powers the OCR extractor when the goal is faithful structured text from whole documents (contracts, reports, scanned decks), so an agent can search the recovered text, tables, and headings, not just raw pixels. Layout-preserving output makes downstream chunking and section-aware retrieval cleaner than character-level OCR does. Chunk boundaries can follow the recovered headings and tables instead of arbitrary character offsets.
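    Here is a minimal sketch of section-aware chunking. It assumes the OCR text comes back as Markdown-like output with ATX headings (`#`, `##`, ...). That output format and the `chunkByHeadings` helper are illustrative assumptions, not Mixpeek's or Baidu's API. Each chunk keeps its nearest heading, so a retriever can index the section title alongside the body.

```typescript
// Sketch: section-aware chunking of layout-preserving OCR output.
// ASSUMPTION: the OCR text is Markdown-like, with ATX headings ("# ", "## ", ...).
// `chunkByHeadings` is a hypothetical helper, not part of any SDK.

interface Chunk {
  heading: string; // nearest heading above the chunk ("" for text before any heading)
  text: string;    // body text of the section, trimmed
}

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let body: string[] = [];

  // Emit the current section (skipping an empty preamble) and reset the body buffer.
  const flush = (): void => {
    const text = body.join("\n").trim();
    if (text.length > 0 || heading !== "") chunks.push({ heading, text });
    body = [];
  };

  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a new heading closes the previous section
      heading = match[1].trim();
    } else {
      body.push(line);
    }
  }
  flush(); // close the final section
  return chunks;
}

const sample = "Intro\n# Terms\nPay in 30 days.\n## Fees\nNone.";
console.log(JSON.stringify(chunkByHeadings(sample).map((c) => c.heading))); // ["","Terms","Fees"]
```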

    Architecture

    3B-parameter vision-language transformer (BF16) with a compressed-vision-token design that maps document images to a small token budget, then decodes structured text with a 32,768-token context. Two input modes trade resolution for cost (gundam 640px, base 1024px); custom n-gram repetition avoidance stabilizes long-form decoding. Served via Transformers, vLLM, and SGLang.
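    This page describes the repetition guard only as custom n-gram avoidance. The sketch that follows is therefore the generic no-repeat-n-gram technique (the same idea as the `no_repeat_ngram_size` generation option in Hugging Face Transformers), not Unlimited-OCR's implementation. At each decoding step, it bans any token that would complete an n-gram already present in the output. It then masks those tokens' logits to `-Infinity`. The helper names are hypothetical.

```typescript
// Sketch of generic no-repeat-n-gram blocking for autoregressive decoding.
// NOT Unlimited-OCR's guard (described as custom); helper names are hypothetical.

/** Token IDs that would complete an n-gram already present in `tokens`. */
function bannedNextTokens(tokens: readonly number[], n: number): Set<number> {
  const banned = new Set<number>();
  if (n < 1 || tokens.length < n - 1) return banned;
  // The last n-1 tokens: the prefix the next token would extend.
  const prefix = tokens.slice(tokens.length - (n - 1));
  // Scan every complete earlier n-gram; if its first n-1 tokens match, ban its last token.
  for (let i = 0; i + n <= tokens.length; i++) {
    let same = true;
    for (let j = 0; j < n - 1; j++) {
      if (tokens[i + j] !== prefix[j]) {
        same = false;
        break;
      }
    }
    if (same) banned.add(tokens[i + n - 1]);
  }
  return banned;
}

/** Return a copy of `logits` with banned token IDs set to -Infinity before sampling. */
function applyNoRepeatNGram(logits: number[], tokens: readonly number[], n: number): number[] {
  const masked = logits.slice();
  bannedNextTokens(tokens, n).forEach((id) => {
    if (id >= 0 && id < masked.length) masked[id] = -Infinity;
  });
  return masked;
}

// "5 6 7" already occurred, so after "... 5 6" the token 7 is blocked.
console.log(JSON.stringify(Array.from(bannedNextTokens([5, 6, 7, 5, 6], 3)))); // [7]
```

    Exact n-gram blocking is blunt. It can also suppress content that legitimately repeats, such as identical table rows, which is a reason a production guard would be tuned rather than used as-is.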

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Managed: create a collection over a bucket; Mixpeek runs this model's extractor
    const collection = await mx.collections.create({
      namespace_id: "my-namespace",
      collection_name: "my-collection",
      source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
      feature_extractor: {
        feature_extractor_name: "ocr",
        version: "v1",
        parameters: { model_id: "baidu/Unlimited-OCR" },
      },
    });

    Capabilities

    • One-shot multi-page document and PDF parsing
    • Layout, reading-order, table, and formula preservation
    • 32K-token long-context decoding
    • Multilingual OCR under an MIT license

    Use Cases on Mixpeek

    • Index scanned contracts, reports, and slide decks as searchable structured text
    • Recover tables and headings for section-aware chunking before retrieval
    • Give agents faithful document text instead of raw page images
    • High-volume PDF ingestion where per-page OCR is too slow
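    For the table use case, this sketch turns one recovered table into row objects ready for indexing. It assumes tables arrive as GitHub-style pipe tables whose rows start with `|`. The model's real table format may differ (HTML, for instance). `parsePipeTable` is a hypothetical helper, and escaped pipes inside cells are not handled.

```typescript
// Sketch: convert a Markdown pipe table from OCR output into row objects.
// ASSUMPTION: GitHub-style pipe tables with leading pipes; escaped "\|" is not handled.
// `parsePipeTable` is a hypothetical helper, not part of any SDK.

type Row = Record<string, string>;

function splitCells(line: string): string[] {
  // Drop the leading/trailing pipe, then split on the remaining pipes.
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

function parsePipeTable(table: string): Row[] {
  const lines = table
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.startsWith("|"));
  if (lines.length < 2) return [];
  const header = splitCells(lines[0]);
  // lines[1] is assumed to be the delimiter row (e.g. "|---|---|"); data starts after it.
  return lines.slice(2).map((line) => {
    const cells = splitCells(line);
    const row: Row = {};
    header.forEach((name, i) => {
      row[name] = cells[i] ?? ""; // missing cells become empty strings
    });
    return row;
  });
}

console.log(JSON.stringify(parsePipeTable("| Item | Qty |\n|---|---|\n| Bolts | 40 |"))); // [{"Item":"Bolts","Qty":"40"}]
```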

    Specification

    Framework: HF
    Organization: baidu
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 3B
    License: MIT
    Downloads/mo: 758K

    Research Paper

    Unlimited-OCR (Baidu), arxiv.org

    Build a pipeline with Unlimited-OCR

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
