    HF · OCR · TII Falcon License 2.0

    Falcon-OCR

    by tiiuae

    300M-parameter early-fusion OCR model: plain text, LaTeX, and HTML table output from document images

    195K downloads/month
    300M params

    Identifiers
    Model ID: tiiuae/Falcon-OCR
    Feature URI: mixpeek://image_extractor@v1/tiiuae_falcon_ocr_v1

    Overview

    Falcon-OCR is an ultra-compact, 300M-parameter early-fusion vision-language model for document OCR, developed by the Technology Innovation Institute (TII). Unlike traditional OCR pipelines that chain separate detection, recognition, and layout-analysis stages, Falcon-OCR processes image patches and text tokens in a shared parameter space from the very first transformer layer. It uses a hybrid attention mask: image tokens attend bidirectionally, while text tokens are decoded causally, conditioned on the image.

    At just 300M parameters, Falcon-OCR is roughly 3x smaller than competing VLM-based OCR models yet achieves 80.3% on the olmOCR benchmark and 88.64 overall on OmniDocBench. On Mixpeek, it provides fast, lightweight OCR extraction from scanned documents, receipts, and printed materials, producing plain text, LaTeX for formulas, or HTML for tables depending on the requested output format.

    Architecture

    Falcon-OCR is an early-fusion, dense, autoregressive Transformer. A single transformer processes image patches and text tokens in a shared parameter space from layer 1. Attention follows a hybrid mask: image tokens attend bidirectionally, while text tokens decode causally, conditioned on the image. Running the model requires PyTorch 2.5+ for FlexAttention.
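A minimal sketch of the hybrid attention rule described above, assuming all image tokens precede the text tokens in a single sequence (a prefix-LM style layout; the exact token ordering Falcon-OCR uses is not stated here). `hybridAttentionMask` is a hypothetical helper written only to illustrate the masking logic; it is not part of any Falcon or Mixpeek API. In PyTorch the same predicate would be expressed as a FlexAttention mask function.

```typescript
// Hypothetical illustration of a hybrid attention mask.
// Assumption: positions [0, numImageTokens) are image tokens and the
// remaining positions are text tokens.
// mask[q][k] === true means query position q may attend to key position k.
function hybridAttentionMask(numImageTokens: number, numTextTokens: number): boolean[][] {
  const total = numImageTokens + numTextTokens;
  const mask: boolean[][] = [];
  for (let q = 0; q < total; q++) {
    const row: boolean[] = [];
    const queryIsText = q >= numImageTokens;
    for (let k = 0; k < total; k++) {
      const keyIsImage = k < numImageTokens;
      // Every query sees all image tokens (bidirectional over the image block).
      // Text queries additionally see text keys at or before their own position (causal).
      row.push(keyIsImage || (queryIsText && k <= q));
    }
    mask.push(row);
  }
  return mask;
}

// Print a 2-image-token, 2-text-token mask as 0/1 rows.
console.log(
  hybridAttentionMask(2, 2)
    .map((row) => row.map((allowed) => (allowed ? 1 : 0)).join(" "))
    .join("\n"),
);
```

For 2 image and 2 text tokens, the two image rows see only the image columns, the first text row sees the image columns plus itself, and the last text row sees everything before and including it.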

    Mixpeek SDK Integration

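    // Ingest a scanned document into a Mixpeek collection and run Falcon-OCR on it.
    // Top-level await requires an ES module context (or wrap this in an async function).
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" }); // replace with your Mixpeek API key

    await mx.collections.ingest({
      collection_id: "documents",                                // target collection
      source: { url: "https://example.com/scanned-report.pdf" }, // document to process
      feature_extractors: [{
        feature: "ocr",             // OCR feature extraction
        model: "tiiuae/Falcon-OCR"  // model ID listed under Identifiers
      }]
    });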

    Capabilities

    • Plain text, LaTeX formula, and HTML table output modes
    • Early-fusion architecture with no separate vision encoder
    • 88.64 overall on OmniDocBench at just 300M params
    • ~2.9 images/sec on a single A100-80GB
    • Roughly 3x smaller than competing VLM-based OCR models

    Use Cases on Mixpeek

    • Document digitization: extract text from scanned archives with minimal compute
    • Formula extraction: convert mathematical content to LaTeX for searchable indexing
    • Table extraction: produce structured HTML from document tables for downstream filtering
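For the table extraction use case, a sketch of turning HTML table output into rows that can be filtered downstream. It assumes the model emits simple `<table>`/`<tr>`/`<th>`/`<td>` markup without nested tables; `rowspan`/`colspan` are ignored and HTML entities are left undecoded. `parseHtmlTable` is a hypothetical helper, and a production pipeline would more likely use a full HTML parser.

```typescript
// Hypothetical helper: parse simple HTML table markup into rows of cell text.
// Assumptions: no nested tables; rowspan/colspan ignored; entities not decoded.
function parseHtmlTable(html: string): string[][] {
  const rows: string[][] = [];
  // Match <tr> or <tr ...attrs> but not tags like <track>.
  const rowRe = /<tr(?:\s[^>]*)?>([\s\S]*?)<\/tr>/gi;
  let rowMatch: RegExpExecArray | null;
  while ((rowMatch = rowRe.exec(html)) !== null) {
    // Fresh regex per row so lastIndex starts at 0; matches <th> and <td>, not <thead>.
    const cellRe = /<t[hd](?:\s[^>]*)?>([\s\S]*?)<\/t[hd]>/gi;
    const cells: string[] = [];
    let cellMatch: RegExpExecArray | null;
    while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
      // Drop inline tags such as <b> and trim surrounding whitespace.
      cells.push(cellMatch[1].replace(/<[^>]+>/g, "").trim());
    }
    rows.push(cells);
  }
  return rows;
}

// Usage: keep only body rows whose "Qty" column exceeds 1.
const sample =
  "<table><tr><th>Item</th><th>Qty</th></tr><tr><td>Pens</td><td>3</td></tr><tr><td>Ink</td><td>1</td></tr></table>";
const [header, ...body] = parseHtmlTable(sample);
const qtyCol = header.indexOf("Qty");
console.log(JSON.stringify(body.filter((row) => Number(row[qtyCol]) > 1))); // [["Pens","3"]]
```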

    Benchmarks

    Dataset | Metric | Score | Source
    olmOCR | Accuracy | 80.3% | TII, 2026 (Falcon Perception paper)
    OmniDocBench | Overall | 88.64 | TII, 2026 (Falcon Perception paper)

    Performance

    Input size: document page image (variable resolution)
    GPU latency: ~345 ms/image (A100, 5,825 tok/s)
    GPU throughput: ~2.9 images/sec (A100-80GB)
    GPU memory: ~1.2 GB
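The latency and throughput figures are related by simple arithmetic: 1 / 2.9 images/sec is about 345 ms per image, and 5,825 tok/s divided by 2.9 images/sec implies roughly 2,009 tokens per image. A back-of-envelope sketch for sizing a batch job follows. It assumes throughput stays at the quoted A100 rate for every page (real pages vary in token count), and the linear multi-GPU scaling in the hypothetical `hoursForPages` helper is an illustrative assumption, not a measured result.

```typescript
// Figures quoted in the Performance section (single A100-80GB).
const IMAGES_PER_SEC = 2.9;
const TOKENS_PER_SEC = 5825;

// Per-image time implied by throughput, in milliseconds.
function latencyMsPerImage(imagesPerSec: number): number {
  return 1000 / imagesPerSec;
}

// Average tokens generated per image implied by the two rates.
function tokensPerImage(tokensPerSec: number, imagesPerSec: number): number {
  return tokensPerSec / imagesPerSec;
}

// Wall-clock hours to process `pages`, assuming constant throughput
// and (hypothetically) perfectly linear scaling across `gpus`.
function hoursForPages(pages: number, imagesPerSec: number, gpus = 1): number {
  return pages / (imagesPerSec * gpus) / 3600;
}

console.log(latencyMsPerImage(IMAGES_PER_SEC).toFixed(0)); // 345
console.log(tokensPerImage(TOKENS_PER_SEC, IMAGES_PER_SEC).toFixed(0)); // 2009
console.log(hoursForPages(1000000, IMAGES_PER_SEC).toFixed(1)); // 95.8
```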

    Specification

    Framework: HF
    Organization: tiiuae
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 300M
    License: TII Falcon License 2.0
    Downloads/mo: 195K

    Research Paper

    Falcon Perception (arxiv.org)

    Build a pipeline with Falcon-OCR

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
