    HF · OCR · Apache-2.0

    LightOnOCR-2-1B

    by lightonai

    State-of-the-art 1B-parameter end-to-end multilingual OCR with bounding box localization

    730K downloads/month
    1B parameters
    Identifiers
    Model ID
    lightonai/LightOnOCR-2-1B
    Feature URI
    mixpeek://image_extractor@v1/lighton_ocr2_1b_v1

    Overview

    LightOnOCR-2-1B is a 1B-parameter vision-language model that achieves the top score on OlmOCR-Bench (83.2) while remaining compact enough for efficient deployment. Built on a native-resolution ViT initialized from Mistral-Small-3.1, it handles page images up to 1540px on the longest edge. It performs particularly well on ArXiv papers, scanned documents with math, and complex tables.

    On Mixpeek, LightOnOCR-2-1B extracts text from documents, scanned pages, and images with high accuracy, powering full-text search across document collections. An image-localization variant adds bounding box predictions without degrading OCR quality.
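
To make the link between OCR output and full-text search concrete, here is a minimal sketch of an in-memory inverted index over extracted page text. It is not Mixpeek's retrieval implementation. The function names (`tokenize`, `build_index`, `search`) and the sample page IDs are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of page IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page IDs that contain every token in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical OCR output keyed by page ID.
pages = {
    "p1": "Attention is all you need",
    "p2": "Convolutional networks need data",
}
index = build_index(pages)
print(search(index, "attention need"))
```

A production pipeline would typically add ranking, stemming, and persistent storage. The sketch only shows why clean extracted text is the prerequisite for keyword lookup.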

    Architecture

    Three-component VLM: native-resolution Vision Transformer (initialized from Mistral-Small-3.1) as encoder, multimodal projector, and language model decoder. Accepts page images up to 1540px longest edge. Optional bounding box localization via coordinate tokens introduced during pretraining and refined with RLVR.
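
The 1540px limit can be illustrated with a small sizing helper. This is a sketch under one assumption: that larger scans are downscaled before inference while preserving aspect ratio. Whether the model or Mixpeek performs this step automatically is not stated here. `fit_longest_edge` is a hypothetical helper, not part of any SDK.

```python
MAX_LONGEST_EDGE = 1540  # limit stated in this model card


def fit_longest_edge(width: int, height: int,
                     max_edge: int = MAX_LONGEST_EDGE) -> tuple[int, int]:
    """Return (width, height) scaled down so the longest edge is at most max_edge.

    The aspect ratio is preserved and images already within the limit
    are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    # Multiply before dividing to limit floating-point error.
    new_width = max(1, round(width * max_edge / longest))
    new_height = max(1, round(height * max_edge / longest))
    return new_width, new_height


# A US Letter page scanned at 300 DPI is 2550 x 3300 pixels.
print(fit_longest_edge(2550, 3300))  # (1190, 1540)
```

The resulting dimensions can be passed to any image library's resize routine before the page is submitted for OCR.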

    Mixpeek SDK Integration

    from mixpeek import Mixpeek

    # Authenticate with your Mixpeek API key.
    mx = Mixpeek(api_key="YOUR_KEY")

    # Ingest scanned pages from S3 and run OCR with LightOnOCR-2-1B.
    mx.ingest(
        collection_id="scanned-documents",
        source="s3://scans/",
        extractors=[{
            "type": "ocr",
            "model": "lightonai/LightOnOCR-2-1B",
            "output_feature": "extracted_text",
        }],
    )

    Capabilities

    • Top score on OlmOCR-Bench (83.2) among 1B-class models
    • Native-resolution processing up to 1540px
    • Multilingual OCR with strong French and scientific document support
    • Optional bounding box localization variant
    • Apache 2.0 open-source

    Use Cases on Mixpeek

    • High-accuracy document digitization for scanned archives and PDFs
    • Scientific paper and technical document text extraction
    • Multilingual document search across enterprise content libraries

    Benchmarks

    Dataset              | Metric | Score         | Source
    OlmOCR-Bench         | Score  | 83.2          | LightOn AI, 2025 (LightOnOCR paper)
    ArXiv papers subset  | Score  | Best in class | LightOn AI, 2025 (LightOnOCR paper)

    Performance

    Input Size: Up to 1540px longest edge
    GPU Latency: ~25ms / page (A100)
    CPU Latency: ~320ms / page
    GPU Throughput: ~40 pages/sec (A100)
    GPU Memory: ~2.2 GB
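
As a rough worked example, the figures above can be turned into batch-time estimates. This sketch treats the approximate numbers as exact, assumes sequential processing, and ignores I/O, queuing, and batching effects, so real runs will differ. The function names are hypothetical.

```python
GPU_PAGES_PER_SEC = 40.0  # "~40 pages/sec (A100)"
GPU_LATENCY_S = 0.025     # "~25ms / page (A100)"
CPU_LATENCY_S = 0.320     # "~320ms / page"


def gpu_batch_seconds(pages: int) -> float:
    """Estimated GPU wall time at the stated throughput."""
    return pages / GPU_PAGES_PER_SEC


def cpu_batch_seconds(pages: int) -> float:
    """Estimated CPU wall time when pages are processed one at a time."""
    return pages * CPU_LATENCY_S


pages = 100_000
gpu_s = gpu_batch_seconds(pages)
cpu_s = cpu_batch_seconds(pages)
print(f"GPU: {gpu_s / 60:.1f} min")     # GPU: 41.7 min
print(f"CPU: {cpu_s / 3600:.1f} h")     # CPU: 8.9 h
print(f"Speedup: {cpu_s / gpu_s:.1f}x")  # Speedup: 12.8x
```

The stated GPU latency and throughput are consistent with each other: one page every 25ms works out to about 40 pages per second.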

    Specification

    Framework: HF
    Organization: lightonai
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 1B
    License: Apache-2.0
    Downloads/mo: 730K

    Research Paper

    LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

    arxiv.org

    Build a pipeline with LightOnOCR-2-1B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
