    HF · OCR · Apache 2.0

    PaddleOCR-VL-1.6

    by PaddlePaddle

    Compact document VLM for OCR, tables, formulas, charts, seals, and layout parsing

    3.2K downloads/month
    1.0B params
    Identifiers
    Model ID
    PaddlePaddle/PaddleOCR-VL-1.6
    Feature URI
    mixpeek://image_extractor@v1/paddle_ocr_vl_16_v1
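The Feature URI above appears to follow a `scheme://extractor@version/feature` shape. A minimal sketch of splitting it into parts, assuming that reading is right; the `FeatureUri` type, its field names, and `parseFeatureUri` are hypothetical and not part of any documented Mixpeek schema:

```typescript
// Hypothetical helper: field names are an interpretation of the URI shape
// shown on this page, not a documented Mixpeek schema.
interface FeatureUri {
  extractor: string;        // e.g. "image_extractor"
  extractorVersion: string; // e.g. "v1"
  feature: string;          // e.g. "paddle_ocr_vl_16_v1"
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  return {
    extractor: match[1],
    extractorVersion: match[2],
    feature: match[3],
  };
}

const parsedUri = parseFeatureUri(
  "mixpeek://image_extractor@v1/paddle_ocr_vl_16_v1"
);
console.log(parsedUri);
```

Returning `null` for anything that does not match keeps malformed identifiers from silently flowing into a pipeline config.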

    Overview

    PaddleOCR-VL 1.6 is the newest compact document parsing model from PaddlePaddle. It upgrades PaddleOCR-VL 1.5 with region-aware data optimization and progressive post-training, improving weak regions such as tables, rare characters, seals, text spotting, and charts.

    On Mixpeek, PaddleOCR-VL 1.6 is a strong OCR and document decomposition candidate when agents need to search scans, forms, charts, invoices, and multilingual documents as structured evidence.

    Architecture

    PaddleOCR-VL 1.6 is a document vision-language model with 0.9B to 1.0B parameters, built on the PaddleOCR-VL architecture. It supports task prompts for OCR, table recognition, formula recognition, chart recognition, spotting, and seal recognition, and is compatible with the PaddleOCR doc parser pipeline and Transformers custom code.
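The task-prompt design can be pictured as a small lookup from task to prompt string. The sketch below is illustrative only: the prompt strings follow the short-label convention used on earlier PaddleOCR-VL model cards and are an assumption here, so check them against the 1.6 model card before use. `DocTask`, `TASK_PROMPTS`, and `promptFor` are hypothetical names.

```typescript
// The six tasks listed in the Architecture section.
type DocTask = "ocr" | "table" | "formula" | "chart" | "spotting" | "seal";

// ASSUMPTION: these strings mirror the label style of earlier PaddleOCR-VL
// model cards; they are not confirmed for 1.6 by this page.
const TASK_PROMPTS: Record<DocTask, string> = {
  ocr: "OCR:",
  table: "Table Recognition:",
  formula: "Formula Recognition:",
  chart: "Chart Recognition:",
  spotting: "Spotting:",
  seal: "Seal Recognition:",
};

// Look up the prompt for a task; the Record type makes a missing task a
// compile-time error rather than a runtime surprise.
function promptFor(task: DocTask): string {
  return TASK_PROMPTS[task];
}

console.log(promptFor("table"));
```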

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "documents",
      source: { url: "https://example.com/invoice.pdf" },
      feature_extractors: [{
        feature: "ocr",
        model: "PaddlePaddle/PaddleOCR-VL-1.6"
      }]
    });
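When several documents need the same extractor, it can help to build the request objects separately from the network call. The following is a minimal sketch that reuses the request shape from the ingest call above; the `IngestRequest` type and `buildOcrIngestRequests` helper are hypothetical and not part of the Mixpeek SDK.

```typescript
// Request shape copied from the ingest call on this page.
// The type name and helper are illustrative, not SDK exports.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

const OCR_MODEL = "PaddlePaddle/PaddleOCR-VL-1.6";

function buildOcrIngestRequests(
  collectionId: string,
  urls: string[]
): IngestRequest[] {
  // Drop duplicate URLs so the same document is not ingested twice.
  const unique = urls.filter((url, i) => urls.indexOf(url) === i);
  return unique.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [{ feature: "ocr", model: OCR_MODEL }],
  }));
}

const requests = buildOcrIngestRequests("documents", [
  "https://example.com/invoice.pdf",
  "https://example.com/form.pdf",
  "https://example.com/invoice.pdf",
]);
console.log(requests.length);
```

Each resulting object has the same fields as the argument passed to `mx.collections.ingest` above, so it could be handed to that call one at a time.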

    Capabilities

    • Document parsing across text, tables, formulas, charts, seals, and layout
    • English, Chinese, and multilingual document support
    • OmniDocBench v1.6 score of 96.33 on the model card
    • Compatible migration path from PaddleOCR-VL 1.5

    Use Cases on Mixpeek

    • Search scanned business documents by extracted text and layout fields
    • Parse invoices, forms, charts, and tables into retrievable metadata
    • Give agents page-level evidence from PDFs and screenshots
    • Index multilingual archives where OCR and layout both matter
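To make "page-level evidence" concrete, here is a minimal sketch of filtering extracted regions by a text query. It assumes each region carries a page number, text, and a bounding box, in line with the text + bbox output listed in the specification; the `OcrRegion` shape, the `[x0, y0, x1, y1]` box layout, and `findEvidence` are assumptions for illustration, not the platform's actual response format.

```typescript
// ASSUMED region shape: the page lists "text + bbox" as output but does not
// define field names or the bbox coordinate order.
interface OcrRegion {
  page: number;
  text: string;
  bbox: [number, number, number, number]; // assumed [x0, y0, x1, y1]
}

// Case-insensitive substring match, ordered by page and then top edge,
// so results read top to bottom within each page.
function findEvidence(regions: OcrRegion[], query: string): OcrRegion[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return regions
    .filter((r) => r.text.toLowerCase().indexOf(needle) !== -1)
    .sort((a, b) => a.page - b.page || a.bbox[1] - b.bbox[1]);
}

const sampleRegions: OcrRegion[] = [
  { page: 2, text: "Total due: 120.00", bbox: [40, 700, 300, 720] },
  { page: 1, text: "Invoice No. 1234", bbox: [40, 60, 260, 80] },
  { page: 1, text: "Subtotal and total", bbox: [40, 400, 260, 420] },
];
console.log(findEvidence(sampleRegions, "total").map((r) => r.page));
```

A production retriever would typically use an index rather than a linear scan; the point here is only that each hit keeps its page and box so an agent can cite where the text was found.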

    Benchmarks

    Dataset | Metric | Score | Source
    OmniDocBench v1.6 | Overall score | 96.33% | PaddleOCR-VL 1.6 model card

    Performance

    Input Size: Document page image
    GPU Latency: Backend dependent; PaddleOCR and vLLM server modes supported
    GPU Throughput: Backend dependent; batch by page for best throughput
    GPU Memory: ~2 GB plus serving overhead
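The ~2 GB figure is consistent with a back-of-the-envelope weight estimate of parameters times bytes per parameter. The sketch below assumes 16-bit weights (2 bytes per parameter), which this page does not state; `weightMemoryGB` is a hypothetical helper, and activations, KV cache, and runtime overhead are deliberately left out.

```typescript
// Rough weight-only memory estimate in decimal gigabytes.
// ASSUMPTION: 2 bytes per parameter (a 16-bit weight format).
function weightMemoryGB(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

const listedParams = 1.0e9; // "1.0B params" as listed on this page
console.log(weightMemoryGB(listedParams, 2)); // 2
```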

    Use the PaddleOCR doc parser path for page-level parsing
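Batching by page can be sketched as splitting a list of page inputs into fixed-size groups before sending them to whichever backend is in use. The `chunkPages` helper and the batch size below are illustrative assumptions; a suitable size depends on the serving backend and available GPU memory.

```typescript
// Split a list of pages into consecutive batches of at most batchSize items.
function chunkPages<T>(pages: T[], batchSize: number): T[][] {
  if (Math.floor(batchSize) !== batchSize || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < pages.length; i += batchSize) {
    batches.push(pages.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical page image names, used only to show the grouping.
const pageFiles = ["p1.png", "p2.png", "p3.png", "p4.png", "p5.png"];
console.log(chunkPages(pageFiles, 2));
```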

    Specification

    Framework: HF
    Organization: PaddlePaddle
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 1.0B
    License: Apache 2.0
    Downloads/mo: 3.2K

    Research Paper

    PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

    arxiv.org

    Build a pipeline with PaddleOCR-VL-1.6

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
