    PyTorch | OCR | Apache 2.0

    paddleocr

    by PaddlePaddle

    Ultra-lightweight, production-ready multilingual OCR system

    5.1M downloads/month
    12M parameters
    Identifiers

    Model ID: PaddlePaddle/paddleocr
    Feature URI: mixpeek://image_extractor@v1/paddle_ocr_v1
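
The Feature URI appears to follow a `mixpeek://<extractor>@<version>/<feature>` shape. A minimal sketch of splitting it into parts, assuming that reading of the format; the field names `extractor`, `version`, and `feature` and the helper `parseFeatureUri` are illustrative guesses, not documented Mixpeek terminology:

```typescript
// Hypothetical breakdown of a Mixpeek feature URI. The field names are
// assumptions made for illustration; only the URI string comes from this page.
interface FeatureUri {
  extractor: string; // e.g. "image_extractor"
  version: string;   // e.g. "v1"
  feature: string;   // e.g. "paddle_ocr_v1"
}

function parseFeatureUri(uri: string): FeatureUri {
  // Assumed shape: mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  return { extractor: match[1], version: match[2], feature: match[3] };
}
```

Calling `parseFeatureUri("mixpeek://image_extractor@v1/paddle_ocr_v1")` would yield the three components separately, which can be handy for logging or for validating configuration before a request is sent.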

    Overview

    PaddleOCR is a comprehensive OCR toolkit supporting 80+ languages with extremely lightweight models suitable for both server and mobile deployment. It combines text detection (DB), text direction classification, and text recognition (CRNN) in a unified pipeline.

    On Mixpeek, PaddleOCR is the go-to choice for multilingual text extraction and high-throughput OCR processing of documents, images, and video frames.

    Architecture

    Three-stage pipeline: (1) a DB (Differentiable Binarization) text detector that localizes text regions, (2) a text direction classifier, and (3) a CRNN-based text recognizer. The PP-OCRv4 variant uses knowledge distillation to produce a 4x smaller model with minimal accuracy loss.
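
The three stages compose as a simple function pipeline. The sketch below shows only the data flow, with stub stages standing in for the real models; every type and function name here (`Detector`, `DirectionClassifier`, `Recognizer`, `runOcrPipeline`) is hypothetical and is not part of the PaddleOCR or Mixpeek APIs.

```typescript
// Hypothetical types: they illustrate the data flow only, not a real API.
type BBox = { x: number; y: number; width: number; height: number };
type Region = { bbox: BBox; rotated: boolean };
type OcrLine = { text: string; bbox: BBox };

// Stage 1: find text regions in the image.
type Detector = (image: Uint8Array) => BBox[];
// Stage 2: decide whether a detected region is rotated.
type DirectionClassifier = (image: Uint8Array, bbox: BBox) => boolean;
// Stage 3: read the text inside a region.
type Recognizer = (image: Uint8Array, region: Region) => string;

function runOcrPipeline(
  image: Uint8Array,
  detect: Detector,
  isRotated: DirectionClassifier,
  recognize: Recognizer
): OcrLine[] {
  return detect(image)
    .map((bbox): Region => ({ bbox, rotated: isRotated(image, bbox) }))
    .map((region): OcrLine => ({ text: recognize(image, region), bbox: region.bbox }));
}

// Stub stages so the sketch runs end to end without any model weights.
const stubDetect: Detector = () => [
  { x: 0, y: 0, width: 100, height: 20 },
  { x: 0, y: 30, width: 80, height: 20 },
];
const stubIsRotated: DirectionClassifier = (_image, bbox) => bbox.y > 0;
const stubRecognize: Recognizer = (_image, region) =>
  region.rotated ? "line read after rotation" : "line read as-is";

const lines = runOcrPipeline(new Uint8Array(0), stubDetect, stubIsRotated, stubRecognize);
```

Keeping each stage behind its own function type makes it straightforward to swap one stage (for example, a smaller recognizer) without touching the other two.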

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a document and run OCR on it with PaddleOCR.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/document.pdf" },
      feature_extractors: [{
        name: "ocr",
        version: "v1",
        params: {
          model_id: "PaddlePaddle/paddleocr" // Model ID listed under Identifiers
        }
      }]
    });

    Capabilities

    • 80+ language support including CJK, Arabic, Devanagari
    • Text detection, recognition, and layout analysis
    • Ultra-lightweight models (< 10MB for mobile)
    • Table recognition and key-value extraction

    Use Cases on Mixpeek

    • Multilingual document processing across global content libraries
    • High-throughput OCR for large-scale document digitization
    • Real-time text extraction from live video feeds

    Specification

    Framework: PyTorch
    Organization: PaddlePaddle
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 12M
    License: Apache 2.0
    Downloads/mo: 5.1M
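
The specification lists the output as text plus bounding box. A minimal sketch of consuming such output, assuming each result carries a `text` string and an axis-aligned `bbox` with `x`, `y`, `width`, and `height`; this shape, the `OcrResult` type, and the `toReadingOrderText` helper are assumptions for illustration, since the page does not document the actual response schema:

```typescript
// Assumed result shape; the real Mixpeek response schema may differ.
interface OcrResult {
  text: string;
  bbox: { x: number; y: number; width: number; height: number };
}

// Join results into plain text in approximate reading order:
// group boxes into rows by their top edge, then read each row left to right.
function toReadingOrderText(results: OcrResult[], rowTolerance = 10): string {
  const byTop = [...results].sort((a, b) => a.bbox.y - b.bbox.y);
  const rows: OcrResult[][] = [];
  for (const result of byTop) {
    const currentRow = rows[rows.length - 1];
    // A box joins the current row if its top edge is close to the row's first box.
    if (
      currentRow !== undefined &&
      Math.abs(result.bbox.y - currentRow[0].bbox.y) <= rowTolerance
    ) {
      currentRow.push(result);
    } else {
      rows.push([result]);
    }
  }
  return rows
    .map((row) => row.sort((a, b) => a.bbox.x - b.bbox.x).map((r) => r.text).join(" "))
    .join("\n");
}
```

The `rowTolerance` value is in the same units as the bounding boxes (pixels, under this assumption) and would need tuning for the font sizes in a given corpus.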

    Build a pipeline with paddleocr

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
