paddleocr
by PaddlePaddle
Ultra-lightweight, production-ready multilingual OCR system
PaddlePaddle/paddleocr
mixpeek://image_extractor@v1/paddle_ocr_v1
Overview
PaddleOCR is a comprehensive OCR toolkit supporting 80+ languages with extremely lightweight models suitable for both server and mobile deployment. It combines text detection (DB), text direction classification, and text recognition (CRNN) in a unified pipeline.
On Mixpeek, PaddleOCR is the go-to choice for multilingual text extraction and high-throughput OCR processing of documents, images, and video frames.
Architecture
Three-stage pipeline: (1) a DB text detector that localizes text regions, (2) a text direction classifier, and (3) a CRNN-based text recognizer. The PP-OCRv4 variant uses knowledge distillation to produce a 4x smaller model with minimal accuracy loss.
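The three stages can be pictured as functions composed in sequence. The sketch below is illustrative only: the `Box`, `Region`, and `OcrLine` types, the `runOcr` function, and the stub stages are hypothetical names made up for this example, not part of PaddleOCR or the Mixpeek SDK. It also assumes the direction classifier only reports whether a region is flipped.

```typescript
// Hypothetical types for illustration; not PaddleOCR's or Mixpeek's API.
interface Box { x: number; y: number; width: number; height: number }
interface Region { box: Box; flipped: boolean }
interface OcrLine { box: Box; text: string }

// Stage signatures (assumed shapes, chosen for the example).
type Detector = (image: Uint8Array) => Box[];
type DirectionClassifier = (image: Uint8Array, box: Box) => boolean;
type Recognizer = (image: Uint8Array, region: Region) => string;

// Compose the stages: detect regions, classify each region's direction,
// then recognize the text inside it.
function runOcr(
  image: Uint8Array,
  detect: Detector,
  classify: DirectionClassifier,
  recognize: Recognizer,
): OcrLine[] {
  return detect(image).map((box) => {
    const flipped = classify(image, box);
    return { box, text: recognize(image, { box, flipped }) };
  });
}

// Stub stages standing in for real models so the sketch runs on its own.
const detect: Detector = () => [
  { x: 0, y: 0, width: 100, height: 20 },
  { x: 0, y: 30, width: 80, height: 20 },
];
const classify: DirectionClassifier = (_image, box) => box.y > 0;
const recognize: Recognizer = (_image, region) =>
  region.flipped ? "flipped line" : "upright line";

const lines = runOcr(new Uint8Array(0), detect, classify, recognize);
console.log(lines.map((line) => line.text));
```

Keeping each stage behind its own function type means a stage can be swapped (for example, a different recognizer) without touching the others, which mirrors how the pipeline is described above.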
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "PaddlePaddle/paddleocr" },
  },
});
Capabilities
- 80+ language support including CJK, Arabic, Devanagari
- Text detection, recognition, and layout analysis
- Ultra-lightweight models (< 10MB for mobile)
- Table recognition and key-value extraction
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| ICDAR 2015 (detection) | F1 | 87.1% | PaddleOCR benchmarks — README |
| ICDAR 2015 (recognition) | Accuracy | 79.4% | PaddleOCR benchmarks — README |
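For readers unfamiliar with the detection metric, F1 is the harmonic mean of precision and recall. The sketch below shows the calculation; the `f1Score` name and the input values are made up for illustration and are not taken from the PaddleOCR benchmarks.

```typescript
// F1 is the harmonic mean of precision and recall (both in [0, 1]).
function f1Score(precision: number, recall: number): number {
  // Avoid dividing by zero when both inputs are zero.
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// Illustrative inputs only; not values from the table above.
const example = f1Score(0.9, 0.8);
console.log(example.toFixed(3));
```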
Performance
Includes detection + recognition pipeline
Build a pipeline with paddleocr
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.