trocr-large-printed
by microsoft
Transformer-based OCR for printed text recognition
microsoft/trocr-large-printed
Overview
TrOCR is an end-to-end text recognition model that uses a pre-trained image Transformer (DeiT) as the encoder and a pre-trained language model (RoBERTa) as the decoder. The large variant achieves state-of-the-art accuracy on printed-text recognition benchmarks.
On Mixpeek, TrOCR extracts readable text from images and video frames, making text-in-image content searchable through natural language queries.
Architecture
Encoder-decoder Transformer: DeiT-Large (24 layers) as the image encoder, RoBERTa-Large (24 layers) as the text decoder. Pre-trained on large-scale synthetic printed text data; the printed variant is fine-tuned on SROIE, while the handwritten variants are fine-tuned on IAM.
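The encoder-decoder split above can be illustrated with a minimal greedy-decoding loop: the decoder emits one token per step, conditioned on the encoder's image features and the tokens produced so far. This is a schematic sketch, not the TrOCR implementation; `DecoderStep`, the token ids, and the stub below are all illustrative stand-ins for the real DeiT encoder output and RoBERTa decoder.

```typescript
// Schematic greedy decoding for an encoder-decoder OCR model.
// A DecoderStep maps (image features, tokens so far) -> next token id.
type DecoderStep = (imageFeatures: number[], tokens: number[]) => number;

const BOS = 0; // beginning-of-sequence token id (illustrative)
const EOS = 1; // end-of-sequence token id (illustrative)

function greedyDecode(
  imageFeatures: number[],
  step: DecoderStep,
  maxLen = 64
): number[] {
  const tokens = [BOS];
  for (let i = 0; i < maxLen; i++) {
    const next = step(imageFeatures, tokens); // argmax token at this step
    tokens.push(next);
    if (next === EOS) break; // stop once end-of-sequence is emitted
  }
  return tokens;
}

// Toy stub standing in for the decoder: emits tokens 2, 3, then EOS.
const stubStep: DecoderStep = (_feats, tokens) =>
  tokens.length < 3 ? tokens.length + 1 : EOS;

// greedyDecode([], stubStep) → [0, 2, 3, 1]
```

Because detection and recognition happen in one forward pass of this loop, no separate text-detection model is needed.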
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/document.pdf" },
  feature_extractors: [
    {
      name: "ocr",
      version: "v1",
      params: {
        model_id: "microsoft/trocr-large-printed"
      }
    }
  ]
});
```
Capabilities
- High-accuracy printed text recognition
- End-to-end pipeline (no separate detection step)
- Multi-line text extraction
- Robust to noise, blur, and varying fonts
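Multi-line extraction output usually benefits from light normalization before it is indexed for search. A minimal sketch; the joining and de-hyphenation rules here are my own illustration, not something TrOCR or Mixpeek prescribes:

```typescript
// Join OCR'd lines into a single searchable string: re-join words
// hyphenated across line breaks, then collapse whitespace runs.
function normalizeOcrLines(lines: string[]): string {
  return lines
    .map((l) => l.trim())
    .join("\n")
    .replace(/-\n(?=\p{Ll})/gu, "") // "recog-\nnition" -> "recognition"
    .replace(/\s+/g, " ")           // newlines and space runs -> one space
    .trim();
}

// normalizeOcrLines(["Printed text recog-", "nition works  well."])
// → "Printed text recognition works well."
```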
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| SROIE (text recognition) | Word Accuracy | 96.1% | Li et al., 2023 — Table 3 |
| IAM Handwritten | CER | 3.4% | Li et al., 2023 — Table 2 |
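CER (character error rate) in the table is the character-level edit distance between prediction and reference, divided by the reference length. A minimal implementation for sanity-checking OCR output, using standard Levenshtein distance (not the paper's exact scoring script):

```typescript
// Character error rate: Levenshtein edit distance (substitutions,
// insertions, deletions) divided by reference length.
function cer(reference: string, hypothesis: string): number {
  const m = reference.length;
  const n = hypothesis.length;
  // prev[j] = edit distance between the first i-1 reference chars
  // and the first j hypothesis chars.
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      const cost = reference[i - 1] === hypothesis[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return m === 0 ? (n === 0 ? 0 : 1) : prev[n] / m;
}

// cer("printed", "printed") → 0
// cer("printed", "prnted")  → 1/7 (one deleted character)
```

Lower is better: the 3.4% IAM figure means roughly 3 character edits per 100 reference characters.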
Research Paper
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
arxiv.org
Build a pipeline with trocr-large-printed
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.