trocr-large-printed
by microsoft
Transformer-based OCR for printed text recognition
microsoft/trocr-large-printed

Overview
TrOCR is an end-to-end text recognition model that pairs a pre-trained image Transformer encoder with a text Transformer decoder initialized from a pre-trained language model (RoBERTa). The large printed variant achieves state-of-the-art results on printed-text benchmarks.
On Mixpeek, TrOCR extracts readable text from images and video frames, making text-in-image content searchable through natural language queries.
Architecture
Encoder-decoder Transformer: a BEiT-Large image encoder (24 layers) paired with a text decoder initialized from RoBERTa-Large. The model is pre-trained on large-scale synthetic printed text and, for this printed variant, fine-tuned on the SROIE receipt dataset.
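The encoder-decoder flow can be sketched as follows. Everything here is a stand-in (toy functions, not the real TrOCR weights); the point is the inference pattern: the image is encoded once, then the decoder generates text tokens autoregressively until an end-of-sequence token.

```typescript
// Sketch of TrOCR-style inference. All functions are illustrative stand-ins;
// real inference runs the trained transformer layers instead.

type TokenId = number;
const BOS = 0; // beginning-of-sequence token
const EOS = 1; // end-of-sequence token

// Stand-in encoder: maps image pixels to a fixed "memory" representation.
// In TrOCR this is the image Transformer run once over the patch sequence.
function encodeImage(pixels: number[]): number[] {
  return pixels.map((p) => p / 255); // placeholder for the encoder stack
}

// Stand-in decoder step: predicts the next token from memory + token prefix.
// Toy rule: emit one token per encoded position, then EOS.
function decodeStep(memory: number[], prefix: TokenId[]): TokenId {
  return prefix.length <= memory.length ? prefix.length + 1 : EOS;
}

// Greedy autoregressive decoding: encode once, decode token by token.
function recognize(pixels: number[], maxLen = 32): TokenId[] {
  const memory = encodeImage(pixels);
  const tokens: TokenId[] = [BOS];
  for (let i = 0; i < maxLen; i++) {
    const next = decodeStep(memory, tokens);
    if (next === EOS) break;
    tokens.push(next);
  }
  return tokens.slice(1); // drop BOS; real models detokenize to a string
}
```

Because detection and recognition are folded into one sequence-generation problem, there is no separate bounding-box stage, which is what "end-to-end" means above.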
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/document.pdf" },
  feature_extractors: [{
    name: "ocr",
    version: "v1",
    params: {
      model_id: "microsoft/trocr-large-printed"
    }
  }]
});

Capabilities
- High-accuracy printed text recognition
- End-to-end pipeline (no separate detection step)
- Multi-line text extraction
- Robust to noise, blur, and varying fonts
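Multi-line OCR output usually benefits from light normalization before it is indexed for natural-language search. A minimal sketch (the normalization rules here are illustrative choices, not Mixpeek's actual indexing pipeline):

```typescript
// Normalize raw multi-line OCR output into a single searchable string.
// The specific rules below are illustrative, not Mixpeek's pipeline.
function normalizeOcrText(raw: string): string {
  return raw
    .split(/\r?\n/)                  // one recognized line per entry
    .map((line) => line.trim())      // drop per-line padding
    .filter((line) => line.length > 0) // drop blank lines
    .join(" ")                       // flatten lines into one string
    .replace(/\s+/g, " ")            // collapse repeated whitespace
    .toLowerCase();                  // case-fold for matching
}
```

A step like this keeps line breaks from fragmenting phrases at query time, at the cost of discarding the original layout.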
Use Cases on Mixpeek
Specification
Research Paper
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
arxiv.org

Build a pipeline with trocr-large-printed
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
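Combining extractors follows the same `feature_extractors` shape shown in the ingest example. In the sketch below, the OCR stage is paired with a hypothetical text-embedding extractor; the `text_embedding` name and its model id are assumptions for illustration, not confirmed Mixpeek extractor ids.

```typescript
// Hypothetical multi-extractor configuration.
// "text_embedding" and "example/embedding-model" are placeholder assumptions.
const featureExtractors = [
  {
    name: "ocr",
    version: "v1",
    params: { model_id: "microsoft/trocr-large-printed" },
  },
  {
    name: "text_embedding",
    version: "v1",
    params: { model_id: "example/embedding-model" },
  },
];
```

The OCR stage makes the text recoverable; a downstream embedding stage is what would let retrieval match it against natural-language queries.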