    HF · OCR · MIT

    trocr-large-printed

    by microsoft

    Transformer-based OCR for printed text recognition

    759K downloads/month
    180 likes
    608M params
    Identifiers
    Model ID: microsoft/trocr-large-printed
    Feature URI: mixpeek://image_extractor@v1/microsoft_trocr_large_v1

    Overview

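    TrOCR is an end-to-end text recognition model. A pre-trained image Transformer encoder (initialized from BEiT in the large variant) reads the image, and a Transformer text decoder initialized from the pre-trained RoBERTa language model generates the transcription. The model operates on single text-line images. At publication, the large variant reported state-of-the-art results on the SROIE printed-receipt benchmark.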
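Instead of running a CNN, the encoder splits the image into fixed-size patches and treats them as a token sequence. The arithmetic below is a small sketch assuming the 384×384 input resolution and 16×16 patch size found in the TrOCR checkpoint configs on Hugging Face; check the checkpoint's `config.json` before relying on these values. `patchCount` and `patchValues` are illustrative helpers, not part of any library.

```typescript
// Number of patch tokens a ViT-style encoder (such as TrOCR's image encoder)
// produces for a square input. Assumed values: 384x384 input, 16x16 patches.
// The encoder may add special tokens (e.g. [CLS]) on top of this count.
function patchCount(imageSize: number, patchSize: number): number {
  if (imageSize % patchSize !== 0) {
    throw new Error("imageSize must be divisible by patchSize");
  }
  const perSide = imageSize / patchSize; // patches along one edge
  return perSide * perSide; // total patches in the grid
}

// Each flattened patch holds patchSize * patchSize * channels raw values
// before the linear projection into the encoder's hidden size.
function patchValues(patchSize: number, channels = 3): number {
  return patchSize * patchSize * channels;
}

console.log(patchCount(384, 16)); // 576
console.log(patchValues(16)); // 768
```

So under these assumptions the decoder cross-attends over a sequence of 576 patch features, each built from 768 raw pixel values.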

    On Mixpeek, TrOCR extracts readable text from images and video frames, making text-in-image content searchable through natural language queries.

    Architecture

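    Encoder-decoder Transformer: BEiT-Large (24 layers) as the image encoder and a Transformer text decoder initialized from RoBERTa-Large. The model is pre-trained on large-scale synthetic printed text-line data, and this checkpoint is then fine-tuned on SROIE (scanned receipts). IAM, a handwriting dataset, is used for the separate handwritten checkpoints.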
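The decoder produces the transcription one wordpiece at a time, conditioned on the encoder output and on the tokens generated so far. The sketch below shows greedy autoregressive decoding in isolation. `NextTokenScorer` is a hypothetical stand-in for a real decoder forward pass, and the token IDs in the toy example are invented; real inference may use beam search instead of pure greedy selection.

```typescript
// Hypothetical interface: given the token IDs generated so far, return a
// score (logit) for every vocabulary entry. A real implementation would run
// the TrOCR decoder with cross-attention over the encoder's patch features.
type NextTokenScorer = (prefix: readonly number[]) => number[];

// Greedy autoregressive decoding: at each step append the highest-scoring
// token, stopping at the end-of-sequence token or after maxLen new tokens.
// The EOS token, if reached, is kept in the returned sequence.
function greedyDecode(
  score: NextTokenScorer,
  bosId: number,
  eosId: number,
  maxLen: number,
): number[] {
  const tokens: number[] = [bosId];
  for (let step = 0; step < maxLen; step++) {
    const logits = score(tokens);
    let best = 0;
    for (let i = 1; i < logits.length; i++) {
      if (logits[i] > logits[best]) best = i;
    }
    tokens.push(best);
    if (best === eosId) break;
  }
  return tokens.slice(1); // drop the start token
}

// Toy scorer over a 5-token vocabulary that "spells" the sequence 3, 4, 1,
// where 1 acts as end-of-sequence. Purely illustrative.
const script = [3, 4, 1];
const toyScorer: NextTokenScorer = (prefix) => {
  const target = script[Math.min(prefix.length - 1, script.length - 1)];
  return [0, 1, 2, 3, 4].map((id) => (id === target ? 1 : 0));
};

console.log(greedyDecode(toyScorer, 0, 1, 10)); // [ 3, 4, 1 ]
```

The `maxLen` cap matters in practice: if the model never emits EOS, decoding still terminates.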

    Mixpeek SDK Integration

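    import { Mixpeek } from "mixpeek";
    
    // Read the key from the environment instead of hard-coding it
    // (MIXPEEK_API_KEY is an example variable name).
    const mx = new Mixpeek({ apiKey: process.env.MIXPEEK_API_KEY ?? "" });
    
    // Ingest a source into a collection, running the OCR feature extractor
    // with TrOCR. Top-level await requires an ES module context.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/document.pdf" },
      feature_extractors: [{
        name: "ocr",
        version: "v1",
        params: {
          // Hugging Face model ID of the checkpoint to run
          model_id: "microsoft/trocr-large-printed"
        }
      }]
    });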
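If ingest requests are built in several places, a typed helper keeps the extractor configuration consistent. This is a sketch only: `IngestRequest`, `FeatureExtractor`, and `buildOcrIngest` are hypothetical names that mirror the fields used in the SDK snippet above. They are not types exported by the `mixpeek` package, so check the SDK's own typings before relying on this shape.

```typescript
// Hypothetical types mirroring the request fields in the SDK snippet.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const TROCR_MODEL_ID = "microsoft/trocr-large-printed";

// Build an ingest request that routes a source URL through the OCR
// extractor with TrOCR. Fails early on an empty collection ID or an
// unparseable URL instead of waiting for the API to reject the request.
function buildOcrIngest(collectionId: string, sourceUrl: string): IngestRequest {
  if (collectionId.trim() === "") {
    throw new Error("collectionId must not be empty");
  }
  new URL(sourceUrl); // throws a TypeError if the URL cannot be parsed
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    feature_extractors: [
      { name: "ocr", version: "v1", params: { model_id: TROCR_MODEL_ID } },
    ],
  };
}

const req = buildOcrIngest("my-collection", "https://example.com/document.pdf");
console.log(JSON.stringify(req.feature_extractors[0]));
```

The returned object can then be passed to the ingest call in place of the inline literal.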

    Capabilities

    • High-accuracy printed text recognition
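    • End-to-end recognition: Transformer encoder and decoder, with no CNN backbone or external language model
    • Multi-line text extraction, with each detected line recognized separately (the model reads one text line per image)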
    • Robust to noise, blur, and varying fonts
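Because TrOCR reads one text line per image, multi-line output comes from recognizing each detected line and then stitching the results back together. Below is a minimal sketch of that stitching step. It assumes each line arrives as a hypothetical `LineResult` holding the recognized text and an axis-aligned box in pixel coordinates (`x`, `y` at the top-left). Layouts with columns or rotated text need more than this top-to-bottom, left-to-right ordering.

```typescript
// Hypothetical shape for one recognized text line (pixel coordinates,
// origin at the top-left of the page or frame).
interface LineResult {
  text: string;
  bbox: { x: number; y: number; width: number; height: number };
}

// Join line-level OCR results into one string in reading order.
// Lines whose vertical centers are within `rowTolerance` pixels are treated
// as the same row and ordered left to right; rows are ordered top to bottom.
function joinLines(lines: LineResult[], rowTolerance = 10): string {
  const centerY = (l: LineResult) => l.bbox.y + l.bbox.height / 2;
  const sorted = [...lines].sort((a, b) => centerY(a) - centerY(b));

  const rows: LineResult[][] = [];
  for (const line of sorted) {
    const row = rows[rows.length - 1];
    if (row && Math.abs(centerY(line) - centerY(row[0])) <= rowTolerance) {
      row.push(line); // close enough vertically: same visual row
    } else {
      rows.push([line]); // start a new row
    }
  }

  return rows
    .map((row) =>
      row
        .sort((a, b) => a.bbox.x - b.bbox.x)
        .map((l) => l.text)
        .join(" "),
    )
    .join("\n");
}

const receipt: LineResult[] = [
  { text: "TOTAL", bbox: { x: 10, y: 100, width: 60, height: 20 } },
  { text: "STORE #12", bbox: { x: 10, y: 10, width: 120, height: 20 } },
  { text: "$4.50", bbox: { x: 200, y: 102, width: 50, height: 20 } },
];

console.log(joinLines(receipt));
// STORE #12
// TOTAL $4.50
```

The tolerance is the main design knob: too small and a label and its value on the same receipt row split apart, too large and adjacent rows merge.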

    Use Cases on Mixpeek

    • Extract text from video overlays, subtitles, and signage
    • Digitize scanned documents and receipts
    • Search text-in-image content across media libraries
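In video, the same subtitle or overlay usually persists across many sampled frames, so indexing every frame's text separately produces near-duplicate hits. One way to handle this is to merge consecutive frames with matching text into time segments. The sketch assumes frame-level OCR results arrive as a hypothetical `FrameText` list sorted by timestamp; `toSegments` is an illustrative helper, not a Mixpeek API.

```typescript
// Hypothetical per-frame OCR result: timestamp in seconds plus recognized text.
interface FrameText {
  t: number;
  text: string;
}

interface TextSegment {
  start: number;
  end: number;
  text: string;
}

// Case- and whitespace-insensitive comparison key.
const normalize = (s: string): string => s.trim().replace(/\s+/g, " ").toLowerCase();

// Merge consecutive frames whose normalized text matches into one segment.
// Frames with empty text close the open segment. Input must be sorted by `t`.
function toSegments(frames: FrameText[]): TextSegment[] {
  const segments: TextSegment[] = [];
  let current: TextSegment | null = null;
  for (const f of frames) {
    const key = normalize(f.text);
    if (key === "") {
      current = null; // no text on this frame: close any open segment
      continue;
    }
    if (current && normalize(current.text) === key) {
      current.end = f.t; // same text continues: extend the segment
    } else {
      current = { start: f.t, end: f.t, text: f.text.trim() };
      segments.push(current);
    }
  }
  return segments;
}

const sampleFrames: FrameText[] = [
  { t: 0, text: "Hello there" },
  { t: 1, text: "hello  there" },
  { t: 2, text: "" },
  { t: 3, text: "Goodbye" },
  { t: 4, text: "Goodbye" },
];
console.log(toSegments(sampleFrames));
```

Exact matching after normalization is deliberately strict; OCR noise between frames would call for a fuzzy comparison such as edit distance.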

    Specification

    Framework: HF
    Organization: microsoft
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 608M
    License: MIT
    Downloads/month: 759K
    Likes: 180

    Research Paper

    TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (arxiv.org)

    Build a pipeline with trocr-large-printed

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
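A retrieval stage consumes the text this extractor produces. As an illustration only, the sketch below ranks OCR'd assets by simple query-term overlap. It is a lexical stand-in and makes no claim about how Mixpeek's retrievers score results. `OcrDoc` and `rankByTermOverlap` are hypothetical names.

```typescript
// Hypothetical record: an indexed asset and the text OCR extracted from it.
interface OcrDoc {
  id: string;
  text: string;
}

// Lowercase and split on anything that is not a letter or digit.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);

// Rank documents by the fraction of distinct query terms present in their
// OCR text, drop non-matches, and keep the topK best.
function rankByTermOverlap(
  query: string,
  docs: OcrDoc[],
  topK = 5,
): { id: string; score: number }[] {
  const terms = Array.from(new Set(tokenize(query)));
  if (terms.length === 0) return [];
  return docs
    .map((d) => {
      const words = new Set(tokenize(d.text));
      const hits = terms.filter((t) => words.has(t)).length;
      return { id: d.id, score: hits / terms.length };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const docs: OcrDoc[] = [
  { id: "frame-12", text: "EXIT 24 Downtown" },
  { id: "receipt-3", text: "TOTAL $4.50 Thank you" },
  { id: "frame-40", text: "Downtown parking" },
];
console.log(rankByTermOverlap("downtown exit", docs));
```

Natural-language queries that paraphrase the on-screen text would need an embedding-based stage rather than exact term matching.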


    Alternative Models