Does it support handwritten text?

Yes. The vision-language model component is trained on handwritten samples in multiple scripts. Accuracy varies with legibility; clear handwriting typically yields 85-95% character accuracy.
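The page does not say how character accuracy is measured. One common definition is 1 minus the character error rate (edit distance divided by the length of the ground-truth text). The sketch below, under that assumption, lets you score the converter on your own handwriting samples; `levenshtein` and `character_accuracy` are illustrative helper names, not part of the Mixpeek API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Return 1 - CER, clamped to [0, 1]; `reference` is the ground-truth transcription."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)
```

Averaging this score over a small set of hand-transcribed images gives a figure you can compare against the 85-95% range quoted above.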

Can I get bounding box coordinates for each text region?

Yes. Set `include_regions` to true and each text block will include pixel-level bounding box coordinates, enabling overlay rendering or targeted redaction.
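As a rough illustration of targeted redaction, the sketch below turns matching text regions into padded rectangles clipped to the image bounds. The region shape used here (a `text` string plus a `bbox` with `x`, `y`, `width`, `height` in pixels) is an assumption for illustration; check the actual response schema before relying on these field names.

```python
import re
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def redaction_boxes(
    regions: List[Dict],
    pattern: str,
    image_size: Tuple[int, int],
    padding: int = 2,
) -> List[Box]:
    """Return padded rectangles for every region whose text matches `pattern`.

    Assumes each region looks like:
        {"text": "...", "bbox": {"x": 0, "y": 0, "width": 0, "height": 0}}
    which is a hypothetical shape, not a documented one.
    """
    width, height = image_size
    matcher = re.compile(pattern)
    boxes: List[Box] = []
    for region in regions:
        if not matcher.search(region["text"]):
            continue
        b = region["bbox"]
        # Grow the box slightly, then clip it to the image so drawing never overflows.
        left = max(0, b["x"] - padding)
        top = max(0, b["y"] - padding)
        right = min(width, b["x"] + b["width"] + padding)
        bottom = min(height, b["y"] + b["height"] + padding)
        boxes.append((left, top, right, bottom))
    return boxes
```

The returned tuples use the (left, top, right, bottom) convention, which most imaging libraries accept for drawing filled rectangles.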

What languages are supported for OCR?

Over 100 languages and scripts are supported, including Latin, Cyrillic, Arabic, CJK (Chinese, Japanese, Korean), Devanagari, and Thai. Multi-language documents are handled automatically.

Image to Text Converter

Extract all readable text from images using advanced OCR combined with a vision-language model. Handles printed text, handwriting, complex layouts, receipts, signs, and multi-language documents.

Max file size: 50 MB

Estimated: 1-5 sec per image

6 input formats

How It Works

Upload an image or provide a URL.

The image is preprocessed (deskew, contrast normalization).

OCR detects text regions and extracts character-level output.

A vision-language model refines extraction and resolves ambiguities.

Structured text with bounding boxes and confidence scores is returned.
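The page does not describe the preprocessing step in detail. To give a feel for what contrast normalization means, here is a minimal sketch of one simple form, linear min-max stretching of an 8-bit grayscale image represented as nested lists. The function name `stretch_contrast` is illustrative, and the service may well use a different method.

```python
from typing import List


def stretch_contrast(pixels: List[List[int]]) -> List[List[int]]:
    """Rescale grayscale values so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return [row[:] for row in pixels]
    lo, hi = min(flat), max(flat)
    if lo == hi:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in pixels]
    span = hi - lo
    # Integer floor division keeps every output value in the 0-255 range.
    return [[(value - lo) * 255 // span for value in row] for row in pixels]
```

Faint scans occupy a narrow band of gray levels; spreading that band across the full range makes character edges easier to separate from the background.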

Code Examples

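from mixpeek import Mixpeek

# Authenticate with your API key.
client = Mixpeek(api_key="YOUR_API_KEY")

# Convert a remote image to text.
result = client.convert(
    source="https://example.com/receipt.jpg",
    from_format="image",
    to_format="text",
    options={
        "include_regions": True,  # include bounding boxes for each text block
        "language_hint": "en",
    },
)

# Print the extracted text.
print(result.text)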

Use Cases

Digitize scanned documents and forms

Extract text from product labels and packaging photos

Read text overlays in screenshots and social media images

Process handwritten notes and whiteboard photos

Supported Input Formats

JPEG

PNG

WebP

TIFF

BMP

GIF
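A file can be checked locally against the format list and size limit before it is uploaded. The sketch below is one way to do that; the extension set is inferred from the six formats listed above, and treating "50 MB" as 50 × 1024 × 1024 bytes is an assumption, since the page does not specify the unit precisely. `check_upload` is an illustrative helper, not part of the Mixpeek client.

```python
from pathlib import Path
from typing import List

# Extensions inferred from the six listed formats (JPEG, PNG, WebP, TIFF, BMP, GIF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff", ".bmp", ".gif"}

# Assumption: "50 MB" is interpreted as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: List[str] = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems
```

For a file on disk, `Path(path).stat().st_size` gives the value to pass as `size_bytes`.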

Quick Info

Category: media

Max file size: 50 MB

Est. time: 1-5 sec per image

Extractor: image-descriptor

Try This Conversion

Get started with the Mixpeek API and convert your first file in minutes.

Frequently Asked Questions

Related Converters

Image to Embeddings

Convert images into dense vector representations using state-of-the-art vision models. Embeddings capture semantic visual features and can be used for similarity search, clustering, and cross-modal retrieval.

Image to Caption

Generate natural-language captions for images using a vision-language model. Produces concise, descriptive sentences suitable for alt text, content indexing, and accessibility compliance.

Image to Description

Generate rich, multi-sentence descriptions of images covering composition, subjects, colors, mood, and context. Ideal for detailed content cataloging, creative writing prompts, and advanced search indexing.

PDF to Text

Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.

Ready to convert image to text?

Start using the Mixpeek Image to Text converter in minutes. Sign up for a free API key and follow the documentation to get started.
