HF · OCR · Apache 2.0

dots.ocr-1.5

by rednote-hilab

Multilingual document parsing — 100+ languages, unified layout + recognition

320K downloads/month

1.7B parameters

Identifiers

Model ID

rednote-hilab/dots.ocr-1.5

Feature URI

mixpeek://image_extractor@v1/rednote_dots_ocr_15_v1
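
The Feature URI appears to follow the shape `mixpeek://<extractor>@<version>/<feature>`. That reading is an assumption based on this one example, not a documented grammar. A minimal sketch of splitting such a URI into its parts, with `FeatureURI` and `parse_feature_uri` as hypothetical helper names that are not part of the Mixpeek SDK:

```python
import re
from typing import NamedTuple


class FeatureURI(NamedTuple):
    extractor: str  # e.g. "image_extractor"
    version: str    # e.g. "v1"
    feature: str    # e.g. "rednote_dots_ocr_15_v1"


# Assumed shape: mixpeek://<extractor>@<version>/<feature>
_FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^/]+)/(?P<feature>.+)$"
)


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into extractor, version, and feature name."""
    match = _FEATURE_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised feature URI: {uri!r}")
    return FeatureURI(**match.groupdict())


parsed = parse_feature_uri("mixpeek://image_extractor@v1/rednote_dots_ocr_15_v1")
```

Validating the URI up front like this can catch a mistyped extractor or version before an ingestion request is sent.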

Overview

dots.ocr-1.5 is a unified document parsing model from Xiaohongshu (RedNote) that combines layout detection and content recognition in a single model. It supports 100+ languages and handles academic papers, financial reports, tables, and multilingual content.

Because tasks are switched by prompt alone, no pipeline reconfiguration is needed: the same model handles layout analysis, text extraction, and table parsing depending on the instruction.
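
To illustrate prompt-based task switching, the sketch below keeps the input fixed and varies only the instruction. The prompt strings, the `TASK_PROMPTS` table, and the `build_request` helper are all hypothetical and written for illustration; the model's actual instruction strings may differ.

```python
# Hypothetical prompts, for illustration only; the model's real
# instruction strings are not given in this page and may differ.
TASK_PROMPTS = {
    "layout": "Detect the layout regions on this page and return their bounding boxes.",
    "text": "Extract all text from this page in reading order.",
    "table": "Extract every table on this page.",
}


def build_request(image_path: str, task: str) -> dict:
    """Pair one page image with the instruction for the chosen task."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return {"image": image_path, "prompt": TASK_PROMPTS[task]}


# Same page, same model: only the prompt changes between requests.
layout_request = build_request("page_001.png", "layout")
table_request = build_request("page_001.png", "table")
```

Under this design, supporting another document-understanding mode would mean adding a prompt entry rather than deploying another model.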

Architecture

A 1.7B-parameter model with a unified architecture that performs layout detection and OCR in a single forward pass. Different document-understanding modes are selected by prompt.

Mixpeek SDK Integration

mixpeek.ingest.from_url(
    url="s3://documents/report.pdf",
    collection="documents",
    feature_extractors=[{
        "type": "ocr",
        "model": "mixpeek://image_extractor@v1/rednote_dots_ocr_15_v1"
    }]
)

Capabilities

Multilingual OCR (100+ languages)
Layout detection
Table extraction
Academic paper parsing
Financial document processing

Use Cases on Mixpeek

Multilingual document search

International content indexing

Financial document extraction

Academic paper processing

Performance

Input Size: Variable

GPU Latency: ~80 ms per page (A100)

GPU Throughput: ~12 pages/sec

GPU Memory: Model dependent
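
The latency and throughput figures are consistent with each other: 80 ms per page works out to 12.5 pages per second when pages are processed one at a time. A rough capacity estimate can be sketched as below, assuming the quoted throughput holds steadily and that work divides evenly across GPUs. Both are simplifying assumptions, and `estimate_seconds` is a hypothetical helper.

```python
LATENCY_S_PER_PAGE = 0.080    # ~80 ms per page on an A100 (Performance figures)
THROUGHPUT_PAGES_PER_S = 12   # ~12 pages/sec (Performance figures)

# Sequential rate implied by the latency figure alone.
sequential_rate = 1 / LATENCY_S_PER_PAGE  # 12.5 pages/sec


def estimate_seconds(pages: int, gpus: int = 1) -> float:
    """Rough wall-clock estimate, assuming even sharding across GPUs."""
    if pages < 0 or gpus < 1:
        raise ValueError("pages must be >= 0 and gpus must be >= 1")
    return pages / (THROUGHPUT_PAGES_PER_S * gpus)


one_gpu = estimate_seconds(1200)           # 1200 pages on a single GPU
four_gpus = estimate_seconds(1200, gpus=4)  # same job sharded across four
```

Real jobs will also pay for download, rasterization, and queueing, so this is a lower bound rather than a forecast.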

Common Pipeline Companions

microsoft/layoutlmv3-base

Specification

Framework: HF

Organization: rednote-hilab

Feature: OCR

Output: text + bbox

Modalities: video, image, document

Retriever: Text-in-Image

Parameters: 1.7B

License: Apache 2.0

Downloads/mo: 320K
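
The specification lists the output as text plus bounding boxes. One way to hold such results downstream is sketched here, assuming each region carries a string and an `(x0, y0, x1, y1)` pixel box. That box layout, the `OcrRegion` type, and the `reading_order_text` helper are assumptions for illustration, not the documented output schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrRegion:
    text: str
    # Assumed layout: (x0, y0, x1, y1) in pixels, origin at top-left.
    bbox: tuple[float, float, float, float]


def reading_order_text(regions: list[OcrRegion]) -> str:
    """Join region text top-to-bottom, then left-to-right.

    A naive ordering: it ignores multi-column layouts.
    """
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    return "\n".join(r.text for r in ordered)


regions = [
    OcrRegion("world", (0, 50, 100, 70)),
    OcrRegion("Hello", (0, 10, 100, 30)),
]
page_text = reading_order_text(regions)
```

Keeping the box alongside the text lets a retrieval stage highlight where on the page a match was found.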

Build a pipeline with dots.ocr-1.5

Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

Alternative Models

microsoft/trocr-large-printed

PaddlePaddle/paddleocr

zai-org/GLM-OCR

lightonai/LightOnOCR-2-1B

Related in Text Extraction

microsoft/codebert-base

Code Extraction

Salesforce/codet5p-110m-embedding

Code Extraction