HF | OCR | Apache 2.0

PaddleOCR-VL-1.6

by PaddlePaddle

Compact document VLM for OCR, tables, formulas, charts, seals, and layout parsing

3.2K downloads/month

1.0B parameters

Identifiers

Model ID

PaddlePaddle/PaddleOCR-VL-1.6

Feature URI

mixpeek://image_extractor@v1/paddle_ocr_vl_16_v1
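
The Feature URI above appears to follow a mixpeek://<extractor>@<version>/<feature> shape. The sketch below splits a URI into those parts. It assumes that shape holds in general, which this page does not confirm, and the names parseFeatureUri, extractor, version, and feature are labels chosen for illustration rather than documented Mixpeek terms.

```typescript
// Hypothetical helper, not part of the Mixpeek SDK.
// ASSUMPTION: the URI shape mixpeek://<extractor>@<version>/<feature> is
// inferred from the single example shown on this page.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // Capture the text before "@", between "@" and "/", and after "/".
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    return null; // Not in the assumed shape.
  }
  return { extractor: match[1], version: match[2], feature: match[3] };
}

const parsedUri = parseFeatureUri(
  "mixpeek://image_extractor@v1/paddle_ocr_vl_16_v1"
);
console.log(parsedUri);
```

Returning null for anything outside the assumed shape keeps callers from silently treating an unrelated URL as a feature reference.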

Overview

PaddleOCR-VL 1.6 is the newest compact document parsing model from PaddlePaddle. It upgrades PaddleOCR-VL 1.5 with region-aware data optimization and progressive post-training, improving results in areas where the earlier model was weaker: tables, rare characters, seals, text spotting, and charts.

On Mixpeek, PaddleOCR-VL 1.6 is a strong OCR and document decomposition candidate when agents need to search scans, forms, charts, invoices, and multilingual documents as structured evidence.

Architecture

PaddleOCR-VL 1.6 is a document vision-language model with 0.9B to 1.0B parameters, built on the PaddleOCR-VL architecture. It supports task prompts for OCR, table recognition, formula recognition, chart recognition, spotting, and seal recognition. It is compatible with the PaddleOCR doc parser pipeline and with Transformers custom code.
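
Task prompts mean that one model handles several document tasks, with the prompt selecting which one runs. The sketch below shows one way a caller might route a page region to a task. The prompt strings are assumptions modeled on the "Task name:" convention and are not confirmed by this page, so check them against the model card before use. The region labels and the helper names are also hypothetical.

```typescript
// The six tasks named in the Architecture paragraph.
type DocTask = "ocr" | "table" | "formula" | "chart" | "spotting" | "seal";

// ASSUMPTION: these prompt strings are illustrative placeholders; confirm
// the exact wording against the model card.
const TASK_PROMPTS: Record<DocTask, string> = {
  ocr: "OCR:",
  table: "Table Recognition:",
  formula: "Formula Recognition:",
  chart: "Chart Recognition:",
  spotting: "Spotting:",
  seal: "Seal Recognition:",
};

// ASSUMPTION: region labels such as "table" or "seal" would come from a
// separate layout-detection step; the label set here is hypothetical.
function taskForRegion(regionLabel: string): DocTask {
  switch (regionLabel.toLowerCase()) {
    case "table":
      return "table";
    case "formula":
      return "formula";
    case "chart":
      return "chart";
    case "seal":
      return "seal";
    default:
      return "ocr"; // Fall back to plain text recognition.
  }
}

function promptForRegion(regionLabel: string): string {
  return TASK_PROMPTS[taskForRegion(regionLabel)];
}

console.log(promptForRegion("Table"));
console.log(promptForRegion("paragraph"));
```

Typing the map as Record<DocTask, string> makes the compiler flag any task that is added to the union without a matching prompt.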

Mixpeek SDK Integration

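import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a PDF into the "documents" collection and run OCR on it
// with PaddleOCR-VL-1.6.
await mx.collections.ingest({
  collection_id: "documents",
  source: { url: "https://example.com/invoice.pdf" },
  feature_extractors: [{
    feature: "ocr",
    model: "PaddlePaddle/PaddleOCR-VL-1.6"
  }]
});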

Capabilities

Document parsing across text, tables, formulas, charts, seals, and layout
English, Chinese, and multilingual document support
OmniDocBench v1.6 overall score of 96.33%, as reported on the model card
Compatible migration path from PaddleOCR-VL 1.5

Use Cases on Mixpeek

Search scanned business documents by extracted text and layout fields

Parse invoices, forms, charts, and tables into retrievable metadata

Give agents page-level evidence from PDFs and screenshots

Index multilingual archives where OCR and layout both matter

Benchmarks

Dataset	Metric	Score	Source
OmniDocBench v1.6	Overall score	96.33%	PaddleOCR-VL 1.6 model card

Performance

Input Size: Document page image

GPU Latency: Backend dependent; PaddleOCR and vLLM server modes supported

GPU Throughput: Backend dependent; batch by page for best throughput

GPU Memory: ~2 GB plus serving overhead
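
The ~2 GB figure is consistent with holding about 1.0B parameters at 16-bit precision. The back-of-envelope check below assumes 2 bytes per weight, which this page does not state, and counts weights only. Activations, KV cache, and the serving overhead mentioned above are excluded.

```typescript
// Rough weight-only memory estimate in decimal gigabytes (1 GB = 1e9 bytes).
// ASSUMPTION: 2 bytes per parameter (16-bit weights); not stated on this page.
function weightMemoryGB(paramCount: number, bytesPerParam: number): number {
  return (paramCount * bytesPerParam) / 1e9;
}

const upperEstimateGB = weightMemoryGB(1.0e9, 2); // 1.0B parameters
const lowerEstimateGB = weightMemoryGB(0.9e9, 2); // 0.9B parameters

console.log(`1.0B params at 2 bytes each: ${upperEstimateGB.toFixed(1)} GB`);
console.log(`0.9B params at 2 bytes each: ${lowerEstimateGB.toFixed(1)} GB`);
```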

Use the PaddleOCR doc parser path for page-level parsing
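
Batching by page can be sketched as a small helper that splits a document's pages into fixed-size groups before they are sent to the backend. The helper name batchPages, the page identifiers, and the batch size of 4 are all illustrative. A suitable batch size depends on the backend and GPU, and this page does not specify one.

```typescript
// Hypothetical helper: split a list of pages into batches of at most
// batchSize items, preserving page order.
function batchPages<T>(pages: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < pages.length; start += batchSize) {
    batches.push(pages.slice(start, start + batchSize));
  }
  return batches;
}

// Illustrative page identifiers for a 10-page document.
const pageIds: string[] = [];
for (let n = 1; n <= 10; n++) {
  pageIds.push(`invoice.pdf#page=${n}`);
}

const pageBatches = batchPages(pageIds, 4);
console.log(pageBatches.map((batch) => batch.length));
```

With 10 pages and a batch size of 4, the final batch holds the 2 remaining pages rather than being padded.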

Common Pipeline Companions

BAAI/bge-large-en-v1.5

Embed extracted page text for semantic search

microsoft/layoutlmv3-base

Layout-aware filtering and document structure
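
Extracted page text often needs to be split before embedding, because BAAI/bge-large-en-v1.5 accepts at most 512 tokens per input. The sketch below chunks text by word count with optional overlap. Word count is only a rough proxy for token count, so the limit of 300 words is a conservative assumption rather than a documented setting, and chunkWords is a hypothetical helper.

```typescript
// Hypothetical helper: split text into chunks of at most maxWords words,
// with `overlap` words repeated between consecutive chunks.
// ASSUMPTION: words approximate tokens only loosely; use the model's
// tokenizer for exact limits.
function chunkWords(text: string, maxWords: number, overlap: number = 0): string[] {
  if (maxWords < 1 || overlap < 0 || overlap >= maxWords) {
    throw new RangeError("require maxWords >= 1 and 0 <= overlap < maxWords");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    if (start + maxWords >= words.length) {
      break; // The last chunk already reached the end of the text.
    }
  }
  return chunks;
}

// Illustrative stand-in for text extracted from one page.
const samplePageText = "Invoice 1042 issued to Example Corp for consulting services";
console.log(chunkWords(samplePageText, 300));
console.log(chunkWords(samplePageText, 4, 1));
```

Overlap repeats a few words across chunk boundaries so that a phrase split between two chunks can still be matched by at least one of them.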

Specification

Framework: HF

Organization: PaddlePaddle

Feature: OCR

Output: text + bbox

Modalities: video, image, document

Retriever: Text-in-Image

Parameters: 1.0B

License: Apache 2.0

Downloads/mo: 3.2K
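
The "text + bbox" output can be pictured as a list of text spans, each with a bounding box and a page number. The sketch below joins spans into per-page text using a naive top-to-bottom, left-to-right order. The OcrSpan shape, its field names, and the [x0, y0, x1, y1] box layout are assumptions for illustration, not the documented Mixpeek or PaddleOCR response format.

```typescript
// ASSUMPTION: hypothetical span shape; the real response schema may differ.
// bbox is [x0, y0, x1, y1] with the origin at the top-left of the page.
interface OcrSpan {
  page: number;
  text: string;
  bbox: [number, number, number, number];
}

// Join spans into one string per page. The ordering is naive (by top edge,
// then left edge) and does not handle multi-column layouts.
function pageText(spans: readonly OcrSpan[]): Map<number, string> {
  const spansByPage = new Map<number, OcrSpan[]>();
  spans.forEach((span) => {
    const existing = spansByPage.get(span.page);
    if (existing === undefined) {
      spansByPage.set(span.page, [span]);
    } else {
      existing.push(span);
    }
  });

  const textByPage = new Map<number, string>();
  spansByPage.forEach((pageSpans, page) => {
    const ordered = pageSpans
      .slice()
      .sort((a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0]);
    textByPage.set(page, ordered.map((span) => span.text).join(" "));
  });
  return textByPage;
}

// Illustrative spans, listed out of reading order on purpose.
const sampleSpans: OcrSpan[] = [
  { page: 1, text: "Total", bbox: [10, 50, 60, 62] },
  { page: 1, text: "Invoice", bbox: [10, 10, 80, 24] },
  { page: 2, text: "Terms", bbox: [10, 10, 70, 24] },
];
console.log(pageText(sampleSpans));
```

Keeping the bounding boxes alongside the text is what allows a search hit to be traced back to a specific region of a specific page.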

Research Paper

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

Build a pipeline with PaddleOCR-VL-1.6

Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

Alternative Models

microsoft/trocr-large-printed

PaddlePaddle/paddleocr

zai-org/GLM-OCR

lightonai/LightOnOCR-2-1B

Related in Text Extraction

microsoft/codebert-base

Code Extraction

Salesforce/codet5p-110m-embedding

Code Extraction