Dolphin-v2
by ByteDance
End-to-end document parsing VLM — 21 element types, pixel-accurate layout
ByteDance/Dolphin-v2
mixpeek://image_extractor@v1/bytedance_dolphin_v2
Overview
Dolphin-v2 is ByteDance's visual document parsing model. It classifies and extracts 21 element categories, including text blocks, tables, formulas, figures, code blocks, headers, footers, and captions, from both digital and photographed documents. Built on a Qwen2.5-VL-3B backbone, it processes document pages end-to-end without a separate OCR pipeline.
It scores 89.45 on OmniDocBench V1.5 overall, with standout performance on tables (TEDS: 90.48) and formulas (CDM: 86.72). The key advance over v1 is absolute pixel-coordinate spatial localization -- every extracted element comes with precise bounding box coordinates. On Mixpeek, Dolphin v2 powers structured document extraction for RAG pipelines that need to understand document layout, not just raw text.
Architecture
The model is a Qwen2.5-VL-3B vision-language backbone fine-tuned for document parsing. It processes pages at native resolution with adaptive tiling. The output is structured JSON in which each detected element carries its element type (one of 21 categories), its text content, and a bounding box in absolute pixel coordinates.
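To make the output format concrete, here is a minimal sketch of how one parsed element might be typed and how an absolute pixel box can be converted to page-relative coordinates. The field names (`type`, `text`, `bbox`), the `[x1, y1, x2, y2]` box layout, and the helper `normalizeBbox` are assumptions for illustration, not the model's documented schema.

```typescript
// Hypothetical shape of one parsed element. Field names and box layout are
// assumptions for illustration, not the model's documented output schema.
interface ParsedElement {
  type: string; // one of the 21 element categories, e.g. "table"
  text: string; // extracted text content
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in absolute page pixels
}

// Convert an absolute pixel box to page-relative coordinates in [0, 1], so
// boxes stay comparable across pages rendered at different resolutions.
function normalizeBbox(
  bbox: [number, number, number, number],
  pageWidth: number,
  pageHeight: number
): [number, number, number, number] {
  if (pageWidth <= 0 || pageHeight <= 0) {
    throw new Error("page dimensions must be positive");
  }
  const [x1, y1, x2, y2] = bbox;
  return [x1 / pageWidth, y1 / pageHeight, x2 / pageWidth, y2 / pageHeight];
}

// Example element on a page assumed to be 1000 x 2000 pixels.
const exampleElement: ParsedElement = {
  type: "table",
  text: "Revenue | 2023 | 2024",
  bbox: [100, 200, 500, 400],
};

const relativeBox = normalizeBbox(exampleElement.bbox, 1000, 2000);
console.log(relativeBox);
```

Keeping the absolute box alongside a normalized copy is a design choice: the pixel values are what you need to crop the source image, while the relative values are easier to compare across documents.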
Mixpeek SDK Integration
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/financial-report.pdf" },
      feature_extractors: [
        {
          name: "ocr",
          version: "v1",
          params: {
            model_id: "ByteDance/Dolphin-v2",
            extract_layout: true
          }
        }
      ]
    });
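When the same extractor settings are reused across many documents, it can help to build the request object in one place. The sketch below only mirrors the fields that appear in the ingest call above. The interfaces and the `buildDolphinIngestRequest` helper are hypothetical and are not part of the Mixpeek SDK, and the sketch makes no SDK calls.

```typescript
// Shapes copied from the fields used in the ingest call above.
// These are illustrative types, not official SDK types.
interface ExtractorConfig {
  name: string;
  version: string;
  params: { model_id: string; extract_layout: boolean };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: ExtractorConfig[];
}

// Hypothetical helper: assembles the request payload for one document.
function buildDolphinIngestRequest(
  collectionId: string,
  documentUrl: string,
  extractLayout: boolean = true
): IngestRequest {
  // Reject obviously malformed inputs before any network call is attempted.
  if (collectionId.length === 0) {
    throw new Error("collectionId must not be empty");
  }
  if (!/^https?:\/\//.test(documentUrl)) {
    throw new Error("documentUrl must start with http:// or https://");
  }
  return {
    collection_id: collectionId,
    source: { url: documentUrl },
    feature_extractors: [
      {
        name: "ocr",
        version: "v1",
        params: { model_id: "ByteDance/Dolphin-v2", extract_layout: extractLayout },
      },
    ],
  };
}

const ingestRequest = buildDolphinIngestRequest(
  "my-collection",
  "https://example.com/financial-report.pdf"
);
console.log(JSON.stringify(ingestRequest, null, 2));
```

The resulting object has the same structure as the argument passed to `mx.collections.ingest` in the snippet above, so it could be passed there directly, assuming the SDK accepts a plain object of that shape.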
Capabilities
- 21 document element categories (text, tables, formulas, figures, code, etc.)
- Pixel-accurate bounding box localization
- Tables with structure preservation (TEDS: 90.48)
- Formula recognition (CDM: 86.72)
- MIT license, 3B parameters
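As an illustration of how typed, localized elements can feed a RAG pipeline, the sketch below orders elements by position and turns them into chunks that keep their layout metadata. The element shape, the `"header"` and `"footer"` label strings, and the top-then-left ordering rule are assumptions. The ordering is a simple single-column heuristic, not the model's own reading-order logic.

```typescript
// Hypothetical element shape; field names and type labels are assumptions.
interface LayoutElement {
  type: string;
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in page pixels
}

interface Chunk {
  text: string;
  elementType: string;
  bbox: [number, number, number, number];
}

// Sort by top edge, then by left edge as a tiebreak. This heuristic suits
// single-column pages; multi-column layouts need something smarter.
function readingOrder(elements: LayoutElement[]): LayoutElement[] {
  return elements.slice().sort((a, b) => {
    if (a.bbox[1] !== b.bbox[1]) return a.bbox[1] - b.bbox[1];
    return a.bbox[0] - b.bbox[0];
  });
}

// Drop page furniture and keep each remaining element as one chunk that
// carries its type and box for later filtering or citation.
function toChunks(
  elements: LayoutElement[],
  skipTypes: string[] = ["header", "footer"]
): Chunk[] {
  return readingOrder(elements)
    .filter((el) => skipTypes.indexOf(el.type) === -1)
    .map((el) => ({ text: el.text, elementType: el.type, bbox: el.bbox }));
}

const pageElements: LayoutElement[] = [
  { type: "footer", text: "Page 3", bbox: [50, 950, 200, 980] },
  { type: "table", text: "Q1 | Q2 | Q3", bbox: [50, 300, 700, 500] },
  { type: "text", text: "Revenue grew in every quarter.", bbox: [50, 100, 700, 250] },
  { type: "header", text: "Annual Report", bbox: [50, 10, 400, 40] },
];

const chunks = toChunks(pageElements);
console.log(chunks.map((c) => c.elementType));
```

Keeping `elementType` on every chunk lets a retrieval stage treat tables and formulas differently from running text, for example by restricting a query to table chunks.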
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| OmniDocBench V1.5 (overall) | Score | 89.45 | ByteDance, 2026 — Model Card |
| OmniDocBench V1.5 (tables) | TEDS | 90.48 | ByteDance, 2026 — Model Card |
| OmniDocBench V1.5 (formulas) | CDM | 86.72 | ByteDance, 2026 — Model Card |
Research Paper
Dolphin: A Document Parsing Model
arxiv.org
Build a pipeline with Dolphin-v2
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.