Dolphin-v2

by ByteDance

End-to-end document parsing VLM: 21 element types, pixel-accurate layout

210K dl/month

~3B params


Identifiers

Model ID

ByteDance/Dolphin-v2

Feature URI

mixpeek://image_extractor@v1/bytedance_dolphin_v2
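The Feature URI above appears to follow a scheme://name@version/slug pattern. As a minimal sketch, assuming that reading is correct, it could be split into its parts as shown below. The `FeatureUri` type and `parseFeatureUri` helper are hypothetical names for illustration and are not part of the Mixpeek SDK; the meaning assigned to each segment is inferred only from the shape of the URI.

```typescript
// Assumed layout (inferred from the URI's shape, not from Mixpeek docs):
//   mixpeek://<extractor>@<version>/<feature>
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

// Hypothetical helper: returns the three segments, or null if the
// string does not match the assumed layout.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

console.log(parseFeatureUri("mixpeek://image_extractor@v1/bytedance_dolphin_v2"));
// { extractor: "image_extractor", version: "v1", feature: "bytedance_dolphin_v2" }
```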

Overview

Dolphin v2 is ByteDance's visual document parsing model that classifies and extracts 21 element categories from both digital and photographed documents: text blocks, tables, formulas, figures, code blocks, headers, footers, captions, and more. Built on a Qwen2.5-VL-3B backbone, it processes document pages end-to-end without a separate OCR pipeline.

It scores 89.45 on OmniDocBench V1.5 overall, with standout performance on tables (TEDS: 90.48) and formulas (CDM: 86.72). The key advance over v1 is absolute pixel-coordinate spatial localization: every extracted element comes with bounding box coordinates expressed in absolute pixels. On Mixpeek, Dolphin v2 powers structured document extraction for RAG pipelines that need to understand document layout, not just raw text.

Architecture

Qwen2.5-VL-3B vision-language backbone fine-tuned for document parsing. Processes pages at native resolution with adaptive tiling. Outputs structured JSON with element type, text content, and absolute pixel-coordinate bounding boxes for each of 21 element categories.
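To make the output shape concrete, here is a minimal sketch of how one parsed element might be typed, and how its absolute pixel box could be converted to page-relative coordinates. The field names (`type`, `text`, `bbox`), the `[x1, y1, x2, y2]` box order, and the `toRelativeBox` helper are assumptions for illustration, not the model's documented schema.

```typescript
// Hypothetical shape of one parsed element; field names are assumptions.
interface ParsedElement {
  type: string; // element category label, e.g. "table" or "formula"
  text: string; // extracted content
  bbox: [number, number, number, number]; // assumed [x1, y1, x2, y2] in absolute pixels
}

interface PageSize {
  width: number;
  height: number;
}

// Convert an absolute pixel box into fractions of the page size (0..1),
// so the box remains usable if the page image is later resized.
function toRelativeBox(
  el: ParsedElement,
  page: PageSize,
): [number, number, number, number] {
  const [x1, y1, x2, y2] = el.bbox;
  return [x1 / page.width, y1 / page.height, x2 / page.width, y2 / page.height];
}

const sampleElement: ParsedElement = {
  type: "table",
  text: "sample cell text",
  bbox: [250, 500, 500, 1500],
};
console.log(toRelativeBox(sampleElement, { width: 1000, height: 2000 }));
// [0.25, 0.25, 0.5, 0.75]
```

Absolute pixel boxes are convenient for cropping the original page image, while relative boxes are easier to reuse across renderings at different resolutions.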

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

// "API_KEY" is a placeholder: substitute your own Mixpeek API key
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "ocr",
    version: "v1",
    parameters: { model_id: "ByteDance/Dolphin-v2" },
  },
});

Capabilities

21 document element categories (text, tables, formulas, figures, code, etc.)
Pixel-accurate bounding box localization
Tables with structure preservation (TEDS: 90.48)
Formula recognition (CDM: 86.72)
MIT license, 3B parameters

Use Cases on Mixpeek

Structured document extraction for enterprise RAG

Table extraction from financial reports and invoices

Formula extraction from scientific papers and textbooks

Layout-aware document indexing for search across mixed-content pages
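For the use cases above, one simple way to work with the extracted elements is to order them by position before indexing, and to pull out a single category (such as tables) for dedicated processing. The sketch below assumes a single-column page and a hypothetical element shape (`type`, `text`, `bbox` as `[x1, y1, x2, y2]` pixels); the label strings and the helpers `inReadingOrder` and `ofType` are illustrative, not part of Dolphin v2 or the Mixpeek SDK.

```typescript
// Hypothetical element shape; field names and label strings are assumptions.
interface LayoutElement {
  type: string;
  text: string;
  bbox: [number, number, number, number]; // assumed [x1, y1, x2, y2] in pixels
}

// Sort top-to-bottom by the box's top edge, breaking ties left-to-right.
// This is a naive single-column heuristic, not a full reading-order model.
function inReadingOrder(elements: LayoutElement[]): LayoutElement[] {
  return [...elements].sort(
    (a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0],
  );
}

// Keep only the elements of one category, e.g. tables from a financial report.
function ofType(elements: LayoutElement[], type: string): LayoutElement[] {
  return elements.filter((el) => el.type === type);
}

const samplePage: LayoutElement[] = [
  { type: "table", text: "Revenue by quarter", bbox: [50, 400, 900, 700] },
  { type: "header", text: "Annual Report", bbox: [50, 20, 900, 60] },
  { type: "text", text: "Summary paragraph", bbox: [50, 100, 900, 350] },
];

console.log(inReadingOrder(samplePage).map((el) => el.type));
// ["header", "text", "table"]
console.log(ofType(samplePage, "table").length);
// 1
```

Multi-column or heavily nested layouts would need a more careful ordering strategy than this top-edge sort.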

Benchmarks

Dataset	Metric	Score	Source
OmniDocBench V1.5 (overall)	Score	89.45	ByteDance, 2026: Model Card
OmniDocBench V1.5 (tables)	TEDS	90.48	ByteDance, 2026: Model Card
OmniDocBench V1.5 (formulas)	CDM	86.72	ByteDance, 2026: Model Card