granite-docling-258M
by ibm-granite
Ultra-compact 258M document converter — layout, tables, formulas, and code in a single model
ibm-granite/granite-docling-258M
mixpeek://document_extractor@v1/ibm_granite_docling_258m_v1
Overview
Granite-Docling-258M is IBM's ultra-compact vision-language model for end-to-end document conversion to machine-readable formats. Built on the Idefics3 architecture with a SigLIP2-base-patch16-512 vision encoder and a Granite 165M language model, it converts document pages into DocTags — IBM's universal markup format that captures all page elements including charts, tables, forms, code, equations, footnotes, and their spatial relationships.
At just 258M parameters, Granite-Docling rivals systems several times its size on layout detection (mAP 0.27), full-page OCR (F1 0.84), table recognition (TEDS 0.96), and equation recognition (F1 0.968). On Mixpeek, it provides the most cost-effective document structure extraction, converting scanned PDFs, contracts, and technical documents into structured, searchable content with full layout preservation.
Architecture
The model follows the Idefics3 architecture, pairing a SigLIP2-base-patch16-512 vision encoder with a Granite 165M LLM. It outputs DocTags markup describing all page elements and their spatial relationships. English is the primary target language; support for Japanese, Arabic, and Chinese is experimental.
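To make the DocTags output more concrete, here is a minimal sketch of reading element tags, bounding boxes, and text out of a DocTags-like string. The sample markup, the `<loc_N>` coordinate tokens, and the names `DocTagElement` and `parseDocTags` are assumptions made for illustration. Real DocTags output can nest elements (tables, lists), which this flat parser does not handle, so production code would more likely rely on IBM's Docling tooling.

```typescript
// One parsed page element: tag name, optional bounding box, and text content.
interface DocTagElement {
  tag: string;
  bbox: [number, number, number, number] | null; // [x1, y1, x2, y2] from <loc_N> tokens
  text: string;
}

// Parse a flat DocTags-like string into elements (nested structures are not handled).
// Hypothetical helper, not part of any IBM or Mixpeek library.
function parseDocTags(markup: string): DocTagElement[] {
  const elements: DocTagElement[] = [];
  // Drop the outer wrapper so the element pattern sees only page elements.
  const inner = markup.replace(/<\/?doctag>/g, "");
  // The backreference \1 requires the closing tag to match the opening tag.
  const elementPattern = /<([a-z0-9_]+)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = elementPattern.exec(inner)) !== null) {
    const tag = match[1];
    const body = match[2];
    // Collect the numeric values of all location tokens in this element.
    const locs: number[] = [];
    const locPattern = /<loc_(\d+)>/g;
    let loc: RegExpExecArray | null;
    while ((loc = locPattern.exec(body)) !== null) {
      locs.push(Number(loc[1]));
    }
    const bbox: [number, number, number, number] | null =
      locs.length >= 4 ? [locs[0], locs[1], locs[2], locs[3]] : null;
    // Whatever remains after removing location tokens is the element text.
    const text = body.replace(/<loc_\d+>/g, "").trim();
    elements.push({ tag, bbox, text });
  }
  return elements;
}

// Assumed sample markup, written by hand for this sketch.
const samplePage =
  "<doctag><section_header_level_1><loc_50><loc_40><loc_450><loc_60>Quarterly Report</section_header_level_1>\n" +
  "<text><loc_50><loc_80><loc_450><loc_120>Revenue grew in Q3.</text></doctag>";

for (const el of parseDocTags(samplePage)) {
  console.log(el.tag, el.bbox, el.text);
}
```

Keeping the bounding box alongside the text is what allows downstream search to preserve layout, for example by filtering hits to headers or to a region of the page.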
Mixpeek SDK Integration
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "documents",
      source: { url: "https://example.com/technical-report.pdf" },
      feature_extractors: [
        {
          feature: "document_structure",
          model: "ibm-granite/granite-docling-258M"
        }
      ]
    });
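When ingesting many documents, it can help to build the request object in one place. The sketch below wraps the payload from the call above in a small typed helper. The `IngestRequest` and `FeatureExtractor` types and the `buildDoclingIngestRequest` function are hypothetical: they are inferred from that single example, not taken from the SDK's published typings, so the real request schema may have more or different fields.

```typescript
// Types inferred from the ingest example above (assumption, not the SDK's own typings).
interface FeatureExtractor {
  feature: string;
  model: string;
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const GRANITE_DOCLING_MODEL = "ibm-granite/granite-docling-258M";

// Hypothetical helper: builds the payload for one document URL.
function buildDoclingIngestRequest(collectionId: string, documentUrl: string): IngestRequest {
  // Reject anything that is not an http(s) URL before it reaches the API.
  if (!/^https?:\/\/\S+$/.test(documentUrl)) {
    throw new Error(`Unsupported document URL: ${documentUrl}`);
  }
  if (collectionId.trim() === "") {
    throw new Error("collectionId must not be empty");
  }
  return {
    collection_id: collectionId,
    source: { url: documentUrl },
    feature_extractors: [
      { feature: "document_structure", model: GRANITE_DOCLING_MODEL },
    ],
  };
}

const request = buildDoclingIngestRequest(
  "documents",
  "https://example.com/technical-report.pdf"
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object has the same fields as the argument to `mx.collections.ingest` in the example, so it could be passed there directly if the SDK accepts that shape.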
Capabilities
- Layout-preserving document-to-markup conversion
- Table recognition (TEDS 0.96 on FinTabNet)
- Equation recognition (F1 0.968) and code recognition (F1 0.988)
- DocTags universal format for structured output
- Ultra-compact at 258M parameters, roughly the same size as its predecessor SmolDocling (256M)
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| FinTabNet (table recognition) | TEDS (structure + content) | 0.96 | IBM, 2025 — Granite-Docling Announcement |
| Full-page OCR | F1 | 0.84 | IBM, 2025 — Granite-Docling Announcement |
| Equation recognition | F1 | 0.968 | IBM, 2025 — Granite-Docling Announcement |
Research Paper
Granite-Docling: End-to-End Document Understanding
arxiv.org
Build a pipeline with granite-docling-258M
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.