granite-docling-258M

by ibm-granite

Ultra-compact 258M document converter: layout, tables, formulas, and code in a single model

150K downloads/month

258M params

Identifiers

Model ID

ibm-granite/granite-docling-258M

Feature URI

mixpeek://document_extractor@v1/ibm_granite_docling_258m_v1
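
The feature URI appears to pack three pieces of information into one string. The sketch below splits it apart, assuming the layout mixpeek://<extractor>@<version>/<feature>. That layout is inferred from the single example above rather than from Mixpeek documentation, and parseFeatureUri and FeatureUri are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: splits a feature URI into its parts.
// Assumed layout (inferred from one example, not from Mixpeek docs):
//   mixpeek://<extractor>@<version>/<feature>
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsedUri = parseFeatureUri(
  "mixpeek://document_extractor@v1/ibm_granite_docling_258m_v1",
);
console.log(parsedUri.extractor, parsedUri.version, parsedUri.feature);
```

Failing loudly on an unrecognized string is a deliberate choice here: a silently mis-parsed identifier is harder to debug than an early error.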

Overview

Granite-Docling-258M is IBM's ultra-compact vision-language model for end-to-end document conversion to machine-readable formats. Built on the Idefics3 architecture with a SigLIP2-base-patch16-512 vision encoder and a Granite 165M language model, it converts document pages into DocTags, IBM's universal markup format. DocTags captures every page element (charts, tables, forms, code, equations, footnotes) together with the spatial relationships between them.
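
To make the idea of markup that carries both content and position concrete, here is a minimal sketch that reads a flat, DocTags-like string. The sample string is hand-written and simplified, and the tag names and the convention of four <loc_N> tokens per bounding box are assumptions for illustration. The authoritative grammar is defined by IBM's Docling tooling, which should be used for real output. parseFlatDocTags and DocTagsElement are hypothetical names.

```typescript
// Simplified, hand-written sample. Real DocTags output may differ.
const sampleDocTags =
  "<doctag>" +
  "<section_header_level_1><loc_40><loc_30><loc_460><loc_52>Quarterly Report</section_header_level_1>" +
  "<text><loc_40><loc_70><loc_460><loc_120>Revenue grew in Q3.</text>" +
  "</doctag>";

interface DocTagsElement {
  tag: string;
  // Assumed order: left, top, right, bottom. Null when four tokens are not present.
  bbox: [number, number, number, number] | null;
  text: string;
}

// Handles only flat elements of the form <tag><loc_..>...text</tag>.
// Nested structures such as tables are out of scope for this sketch.
function parseFlatDocTags(markup: string): DocTagsElement[] {
  // \1 forces the closing tag to match the opening tag name.
  const elementPattern = /<([a-z0-9_]+)>((?:<loc_\d+>)*)([^<]*)<\/\1>/g;
  const elements: DocTagsElement[] = [];
  let m: RegExpExecArray | null;
  while ((m = elementPattern.exec(markup)) !== null) {
    const locs = (m[2].match(/\d+/g) ?? []).map(Number);
    elements.push({
      tag: m[1],
      bbox: locs.length === 4 ? [locs[0], locs[1], locs[2], locs[3]] : null,
      text: m[3].trim(),
    });
  }
  return elements;
}

for (const el of parseFlatDocTags(sampleDocTags)) {
  console.log(el.tag, el.bbox, el.text);
}
```

Keeping the position next to each element is what allows downstream steps to rebuild reading order or highlight a matched passage on the original page.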

At just 258M parameters, Granite-Docling rivals systems several times its size. IBM reports layout detection at mAP 0.27, full-page OCR at F1 0.84, table recognition at TEDS 0.96, and equation recognition at F1 0.968. On Mixpeek, it is the most cost-effective option for document structure extraction, converting scanned PDFs, contracts, and technical documents into structured, searchable content with full layout preservation.

Architecture

The model follows the Idefics3 architecture, pairing a SigLIP2-base-patch16-512 vision encoder with a Granite 165M LLM. It outputs DocTags markup describing all page elements and their spatial relationships. English is the primary target language; support for Japanese, Arabic, and Chinese is experimental.
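
As a rough worked example of what the encoder name implies, the sketch below counts the patches in one input image. It assumes "patch16-512" follows the common vision-transformer naming convention of 16x16-pixel patches over a 512x512-pixel input. That reading is an assumption, not something stated by IBM here. patchesPerImage is a hypothetical helper.

```typescript
// Assumption: "patch16-512" means 16x16-pixel patches on a 512x512-pixel input.
function patchesPerImage(imageSize: number, patchSize: number): number {
  if (imageSize % patchSize !== 0) {
    throw new Error("imageSize must be a multiple of patchSize");
  }
  const patchesPerSide = imageSize / patchSize; // 512 / 16 = 32
  return patchesPerSide * patchesPerSide; // 32 * 32 = 1024
}

console.log(patchesPerImage(512, 16)); // 1024
```

This counts patches at the encoder input only. How many visual tokens reach the language model depends on the connector between encoder and LLM, which this sketch does not model.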

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "document_structure",
    version: "v1",
    parameters: { model_id: "ibm-granite/granite-docling-258M" },
  },
});

Capabilities

Layout-preserving document-to-markup conversion
Table recognition (TEDS 0.96 on FinTabNet)
Equation recognition (F1 0.968) and code recognition (F1 0.988)
DocTags universal format for structured output
Ultra-compact: 258M parameters
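
To put the parameter count in practical terms, the following sketch estimates the size of the weights alone at a few numeric precisions. It treats "258M" as exactly 258,000,000 parameters, which is a rounding assumption, and it ignores activations, caches, and runtime overhead, so real memory use will be higher. weightMegabytes and BYTES_PER_PARAM are hypothetical names.

```typescript
// Bytes needed to store one parameter at each precision.
const BYTES_PER_PARAM = { float32: 4, bfloat16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Weights-only estimate in megabytes (1 MB = 1e6 bytes).
function weightMegabytes(paramCount: number, precision: Precision): number {
  return (paramCount * BYTES_PER_PARAM[precision]) / 1e6;
}

const NOMINAL_PARAMS = 258e6; // assumption: "258M" taken at face value

const precisions: Precision[] = ["float32", "bfloat16", "int8"];
for (const precision of precisions) {
  console.log(`${precision}: ${weightMegabytes(NOMINAL_PARAMS, precision)} MB`);
}
// float32: 1032 MB
// bfloat16: 516 MB
// int8: 258 MB
```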

Use Cases on Mixpeek

Contract processing: extract clauses, tables, and structured data while preserving layout

Technical documentation: convert manuals with code blocks, equations, and diagrams

Financial document extraction: parse statements, reports, and filings into structured data

Benchmarks

Dataset	Metric	Score	Source
FinTabNet (table recognition)	TEDS (structure + content)	0.96	IBM, 2025: Granite-Docling Announcement
Full-page OCR	F1	0.84	IBM, 2025: Granite-Docling Announcement
Equation recognition	F1	0.968	IBM, 2025: Granite-Docling Announcement