    HF · Document Structure · Apache-2.0

    granite-docling-258M

    by ibm-granite

    Ultra-compact 258M document converter — layout, tables, formulas, and code in a single model

    150K downloads/month
    258M parameters
    Identifiers
    Model ID: ibm-granite/granite-docling-258M
    Feature URI: mixpeek://document_extractor@v1/ibm_granite_docling_258m_v1
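The Feature URI above appears to follow an `extractor@version/feature` layout. The sketch below splits it into those parts. The field names and the `parseFeatureUri` helper are hypothetical, and the layout is an assumption read off this one example rather than a documented Mixpeek schema.

```typescript
// Hypothetical helper: the field names are an assumed reading of the URI,
// not a documented Mixpeek schema.
interface FeatureUri {
  extractor: string;        // e.g. "document_extractor"
  extractorVersion: string; // e.g. "v1"
  feature: string;          // e.g. "ibm_granite_docling_258m_v1"
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // Expected shape: mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (!match) return null;
  const [, extractor, extractorVersion, feature] = match;
  return { extractor, extractorVersion, feature };
}

const parsedUri = parseFeatureUri(
  "mixpeek://document_extractor@v1/ibm_granite_docling_258m_v1"
);
console.log(parsedUri);
```

A regular expression is used instead of the built-in `URL` class because `URL` would treat the text before `@` as a username.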

    Overview

    Granite-Docling-258M is IBM's ultra-compact vision-language model for end-to-end document conversion to machine-readable formats. Built on the Idefics3 architecture with a SigLIP2-base-patch16-512 vision encoder and a Granite 165M language model, it converts document pages into DocTags — IBM's universal markup format that captures all page elements including charts, tables, forms, code, equations, footnotes, and their spatial relationships.

    At just 258M parameters, Granite-Docling rivals systems several times its size on layout detection (mAP 0.27), full-page OCR (F1 0.84), table recognition (TEDS 0.96), and equation recognition (F1 0.968). On Mixpeek, it provides the most cost-effective document structure extraction, converting scanned PDFs, contracts, and technical documents into structured, searchable content with full layout preservation.

    Architecture

    The model follows the Idefics3 architecture, pairing a SigLIP2-base-patch16-512 vision encoder with a Granite 165M LLM. It outputs DocTags markup describing all page elements and their spatial relationships. English is the primary target language; support for Japanese, Arabic, and Chinese is experimental.
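To make the idea of markup with spatial relationships concrete, here is a minimal sketch that pulls elements and bounding boxes out of a DocTags-like string. It assumes each element is written as a tag followed by four `<loc_N>` tokens and then its content. The sample string, the `DocTagElement` type, and the `parseDocTags` helper are illustrative, not the official DocTags parser or a real model output.

```typescript
// Assumed element shape: <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag>
interface DocTagElement {
  tag: string;
  box: [number, number, number, number]; // x1, y1, x2, y2 on the model's location grid
  content: string;
}

function parseDocTags(markup: string): DocTagElement[] {
  const pattern =
    /<(\w+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>([\s\S]*?)<\/\1>/g;
  const elements: DocTagElement[] = [];
  let m: RegExpExecArray | null;
  // exec() with the g flag advances through the string one match at a time.
  while ((m = pattern.exec(markup)) !== null) {
    elements.push({
      tag: m[1],
      box: [Number(m[2]), Number(m[3]), Number(m[4]), Number(m[5])],
      content: m[6].trim(),
    });
  }
  return elements;
}

// Illustrative input only.
const sample =
  "<doctag>" +
  "<section_header_level_1><loc_50><loc_40><loc_450><loc_60>Quarterly Report</section_header_level_1>" +
  "<text><loc_50><loc_80><loc_450><loc_120>Revenue grew in Q3.</text>" +
  "</doctag>";

for (const el of parseDocTags(sample)) {
  console.log(el.tag, el.box, el.content);
}
```

Nested structures such as tables would need a real parser; this sketch only handles flat elements.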

    Mixpeek SDK Integration

    // Requires the Mixpeek SDK; replace "API_KEY" with a real key.
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a PDF and run Granite-Docling as the document-structure extractor.
    await mx.collections.ingest({
      collection_id: "documents",
      source: { url: "https://example.com/technical-report.pdf" },
      feature_extractors: [{
        feature: "document_structure",
        model: "ibm-granite/granite-docling-258M"
      }]
    });
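When several documents share the same extractor settings, the request object can be built once per URL. The sketch below only constructs objects with the same shape as the call above and makes no network request. The `IngestRequest` type and `buildIngestRequests` helper are hypothetical names introduced here, not part of the Mixpeek SDK.

```typescript
// Mirrors the request shape shown in the SDK snippet; nothing here calls the Mixpeek API.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

const DOCLING_MODEL = "ibm-granite/granite-docling-258M";

// Hypothetical helper: one request per document URL, all using the same extractor.
function buildIngestRequests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [{ feature: "document_structure", model: DOCLING_MODEL }],
  }));
}

const requests = buildIngestRequests("documents", [
  "https://example.com/technical-report.pdf",
  "https://example.com/contract.pdf",
]);
console.log(JSON.stringify(requests, null, 2));
```

Each entry could then be passed to `mx.collections.ingest(...)` in a loop, assuming the SDK accepts the shape shown in the snippet.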

    Capabilities

    • Layout-preserving document-to-markup conversion
    • Table recognition (TEDS 0.96 on FinTabNet)
    • Equation recognition (F1 0.968) and code recognition (F1 0.988)
    • DocTags universal format for structured output
    • Ultra-compact at 258M parameters, roughly the same size as its predecessor SmolDocling (256M)

    Use Cases on Mixpeek

    Contract processing: extract clauses, tables, and structured data while preserving layout
    Technical documentation: convert manuals with code blocks, equations, and diagrams
    Financial document extraction: parse statements, reports, and filings into structured data

    Benchmarks

    Dataset                       | Metric                     | Score | Source
    FinTabNet (table recognition) | TEDS (structure + content) | 0.96  | IBM, 2025, Granite-Docling Announcement
    Full-page OCR                 | F1                         | 0.84  | IBM, 2025, Granite-Docling Announcement
    Equation recognition          | F1                         | 0.968 | IBM, 2025, Granite-Docling Announcement

    Performance

    Input Size: Full document page (512px patches)
    GPU Latency: ~65 ms / page (A100)
    GPU Throughput: ~15 pages/sec (A100, batch 4)
    GPU Memory: ~0.7 GB
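As a rough planning aid, the throughput figure above can be turned into a processing-time estimate. This is a back-of-the-envelope sketch that assumes the approximate 15 pages/sec rate holds steadily and ignores download, queueing, and post-processing time. The `estimateSeconds` helper is a hypothetical name.

```typescript
// Approximate A100 throughput at batch size 4, taken from the figures above.
const PAGES_PER_SECOND = 15;

// Hypothetical helper: seconds of GPU time needed for a given page count.
function estimateSeconds(pageCount: number, pagesPerSecond: number = PAGES_PER_SECOND): number {
  if (pageCount < 0 || pagesPerSecond <= 0) {
    throw new RangeError("pageCount must be >= 0 and pagesPerSecond must be > 0");
  }
  return pageCount / pagesPerSecond;
}

const seconds = estimateSeconds(9000);
console.log(`${seconds} s (${seconds / 60} min)`); // 9000 / 15 = 600 s, i.e. 10 min
```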

    Specification

    Framework: HF
    Organization: ibm-granite
    Feature: Document Structure
    Output: structure tokens
    Modalities: document
    Retriever: Section Filter
    Parameters: 258M
    License: Apache-2.0
    Downloads/mo: 150K

    Research Paper

    Granite-Docling: End-to-End Document Understanding

    arxiv.org

    Build a pipeline with granite-docling-258M

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
