    HF | Document Structure | Apache-2.0

    SmolDocling-256M-preview

    by docling-project

    256M-parameter document understanding -- OCR, layout, tables, and charts in one tiny model

    520K downloads/month
    256M params
    Identifiers
    Model ID
    docling-project/SmolDocling-256M-preview
    Feature URI
    mixpeek://document_extractor@v1/docling_smoldocling_256m_v1
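
    The Feature URI above appears to follow the pattern scheme://extractor@version/feature. That layout is an assumption inferred from this single example, not a documented grammar, and parseFeatureUri is a hypothetical helper rather than part of the Mixpeek SDK. A minimal sketch of splitting the URI into its parts:

```typescript
// Hypothetical helper: the URI layout is inferred from the one example
// on this page (mixpeek://<extractor>@<version>/<feature>), not from a spec.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    return null; // not in the assumed shape
  }
  return { extractor: match[1], version: match[2], feature: match[3] };
}

const parsedUri = parseFeatureUri(
  "mixpeek://document_extractor@v1/docling_smoldocling_256m_v1"
);
console.log(parsedUri);
```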

    Overview

    SmolDocling is a collaboration between IBM Research and Hugging Face that delivers end-to-end document conversion in a model small enough to run on a laptop CPU. At 256M parameters, it handles OCR, layout analysis, table extraction, chart understanding, code blocks, equations, and form parsing.

    The model outputs universal DocTags markup that preserves spatial layout and reading order. It processes entire pages in a single forward pass rather than requiring separate detection and recognition stages. On Mixpeek, SmolDocling provides a cost-effective option for document understanding when GPU resources are scarce or when processing volume makes larger models prohibitively expensive.

    Architecture

    Vision-language model, 256M parameters. End-to-end: image input, DocTags markup output preserving layout and position coordinates. Handles multi-page documents. No external OCR dependency.
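
    To make the DocTags output more concrete, here is a minimal sketch of reading element types, bounding boxes, and text out of a markup string. It assumes each element has the simplified shape <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag>. The sample string is hand-written for illustration, parseDocTags is a hypothetical helper, and the full DocTags grammar (tables, nested elements, and so on) is richer than what this covers.

```typescript
// Assumed, simplified element shape:
//   <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag>
// Real DocTags output may contain structures this sketch does not handle.
interface DocTagElement {
  tag: string;
  bbox: [number, number, number, number]; // x1, y1, x2, y2 location tokens
  text: string;
}

function parseDocTags(markup: string): DocTagElement[] {
  const pattern =
    /<(\w+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>([\s\S]*?)<\/\1>/g;
  const elements: DocTagElement[] = [];
  let m: RegExpExecArray | null;
  // exec() with the g flag walks the string in document order,
  // so the result keeps the reading order of the markup.
  while ((m = pattern.exec(markup)) !== null) {
    elements.push({
      tag: m[1],
      bbox: [Number(m[2]), Number(m[3]), Number(m[4]), Number(m[5])],
      text: m[6].trim(),
    });
  }
  return elements;
}

// Hand-written sample, not real model output.
const sampleMarkup =
  "<doctag>" +
  "<section_header_level_1><loc_50><loc_40><loc_450><loc_60>Master Agreement</section_header_level_1>" +
  "<text><loc_50><loc_70><loc_450><loc_120>This agreement is made between the parties.</text>" +
  "</doctag>";

for (const el of parseDocTags(sampleMarkup)) {
  console.log(el.tag, el.bbox, el.text);
}
```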

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/contract.pdf" },
      feature_extractors: [{
        name: "document_structure",
        version: "v1",
        params: {
          model_id: "docling-project/SmolDocling-256M-preview"
        }
      }]
    });
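
    When ingesting many documents, it can help to build the request objects separately from the SDK call. The sketch below only constructs plain objects that mirror the shape used in the snippet above. The interface names and buildIngestRequests are hypothetical, written for illustration, and are not types or functions published by the Mixpeek SDK.

```typescript
// Hypothetical types mirroring the request shape shown in the snippet above.
interface FeatureExtractorConfig {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractorConfig[];
}

const smolDoclingExtractor: FeatureExtractorConfig = {
  name: "document_structure",
  version: "v1",
  params: { model_id: "docling-project/SmolDocling-256M-preview" },
};

// Build one request per document URL; no network calls happen here.
function buildIngestRequests(
  collectionId: string,
  urls: string[]
): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [smolDoclingExtractor],
  }));
}

const requests = buildIngestRequests("my-collection", [
  "https://example.com/contract.pdf",
  "https://example.com/invoice.pdf",
]);
console.log(JSON.stringify(requests, null, 2));
```

    Each resulting object could then be passed to mx.collections.ingest in the same way as the single-document call above.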

    Capabilities

    • Full-page OCR with layout preservation
    • Table structure extraction
    • Chart and figure understanding
    • Code block detection
    • Mathematical equation parsing
    • Form field extraction

    Use Cases on Mixpeek

    • High-volume document processing on CPU
    • On-device document understanding for privacy-sensitive workflows
    • Cost-effective PDF-to-structured-data pipelines
    • Document ingestion for RAG at scale

    Benchmarks

    Dataset                       Metric  Score  Source
    DocLayNet (layout detection)  mAP     76.3%  IBM/HF, 2025 -- Paper Table 1

    Performance

    Input Size: Full document pages
    GPU Latency: ~80 ms / page (A100)
    GPU Throughput: ~12 pages/sec (A100, batch 4)
    GPU Memory: ~0.8 GB
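
    As a rough planning aid, the throughput figure above can be turned into a wall-clock estimate. This is a back-of-the-envelope sketch: it assumes the approximate ~12 pages/sec figure holds steadily for the whole job and ignores download, queueing, and I/O time. The function name is hypothetical.

```typescript
// Approximate figure quoted above (A100, batch 4); treat it as a rough input.
const PAGES_PER_SECOND = 12;

// Estimated GPU time in seconds to process a given number of pages.
function estimateBatchSeconds(
  pages: number,
  pagesPerSecond: number = PAGES_PER_SECOND
): number {
  if (pages < 0 || pagesPerSecond <= 0) {
    throw new RangeError("pages must be >= 0 and pagesPerSecond must be > 0");
  }
  return pages / pagesPerSecond;
}

const seconds = estimateBatchSeconds(10_000);
console.log(`${(seconds / 60).toFixed(1)} minutes for 10,000 pages`);
```

    Under these assumptions, 10,000 pages come out to roughly 14 minutes of GPU time on a single A100.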

    Specification

    Framework: HF
    Organization: docling-project
    Feature: Document Structure
    Output: structure tokens
    Modalities: document
    Retriever: Section Filter
    Parameters: 256M
    License: Apache-2.0
    Downloads/mo: 520K

    Research Paper

    SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

    arxiv.org

    Build a pipeline with SmolDocling-256M-preview

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
