SmolDocling-256M-preview
by docling-project
256M-parameter document understanding -- OCR, layout, tables, and charts in one tiny model
docling-project/SmolDocling-256M-preview
mixpeek://document_extractor@v1/docling_smoldocling_256m_v1
Overview
SmolDocling is a collaboration between IBM Research and Hugging Face that delivers end-to-end document conversion in a model small enough to run on a laptop CPU. At 256M parameters, it handles OCR, layout analysis, table extraction, chart understanding, code blocks, equations, and form parsing.
The model outputs universal DocTags markup that preserves spatial layout and reading order. It processes entire pages in a single forward pass rather than requiring separate detection and recognition stages. On Mixpeek, SmolDocling provides a cost-effective option for document understanding when GPU resources are scarce or when processing volume makes larger models prohibitively expensive.
Architecture
Vision-language model, 256M parameters. End-to-end: image input, DocTags markup output preserving layout and position coordinates. Handles multi-page documents. No external OCR dependency.
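
To make the "DocTags markup with position coordinates" idea concrete, here is a minimal sketch of reading such output back into structured elements. It assumes a simplified DocTags-like shape in which each element is an XML-style tag whose body starts with four `<loc_N>` tokens (left, top, right, bottom) followed by the text. The `DocTagsElement` type, the `parseDocTags` helper, and the sample string are illustrative, not part of any official parser, and real model output may contain nested structures (tables, for instance) that this sketch does not handle.

```typescript
// Hypothetical element shape for this sketch; not an official DocTags type.
interface DocTagsElement {
  tag: string;
  bbox: [number, number, number, number] | null; // raw loc token values, if present
  content: string;
}

// Parse a flat, simplified DocTags-like string into elements.
// Assumption: elements are not nested, apart from an optional <doctag> wrapper.
function parseDocTags(markup: string): DocTagsElement[] {
  const elements: DocTagsElement[] = [];
  // Match an opening tag (skipping loc tokens and the wrapper), a lazy body,
  // and the matching closing tag via the \1 backreference.
  const elementPattern = /<(?!loc_|doctag>)([a-z0-9_]+)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = elementPattern.exec(markup)) !== null) {
    const body = match[2];
    // Collect the numeric value of every <loc_N> token in the body.
    const coords: number[] = [];
    const locPattern = /<loc_(\d+)>/g;
    let loc: RegExpExecArray | null;
    while ((loc = locPattern.exec(body)) !== null) {
      coords.push(Number(loc[1]));
    }
    const bbox: [number, number, number, number] | null =
      coords.length >= 4 ? [coords[0], coords[1], coords[2], coords[3]] : null;
    elements.push({
      tag: match[1],
      bbox,
      content: body.replace(/<loc_\d+>/g, "").trim(),
    });
  }
  return elements;
}

// Made-up sample input in the assumed shape.
const sampleMarkup =
  "<doctag>" +
  "<title><loc_40><loc_20><loc_460><loc_48>Master Services Agreement</title>" +
  "<text><loc_40><loc_60><loc_460><loc_120>This agreement is made between the parties.</text>" +
  "</doctag>";

for (const element of parseDocTags(sampleMarkup)) {
  console.log(element.tag, element.bbox, element.content);
}
```

Because elements come back in document order, the reading order encoded by the model is preserved in the returned array.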
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/contract.pdf" },
  feature_extractors: [
    {
      name: "document_structure",
      version: "v1",
      params: {
        model_id: "docling-project/SmolDocling-256M-preview"
      }
    }
  ]
});
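
When ingesting many documents, it can help to build the request body in one place. The sketch below is a hypothetical helper, not part of the Mixpeek SDK: the `IngestRequest` and `FeatureExtractor` types and the `buildSmolDoclingIngest` function are assumptions inferred only from the field names in the snippet above, and the SDK's real typings may differ.

```typescript
// Hypothetical types mirroring the fields used in the snippet above.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const SMOLDOCLING_MODEL_ID = "docling-project/SmolDocling-256M-preview";

// Build one ingest request body for a document URL.
// The http(s) check is a basic sanity guard added for this sketch.
function buildSmolDoclingIngest(collectionId: string, documentUrl: string): IngestRequest {
  if (!/^https?:\/\//.test(documentUrl)) {
    throw new Error(`Expected an http(s) URL, got: ${documentUrl}`);
  }
  return {
    collection_id: collectionId,
    source: { url: documentUrl },
    feature_extractors: [
      {
        name: "document_structure",
        version: "v1",
        params: { model_id: SMOLDOCLING_MODEL_ID },
      },
    ],
  };
}

// Example: prepare request bodies for a batch of documents.
const documentUrls = [
  "https://example.com/contract.pdf",
  "https://example.com/invoice.pdf",
];
const ingestRequests = documentUrls.map((url) =>
  buildSmolDoclingIngest("my-collection", url)
);
console.log(JSON.stringify(ingestRequests, null, 2));
```

Each resulting object has the same shape as the argument passed to `mx.collections.ingest` above, so under that assumption it could be passed to that call directly.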
Capabilities
- Full-page OCR with layout preservation
- Table structure extraction
- Chart and figure understanding
- Code block detection
- Mathematical equation parsing
- Form field extraction
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| DocLayNet (layout detection) | mAP | 76.3% | IBM/HF, 2025 -- Paper Table 1 |
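
For context on the metric, layout-detection mAP scores are built on intersection-over-union (IoU) between predicted and ground-truth boxes. The sketch below shows only that building block. The `Box` type and `iou` function are illustrative names, and the exact evaluation protocol behind the figure in the table (IoU thresholds, class averaging) is not stated here, so this is not a reproduction of that benchmark.

```typescript
// Axis-aligned box with (x1, y1) as top-left and (x2, y2) as bottom-right.
interface Box {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Intersection-over-union of two boxes; returns 0 when they do not overlap.
function iou(a: Box, b: Box): number {
  const overlapWidth = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const overlapHeight = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const intersection = overlapWidth * overlapHeight;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  const union = areaA + areaB - intersection;
  return union > 0 ? intersection / union : 0;
}

// A predicted box shifted half a width from the ground truth.
const groundTruth: Box = { x1: 0, y1: 0, x2: 10, y2: 10 };
const predicted: Box = { x1: 5, y1: 0, x2: 15, y2: 10 };
console.log(iou(groundTruth, predicted).toFixed(3)); // prints "0.333"
```

A prediction typically counts as a match only when its IoU with a ground-truth box of the same class exceeds a chosen threshold, and precision is then averaged over classes to give mAP.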
Performance
Specification
Research Paper
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
arxiv.org
Build a pipeline with SmolDocling-256M-preview
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.