SmolDocling-256M-preview
by docling-project
256M-parameter document understanding -- OCR, layout, tables, and charts in one tiny model
docling-project/SmolDocling-256M-preview
mixpeek://document_extractor@v1/docling_smoldocling_256m_v1
Overview
SmolDocling is a collaboration between IBM Research and HuggingFace that delivers end-to-end document conversion in a model small enough to run on a laptop CPU. At 256M parameters, it handles OCR, layout analysis, table extraction, chart understanding, code blocks, equations, and form parsing.
The model outputs universal DocTags markup that preserves spatial layout and reading order. It converts an entire page with a single model, generating the markup directly instead of chaining separate detection and recognition stages. On Mixpeek, SmolDocling is a cost-effective option for document understanding when GPU resources are scarce or when processing volume makes larger models prohibitively expensive.
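The "small enough for a laptop CPU" claim can be sanity-checked with weight-memory arithmetic: parameter count times bytes per parameter. The sketch below is a rough estimate under stated assumptions. It treats the model as dense, counts weights only (activations, image features and the generation cache come on top), and the dtypes listed are common storage formats, not a statement of how Mixpeek or any particular runtime actually stores this model.

```typescript
// Rough weight-memory estimate: parameters x bytes per parameter.
// Weights only; runtime memory (activations, caches) is extra, so this is a lower bound.
type DType = "fp32" | "bf16" | "fp16" | "int8";
const BYTES_PER_PARAM: Record<DType, number> = {
  fp32: 4,
  bf16: 2,
  fp16: 2,
  int8: 1,
};
export function weightMemoryBytes(numParams: number, dtype: DType): number {
  return numParams * BYTES_PER_PARAM[dtype];
}
// Convert bytes to mebibytes for display.
export function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}
const params = 256e6; // 256M parameters, per this model card
for (const dtype of ["fp32", "bf16", "int8"] as DType[]) {
  const mib = toMiB(weightMemoryBytes(params, dtype));
  console.log(`${dtype}: ~${mib.toFixed(0)} MiB of weights`);
}
```

For 256M parameters this works out to roughly 977 MiB in fp32, 488 MiB in bf16 and 244 MiB in int8, all well within typical laptop RAM.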
Architecture
Vision-language model, 256M parameters. End-to-end: a page image goes in and DocTags markup comes out, preserving layout and element position coordinates. Multi-page documents are converted page by page. No external OCR dependency.
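To make the "position coordinates" concrete, here is a minimal sketch of reading flat DocTags elements and scaling their location tokens to page pixels. It assumes raw DocTags text is available and that each element looks like `<tag><loc_x0><loc_y0><loc_x1><loc_y1>content</tag>` with coordinates on a normalized grid, assumed here to be 0 to 500 (check the DocTags specification before relying on that value). `parseDocTags` and `DocTagsElement` are hypothetical names, and nested structures such as tables are returned as raw inner markup rather than parsed.

```typescript
// Hypothetical DocTags reader (sketch). Assumes flat elements of the form
//   <tag><loc_x0><loc_y0><loc_x1><loc_y1>content</tag>
// with coordinates on a normalized grid (assumed 0-500).
interface DocTagsElement {
  tag: string;
  bbox: [number, number, number, number]; // x0, y0, x1, y1 in page pixels
  content: string;
}
// \w+ so tags with digits (e.g. section_header_level_1) also match;
// \1 requires the closing tag to match the opening one.
const ELEMENT_RE = /<(\w+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>([\s\S]*?)<\/\1>/g;
export function parseDocTags(
  markup: string,
  pageWidth: number,
  pageHeight: number,
  grid = 500, // assumed normalization grid size
): DocTagsElement[] {
  const elements: DocTagsElement[] = [];
  for (const m of markup.matchAll(ELEMENT_RE)) {
    const [x0, y0, x1, y1] = m.slice(2, 6).map(Number);
    elements.push({
      tag: m[1],
      bbox: [
        (x0 * pageWidth) / grid,
        (y0 * pageHeight) / grid,
        (x1 * pageWidth) / grid,
        (y1 * pageHeight) / grid,
      ],
      content: m[6].trim(),
    });
  }
  // matchAll scans left to right, so elements keep the order the model emitted them in.
  return elements;
}
```

A wrapper tag without location tokens (such as an outer `<doctag>`) never satisfies the pattern, so only positioned elements are returned.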
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "document_structure",
    version: "v1",
    parameters: { model_id: "docling-project/SmolDocling-256M-preview" },
  },
});
```
Capabilities
- Full-page OCR with layout preservation
- Table structure extraction
- Chart and figure understanding
- Code block detection
- Mathematical equation parsing
- Form field extraction
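Table structure extraction relies on DocTags encoding tables as a row-major token stream in the OTSL style. The sketch below rebuilds a cell grid from such a stream. The token meanings are assumptions based on the OTSL scheme (`<fcel>` cell with content, `<ecel>` empty cell, `<lcel>`/`<ucel>`/`<xcel>` slots covered by a merged cell, `<nl>` end of row), and treating the header-style tokens `<ched>`, `<rhed>` and `<srow>` like ordinary content cells is a further simplification. `otslToGrid` is a hypothetical helper, not part of any Mixpeek or Docling API.

```typescript
// Hypothetical OTSL-to-grid converter (sketch; token meanings assumed from OTSL).
type Cell = string | null; // null marks a slot covered by a spanning cell
const CONTENT_TOKENS = new Set(["fcel", "ched", "rhed", "srow"]);
const MERGED_TOKENS = new Set(["lcel", "ucel", "xcel"]);
export function otslToGrid(tokens: string): Cell[][] {
  const grid: Cell[][] = [];
  let row: Cell[] = [];
  // Pair each opening tag with the text that follows it up to the next "<".
  const re = /<(\w+)>([^<]*)/g;
  for (const m of tokens.matchAll(re)) {
    const tag = m[1];
    const text = m[2].trim();
    if (tag === "nl") {
      grid.push(row); // end of the current row
      row = [];
    } else if (CONTENT_TOKENS.has(tag)) {
      row.push(text);
    } else if (tag === "ecel") {
      row.push("");
    } else if (MERGED_TOKENS.has(tag)) {
      row.push(null);
    }
    // Other tags (e.g. <otsl>, location tokens) are ignored; closing tags never match.
  }
  if (row.length > 0) grid.push(row); // tolerate a missing final <nl>
  return grid;
}
```

Keeping merged slots as `null` preserves the grid's rectangular shape, so column indices stay aligned across rows even when a cell spans several columns.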
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| DocLayNet (layout detection) | mAP | 76.3% | IBM/HF, 2025 -- Paper Table 1 |
Specification
Research Paper
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
arxiv.org
Build a pipeline with SmolDocling-256M-preview
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.