SmolDocling-256M-preview

by docling-project

256M-parameter document understanding -- OCR, layout, tables, and charts in one tiny model

520K downloads/month

256M params

Identifiers

Model ID

docling-project/SmolDocling-256M-preview

Feature URI

mixpeek://document_extractor@v1/docling_smoldocling_256m_v1
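The feature URI above appears to follow the pattern mixpeek://<extractor>@<version>/<feature>. The sketch below splits it into those three parts. The field names and the `parseFeatureUri` helper are hypothetical, inferred from this single example, and are not part of a documented Mixpeek schema.

```typescript
// Hypothetical helper: the field names are an assumed reading of the URI,
// not a documented Mixpeek schema.
interface FeatureUri {
  extractor: string; // e.g. "document_extractor"
  version: string; // e.g. "v1"
  feature: string; // e.g. "docling_smoldocling_256m_v1"
}

function parseFeatureUri(uri: string): FeatureUri | null {
  // Assumed shape: mixpeek://<extractor>@<version>/<feature>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const [, extractor = "", version = "", feature = ""] = match;
  return { extractor, version, feature };
}

const parsedUri = parseFeatureUri(
  "mixpeek://document_extractor@v1/docling_smoldocling_256m_v1",
);
console.log(parsedUri);
```

Returning null for anything that does not match keeps callers from silently working with a half-parsed identifier.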

Overview

SmolDocling is a collaboration between IBM Research and Hugging Face that delivers end-to-end document conversion in a model small enough to run on a laptop CPU. At 256M parameters, it handles OCR, layout analysis, table extraction, chart understanding, code blocks, equations, and form parsing.

The model outputs universal DocTags markup that preserves spatial layout and reading order. It processes entire pages in a single forward pass rather than requiring separate detection and recognition stages. On Mixpeek, SmolDocling provides a cost-effective option for document understanding when GPU resources are scarce or when processing volume makes larger models prohibitively expensive.

Architecture

SmolDocling is a vision-language model with 256M parameters. It works end to end: a page image goes in, and DocTags markup comes out, preserving layout and position coordinates. It handles multi-page documents and has no external OCR dependency.
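To make the idea of markup that carries position coordinates concrete, here is a minimal sketch of reading flat DocTags-style elements. It assumes each element looks like <tag><loc_x1><loc_y1><loc_x2><loc_y2>content</tag> and that the location tokens sit on a 0-500 grid. Both are assumptions for illustration, as are the `parseDocTags` and `toPixels` helpers and the sample string. Nested structures such as tables are not handled.

```typescript
// One flat element: a tag name, a bounding box, and its text content.
interface DocTagElement {
  tag: string;
  bbox: [number, number, number, number]; // x1, y1, x2, y2 in grid units
  text: string;
}

// Assumption: location tokens are normalized to a 0-500 grid.
const GRID = 500;

function parseDocTags(markup: string): DocTagElement[] {
  // Matches <tag><loc_a><loc_b><loc_c><loc_d>content</tag> (assumed shape).
  const pattern =
    /<(\w+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>([\s\S]*?)<\/\1>/g;
  const elements: DocTagElement[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(markup)) !== null) {
    const [, tag = "", x1 = "0", y1 = "0", x2 = "0", y2 = "0", text = ""] = m;
    elements.push({
      tag,
      bbox: [Number(x1), Number(y1), Number(x2), Number(y2)],
      text: text.trim(),
    });
  }
  return elements; // regex scan order, i.e. the order elements appear in the markup
}

// Scale a grid-space box to pixel coordinates for a page of the given size.
function toPixels(
  bbox: [number, number, number, number],
  width: number,
  height: number,
): [number, number, number, number] {
  const [x1, y1, x2, y2] = bbox;
  return [
    (x1 * width) / GRID,
    (y1 * height) / GRID,
    (x2 * width) / GRID,
    (y2 * height) / GRID,
  ];
}

// Hypothetical sample, not real model output.
const sample =
  "<doctag><title><loc_57><loc_38><loc_443><loc_58>Quarterly Report</title>\n" +
  "<text><loc_57><loc_70><loc_443><loc_120>Revenue grew in Q3.</text></doctag>";

for (const el of parseDocTags(sample)) {
  console.log(el.tag, toPixels(el.bbox, 1000, 1000), el.text);
}
```

Because the elements come back in the order they appear in the markup, a consumer can rely on that order as reading order without re-sorting by coordinates.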

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "document_structure",
    version: "v1",
    parameters: { model_id: "docling-project/SmolDocling-256M-preview" },
  },
});