donut-base

by naver-clova-ix

Document understanding transformer, OCR-free document parsing

121K downloads/month

254 likes

210M parameters

Identifiers

Model ID

naver-clova-ix/donut-base

Feature URI

mixpeek://document_extractor@v1/naver_donut_base_v1
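
As an illustration of how the Feature URI above breaks down, the sketch below splits it into its parts. It assumes the URI follows the shape `mixpeek://<extractor>@<version>/<feature>`, which is inferred from this one example and not from a published specification. The `FeatureUri` type and `parseFeatureUri` function are hypothetical names and are not part of the Mixpeek SDK.

```typescript
// Hypothetical shape, inferred from the single URI shown on this page:
//   mixpeek://<extractor>@<version>/<feature>
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

// Returns the parsed parts, or null when the string does not match the assumed shape.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (!match) return null;
  return { extractor: match[1], version: match[2], feature: match[3] };
}

const parts = parseFeatureUri("mixpeek://document_extractor@v1/naver_donut_base_v1");
console.log(parts);
```

A regular expression is used here instead of the built-in `URL` class because `URL` would read the text before `@` as a username rather than as an extractor name.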

Overview

Donut (Document Understanding Transformer) is an end-to-end model for document understanding that directly maps document images to structured outputs without relying on a separate OCR engine. This simplifies the pipeline and avoids OCR error propagation.

On Mixpeek, Donut offers an OCR-free alternative for document structure extraction. It is particularly useful for visually rich documents such as receipts, forms, and infographics.

Architecture

Donut pairs a Swin Transformer encoder, which extracts features from the document image, with a BART decoder, which generates the output text. The model is trained end-to-end on document images and their corresponding JSON annotations, and it has no OCR dependency.
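
To make the link between the decoder's text output and JSON concrete: fine-tuned Donut checkpoints typically emit a flat sequence in which each field is wrapped in `<s_key>...</s_key>` tags and list items are separated by `<sep/>`. The sketch below is a simplified TypeScript take on that token-to-JSON step, loosely modeled on the reference implementation and not a port of it. The field names (`menu`, `nm`, `price`, `total`) and the function name `parseDonutSequence` are illustrative assumptions.

```typescript
// Hypothetical types for this sketch; not part of any SDK.
type DonutValue = string | DonutValue[] | DonutObject;
interface DonutObject {
  [key: string]: DonutValue;
}

// Convert a tagged decoder sequence into nested objects and arrays.
function parseDonutSequence(seq: string): DonutValue {
  let current: DonutObject = {};
  const groups: DonutObject[] = [current];
  let rest = seq.trim();

  while (rest.length > 0) {
    const open = /<s_(.*?)>/.exec(rest);
    if (!open) break; // no more fields
    const key = open[1];
    const closeTag = `</s_${key}>`;
    const start = open.index + open[0].length;
    const end = rest.indexOf(closeTag, start);
    if (end === -1) {
      // Unclosed tag: skip it and keep scanning.
      rest = rest.slice(start);
      continue;
    }

    const content = rest.slice(start, end).trim();
    if (content.includes("<s_")) {
      // Nested fields: recurse into the inner sequence.
      current[key] = parseDonutSequence(content);
    } else {
      // Leaf: a single string, or several strings separated by <sep/>.
      const leaves = content.split("<sep/>").map((s) => s.trim());
      current[key] = leaves.length === 1 ? leaves[0] : leaves;
    }

    rest = rest.slice(end + closeTag.length).trim();
    if (rest.startsWith("<sep/>")) {
      // A separator between fields starts a new item in a list of objects.
      current = {};
      groups.push(current);
      rest = rest.slice("<sep/>".length).trim();
    }
  }
  return groups.length === 1 ? groups[0] : groups;
}

const example =
  "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>" +
  "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu><s_total>7.50</s_total>";
console.log(JSON.stringify(parseDonutSequence(example), null, 2));
```

In this example the `menu` field becomes an array of two objects and `total` stays a plain string. The sketch does not handle a field nested inside another field of the same name, and it ignores special tokens such as task prompts.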

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

// "API_KEY" is a placeholder; substitute your own key
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "document_structure",
    version: "v1",
    parameters: { model_id: "naver-clova-ix/donut-base" },
  },
});