donut-base
by naver-clova-ix
Document understanding transformer — OCR-free document parsing
naver-clova-ix/donut-base
mixpeek://document_extractor@v1/naver_donut_base_v1
Overview
Donut (Document Understanding Transformer) is an end-to-end model for document understanding that directly maps document images to structured outputs without relying on a separate OCR engine. This simplifies the pipeline and avoids OCR error propagation.
On Mixpeek, Donut offers an OCR-free alternative for document structure extraction, particularly useful for visually rich documents like receipts, forms, and infographics.
Architecture
Swin Transformer encoder for image features, BART decoder for text generation. Trained end-to-end on document images with their corresponding JSON annotations. No OCR dependency.
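One way to see what "trained end-to-end" means here is the standard encoder-decoder formulation below. The notation is ours, not this page's. The encoder maps a document image $\mathbf{x}$ to embeddings $\mathbf{z}$. The decoder then generates the target token sequence $\mathbf{y} = (y_1, \dots, y_T)$ one token at a time, conditioned on $\mathbf{z}$. Training minimizes the token-level cross-entropy against the annotation serialized as tokens:

```latex
\mathbf{z} = \mathrm{Enc}(\mathbf{x}), \qquad
p_\theta(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{T} p_\theta\!\left(y_t \mid y_{<t}, \mathbf{z}\right), \qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t}, \mathbf{z}\right)
```

No OCR output appears in either term. Recognition errors therefore cannot enter from an external OCR stage.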
Mixpeek SDK Integration
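import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest one document image and run Donut via the document_structure extractor.
// Top-level await requires an ES module context.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/receipt.jpg" },
  feature_extractors: [{
    name: "document_structure",
    version: "v1",
    params: {
      model_id: "naver-clova-ix/donut-base"
    }
  }]
});
Capabilities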
- OCR-free document understanding
- Structured JSON output from document images
- Document classification
- Key-value extraction from forms
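Structured JSON output and key-value extraction both depend on a final conversion step. The Donut paper describes serializing JSON targets into a token sequence with start and end tokens for each field, then converting the generated sequence back into JSON. The sketch below is a simplified stand-in for the `token2json` method of the Hugging Face `DonutProcessor`. It is not Mixpeek's implementation. It makes three assumptions: field tokens follow the Hugging Face `<s_key>`/`</s_key>` convention, list items are separated by `<sep/>`, and task-prompt and end-of-sequence tokens have already been stripped. The helper name `donutTokensToJson` and the field names in the usage are hypothetical.

```typescript
// Hypothetical helper: converts a Donut-style generated sequence into JSON.
// Assumes Hugging Face-style field tokens (<s_key>...</s_key>, <sep/> between
// list items) with task-prompt and end-of-sequence tokens already removed.

type Json = string | Json[] | { [key: string]: Json };

type Token =
  | { kind: "open"; key: string }
  | { kind: "close"; key: string }
  | { kind: "sep" }
  | { kind: "text"; value: string };

function tokenize(seq: string): Token[] {
  // Opening tag | closing tag | list separator | plain text | stray "<".
  const re = /<s_([^<>]+)>|<\/s_([^<>]+)>|<sep\/>|[^<]+|</g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(seq)) !== null) {
    if (m[1] !== undefined) tokens.push({ kind: "open", key: m[1] });
    else if (m[2] !== undefined) tokens.push({ kind: "close", key: m[2] });
    else if (m[0] === "<sep/>") tokens.push({ kind: "sep" });
    else tokens.push({ kind: "text", value: m[0] });
  }
  return tokens;
}

function donutTokensToJson(seq: string): Json {
  const tokens = tokenize(seq);
  let pos = 0;

  // Reads items until a closing tag (when nested) or the end of input.
  // <sep/> separates items; consecutive <s_key> fields form one object.
  function parseSequence(nested: boolean): Json {
    const items: Json[] = [];
    let fields: { [key: string]: Json } | null = null;
    let text = "";
    const flush = (): void => {
      if (fields !== null) items.push(fields);
      else if (text.trim() !== "") items.push(text.trim());
      fields = null;
      text = "";
    };

    while (pos < tokens.length) {
      const tok = tokens[pos];
      if (tok.kind === "close") {
        if (nested) break; // the caller checks and consumes it
        pos++; // stray closing tag at top level: skip it
        continue;
      }
      pos++;
      if (tok.kind === "sep") {
        flush();
      } else if (tok.kind === "text") {
        text += tok.value;
      } else {
        const value = parseSequence(true);
        const end = tokens[pos];
        // Consume the matching close tag; a mismatched one is left for an outer level.
        if (end !== undefined && end.kind === "close" && end.key === tok.key) pos++;
        if (fields === null) fields = {};
        fields[tok.key] = value;
      }
    }
    flush();
    if (items.length === 0) return "";
    return items.length === 1 ? items[0] : items;
  }

  return parseSequence(false);
}

// Usage: field names are illustrative, not a fixed Donut schema.
const generated =
  "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>" +
  "<s_nm>Bagel</s_nm><s_price>2.00</s_price></s_menu>" +
  "<s_total><s_total_price>6.50</s_total_price></s_total>";
console.log(JSON.stringify(donutTokensToJson(generated)));
// → {"menu":[{"nm":"Latte","price":"4.50"},{"nm":"Bagel","price":"2.00"}],"total":{"total_price":"6.50"}}
```

The recursive-descent design mirrors the nesting of the tags directly. It also recovers from a missing close tag by letting an outer level consume the mismatched one. Plain text mixed in alongside fields at the same level is dropped, which is one of the simplifications relative to a production converter.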
Research Paper
OCR-free Document Understanding Transformer
arxiv.org
Build a pipeline with donut-base
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.