layoutlmv3-base
by microsoft
Pre-trained multimodal transformer for document AI
microsoft/layoutlmv3-base
mixpeek://document_extractor@v1/microsoft_layoutlmv3_v1

Overview
LayoutLMv3 is a pre-trained multimodal transformer that jointly models text, layout (bounding boxes), and image information for document understanding. It achieves state-of-the-art results on form understanding, receipt extraction, and document classification benchmarks.
On Mixpeek, LayoutLMv3 extracts document structure — identifying headings, paragraphs, tables, and their spatial relationships for structured retrieval.
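For illustration, structured output from such an extractor might look like the following. This is a hypothetical shape, not Mixpeek's documented response schema; the field names (`type`, `text`, `page`, `bbox`) are assumptions.

```typescript
// Hypothetical extractor output: document elements with types and positions.
// Field names are illustrative, not Mixpeek's documented schema.
type ElementType = "heading" | "paragraph" | "table";

interface DocumentElement {
  type: ElementType;
  text: string;
  page: number;
  bbox: [number, number, number, number]; // [x0, y0, x1, y1] pixel coordinates
}

const example: DocumentElement[] = [
  { type: "heading", text: "Invoice #1042", page: 1, bbox: [50, 40, 400, 80] },
  { type: "paragraph", text: "Bill to: Acme Corp", page: 1, bbox: [50, 100, 300, 130] },
];

// Spatial relationships (e.g. "paragraph appears below heading") can be
// derived by comparing vertical positions of elements on the same page.
const isBelow =
  example[1].page === example[0].page &&
  example[1].bbox[1] > example[0].bbox[3];
```

Representing elements this way is what makes "structured retrieval" possible: queries can filter on element type or position rather than raw page text.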
Architecture
A unified multimodal transformer that takes text tokens, spatial layout coordinates, and image patches as input. It is pre-trained with three objectives: Masked Language Modeling (MLM), Masked Image Modeling (MIM), and Word-Patch Alignment (WPA).
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/invoice.pdf" },
  feature_extractors: [{
    name: "document_structure",
    version: "v1",
    params: {
      model_id: "microsoft/layoutlmv3-base"
    }
  }]
});

Capabilities
- Document layout understanding
- Form and receipt key-value extraction
- Document classification
- Named entity recognition on documents
Research Paper
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
arxiv.org

Build a pipeline with layoutlmv3-base
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
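As a sketch of what a multi-extractor pipeline might look like, the ingest call above can list additional extractors alongside document_structure. Note that the second extractor's name ("text_embedding") and its params are assumptions for illustration, not confirmed Mixpeek identifiers:

```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Hypothetical multi-extractor ingest: pair layout extraction with an
// embedding stage so extracted sections become searchable. The
// "text_embedding" extractor shown here is illustrative only.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/invoice.pdf" },
  feature_extractors: [
    {
      name: "document_structure",
      version: "v1",
      params: { model_id: "microsoft/layoutlmv3-base" }
    },
    {
      name: "text_embedding", // hypothetical extractor name
      version: "v1",
      params: {}
    }
  ]
});
```

The layout extractor segments the document; the embedding stage then indexes those segments, giving retrieval stages structure-aware units to search over.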