
    layoutlmv3-base

    by microsoft

    Pre-trained multimodal transformer for document AI

    1.0M downloads/month
    474 likes
    125M parameters

    Identifiers

    Model ID: microsoft/layoutlmv3-base
    Feature URI: mixpeek://document_extractor@v1/microsoft_layoutlmv3_v1

    Overview

    LayoutLMv3 is a pre-trained multimodal transformer that jointly models text, layout (bounding boxes), and image information for document understanding. Its paper reports state-of-the-art results at the time of publication on form understanding, receipt extraction, and document classification benchmarks.

    On Mixpeek, LayoutLMv3 extracts document structure — identifying headings, paragraphs, tables, and their spatial relationships for structured retrieval.

    Architecture

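    LayoutLMv3 is a unified multimodal transformer that takes text tokens, spatial layout coordinates, and image patches as a single input. It is pre-trained with three objectives: Masked Language Modeling, Masked Image Modeling, and Word-Patch Alignment, which predicts whether the image patch corresponding to a text word is masked.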
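
    To make the layout and image inputs more concrete, here is a minimal sketch of two preprocessing conventions commonly used with LayoutLM-family models: word bounding boxes scaled to a 0-1000 grid relative to the page, and an image split into fixed-size square patches (224×224 pixels with 16×16 patches is assumed here for the base model). `PixelBox`, `normalizeBox`, and `patchCount` are hypothetical names for illustration, not part of the Mixpeek SDK, and Mixpeek may well handle this preprocessing internally.

```typescript
// Hypothetical helpers for illustration only; not part of the Mixpeek SDK.

// A word's bounding box in page pixel coordinates.
type PixelBox = { x0: number; y0: number; x1: number; y1: number };

// Scale a pixel-space box to the 0-1000 integer grid (assumed convention).
function normalizeBox(
  box: PixelBox,
  pageWidth: number,
  pageHeight: number
): [number, number, number, number] {
  if (pageWidth <= 0 || pageHeight <= 0) {
    throw new RangeError("page dimensions must be positive");
  }
  // Clamp so boxes that spill past the page edge stay inside the grid.
  const clamp = (v: number): number => Math.min(1000, Math.max(0, v));
  const scaleX = (x: number): number => clamp(Math.floor((1000 * x) / pageWidth));
  const scaleY = (y: number): number => clamp(Math.floor((1000 * y) / pageHeight));
  return [scaleX(box.x0), scaleY(box.y0), scaleX(box.x1), scaleY(box.y1)];
}

// Number of square patches a square image is split into.
function patchCount(imageSize: number, patchSize: number): number {
  if (patchSize <= 0 || imageSize % patchSize !== 0) {
    throw new RangeError("imageSize must be a positive multiple of patchSize");
  }
  const perSide = imageSize / patchSize;
  return perSide * perSide;
}

// A word box on a 1000x500 pixel page, and the assumed base-model patch grid.
const exampleBox = normalizeBox({ x0: 100, y0: 50, x1: 300, y1: 100 }, 1000, 500);
const examplePatches = patchCount(224, 16);
console.log(exampleBox, examplePatches);
```

    Scaling to a fixed grid keeps the layout coordinates independent of page resolution, so the same document scanned at different sizes yields the same box values.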

    Mixpeek SDK Integration

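    import { Mixpeek } from "mixpeek";
    
    // "API_KEY" is a placeholder; replace it with your own Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a PDF by URL and run the document_structure extractor on it.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/invoice.pdf" },
      feature_extractors: [{
        name: "document_structure",
        version: "v1",
        params: {
          // Select LayoutLMv3 by its model ID.
          model_id: "microsoft/layoutlmv3-base"
        }
      }]
    });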

    Capabilities

    • Document layout understanding
    • Form and receipt key-value extraction
    • Document classification
    • Named entity recognition on documents
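
    Key-value extraction and named entity recognition with models of this kind are typically framed as token classification, where each word receives a BIO tag (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The sketch below shows one way to group such tags into entity spans. The `TaggedToken` shape, the `groupBioTags` helper, and the `QUESTION`/`ANSWER` label names are assumptions for illustration; the page does not specify the format Mixpeek returns.

```typescript
// Hypothetical types and helper for illustration; not Mixpeek's output format.

// One word with its predicted tag: "O", "B-<TYPE>", or "I-<TYPE>".
type TaggedToken = { text: string; label: string };
type Entity = { type: string; text: string };

// Merge consecutive B-/I- tagged words of the same type into one entity.
function groupBioTags(tokens: TaggedToken[]): Entity[] {
  const spans: { type: string; words: string[] }[] = [];
  let open = false; // true while the last span can still be extended

  for (const { text, label } of tokens) {
    const dash = label.indexOf("-");
    const prefix = dash === -1 ? label : label.slice(0, dash);
    const type = dash === -1 ? "" : label.slice(dash + 1);
    const last = spans[spans.length - 1];

    if (prefix === "I" && open && last !== undefined && last.type === type) {
      // Continuation of the current entity.
      last.words.push(text);
    } else if (prefix === "B" || prefix === "I") {
      // "B-" starts a new entity; a stray "I-" is treated as a start too.
      spans.push({ type, words: [text] });
      open = true;
    } else {
      // "O" (or any unrecognized tag) closes the current entity.
      open = false;
    }
  }
  return spans.map((s) => ({ type: s.type, text: s.words.join(" ") }));
}

// Made-up tags for an invoice header line.
const entities = groupBioTags([
  { text: "Invoice", label: "B-QUESTION" },
  { text: "Number", label: "I-QUESTION" },
  { text: ":", label: "O" },
  { text: "INV-001", label: "B-ANSWER" },
]);
console.log(entities);
```

    Treating a stray `I-` tag as the start of a new entity is a lenient choice that keeps mislabeled words instead of dropping them; a stricter variant could discard them.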

    Use Cases on Mixpeek

    Intelligent document processing — extract fields from forms
    Financial document analysis — parse invoices and statements
    Legal document structure extraction

    Specification

    Framework: Hugging Face (HF)
    Organization: microsoft
    Feature: Document Structure
    Output: structure tokens
    Modalities: document
    Retriever: Section Filter
    Parameters: 125M
    License: cc-by-nc-sa-4.0
    Downloads/mo: 1.0M
    Likes: 474

    Research Paper

    LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking (arxiv.org)

    Build a pipeline with layoutlmv3-base

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
