
    layoutlmv3-base

    by microsoft

    Pre-trained multimodal transformer for document AI

    1.0M downloads/month
    474 likes
    125M params
    Identifiers
    Model ID
    microsoft/layoutlmv3-base
    Feature URI
    mixpeek://document_extractor@v1/microsoft_layoutlmv3_v1
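
    The Feature URI above appears to follow the pattern mixpeek://<extractor>@<version>/<feature>. That pattern is an assumption inferred from this single example, not a documented format. Under that assumption, a minimal sketch for splitting a URI into its parts could look like this (parseFeatureUri and the FeatureUri field names are hypothetical helpers, not part of the Mixpeek SDK):

```typescript
// Parsed pieces of a feature URI. Field names are illustrative only.
interface FeatureUri {
  extractor: string; // e.g. "document_extractor"
  version: string;   // e.g. "v1"
  feature: string;   // e.g. "microsoft_layoutlmv3_v1"
}

// Assumed format, inferred from the one URI shown on this page:
//   mixpeek://<extractor>@<version>/<feature>
// Returns null when the input does not match that shape.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) return null;
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsed = parseFeatureUri(
  "mixpeek://document_extractor@v1/microsoft_layoutlmv3_v1"
);
console.log(parsed);
```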

    Overview

    LayoutLMv3 is a pre-trained multimodal transformer that jointly models text, layout (bounding boxes), and image information for document understanding. At publication, the paper reported state-of-the-art results on form understanding, receipt understanding, and document image classification.

    On Mixpeek, LayoutLMv3 extracts document structure, identifying headings, paragraphs, tables, and their spatial relationships for structured retrieval.
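
    The layout input mentioned above is one bounding box per word. The Hugging Face implementation expects those boxes as integers on a 0 to 1000 scale relative to the page size. The sketch below shows that normalization; the Box type and the normalizeBox name are hypothetical, and whether Mixpeek performs this step for you is not stated on this page.

```typescript
// [x0, y0, x1, y1] in pixels, origin at the top-left corner of the page.
type Box = [number, number, number, number];

// Scale a pixel-space box to integers in the 0..1000 range,
// clamping values that fall outside the page.
function normalizeBox(box: Box, pageWidth: number, pageHeight: number): Box {
  const scale = (value: number, size: number): number =>
    Math.min(1000, Math.max(0, Math.floor((1000 * value) / size)));
  const [x0, y0, x1, y1] = box;
  return [
    scale(x0, pageWidth),
    scale(y0, pageHeight),
    scale(x1, pageWidth),
    scale(y1, pageHeight),
  ];
}
```

    For example, a word at [100, 200, 300, 400] on a 1000 by 2000 pixel page maps to [100, 100, 300, 200].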

    Architecture

    LayoutLMv3 is a unified multimodal transformer that takes text tokens, spatial layout coordinates, and image patches as input. It is pre-trained with three objectives: Masked Language Modeling, Masked Image Modeling, and Word-Patch Alignment.
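
    Word-Patch Alignment asks the model to predict whether the image patches under a text word are masked, so it relies on knowing which patches a word's box covers. The sketch below illustrates only that geometric lookup, not the training objective itself. It assumes a 224 by 224 input image cut into 16 by 16 patches (a 14 by 14 grid) and boxes on the 0 to 1000 scale; the constants and the patchesForBox name are illustrative assumptions.

```typescript
const IMAGE_SIZE = 224; // assumed input resolution in pixels
const PATCH_SIZE = 16;  // assumed patch side length in pixels
const GRID = IMAGE_SIZE / PATCH_SIZE; // 14 patches per side, 196 in total

// Return the row-major indices of every patch that a box overlaps or touches.
// The box is [x0, y0, x1, y1] on the 0..1000 scale.
function patchesForBox(box: [number, number, number, number]): number[] {
  // Map a 0..1000 coordinate to a grid cell, clamped to the last cell.
  const toCell = (v: number): number =>
    Math.min(GRID - 1, Math.max(0, Math.floor((v / 1000) * GRID)));
  const [x0, y0, x1, y1] = box;
  const indices: number[] = [];
  for (let row = toCell(y0); row <= toCell(y1); row++) {
    for (let col = toCell(x0); col <= toCell(x1); col++) {
      indices.push(row * GRID + col);
    }
  }
  return indices;
}
```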

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a PDF and run the document structure extractor backed by LayoutLMv3.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/invoice.pdf" },
      feature_extractors: [{
        name: "document_structure",
        version: "v1",
        params: {
          model_id: "microsoft/layoutlmv3-base"
        }
      }]
    });

    Capabilities

    • Document layout understanding
    • Form and receipt key-value extraction
    • Document classification
    • Named entity recognition on documents
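
    Key-value extraction and named entity recognition are typically framed as token classification, which requires fine-tuning the pre-trained checkpoint on labeled documents. A fine-tuned model then emits one tag per word, often in the BIO scheme. The sketch below shows one way to merge such tags into entities. The tag names (QUESTION, ANSWER) and the groupEntities helper are illustrative assumptions, not Mixpeek output.

```typescript
interface TaggedWord {
  text: string;
  tag: string; // "B-<TYPE>", "I-<TYPE>", or "O"
}

interface Entity {
  type: string;
  text: string;
}

// Merge a B- tag and the I- tags of the same type that follow it into one entity.
function groupEntities(words: TaggedWord[]): Entity[] {
  const entities: Entity[] = [];
  let current: Entity | null = null;
  for (const { text, tag } of words) {
    // Split on the first hyphen only, so types containing hyphens stay whole.
    const dash = tag.indexOf("-");
    const prefix = dash === -1 ? tag : tag.slice(0, dash);
    const type = dash === -1 ? "" : tag.slice(dash + 1);
    if (prefix === "B" && type !== "") {
      current = { type, text };
      entities.push(current);
    } else if (prefix === "I" && current !== null && current.type === type) {
      current.text += " " + text;
    } else {
      current = null; // "O", or an I- tag with no matching open entity
    }
  }
  return entities;
}
```

    Given the words "Invoice" (B-QUESTION), "Number" (I-QUESTION), and "12345" (B-ANSWER), this yields a QUESTION entity "Invoice Number" and an ANSWER entity "12345".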

    Use Cases on Mixpeek

    • Intelligent document processing: extract fields from forms
    • Financial document analysis: parse invoices and statements
    • Legal document structure extraction

    Specification

    Framework: HF
    Organization: microsoft
    Feature: Document Structure
    Output: structure tokens
    Modalities: document
    Retriever: Section Filter
    Parameters: 125M
    License: cc-by-nc-sa-4.0
    Downloads/mo: 1.0M
    Likes: 474

    Research Paper

    LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

    arxiv.org

    Build a pipeline with layoutlmv3-base

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
