    HF · Table Extraction · MIT

    table-transformer-detection

    by microsoft

    Detect and extract tables from document images

    3.3M downloads/month
    400 likes
    29M params
    Identifiers
    Model ID
    microsoft/table-transformer-detection
    Feature URI
    mixpeek://document_extractor@v1/microsoft_table_transformer_v1

    Overview

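    Table Transformer adapts the DETR (DEtection TRansformer) object-detection architecture to document images. This checkpoint, table-transformer-detection, locates table regions on a page; a companion checkpoint, microsoft/table-transformer-structure-recognition, identifies each table's internal structure (rows, columns, headers).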
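DETR predicts a fixed set of object queries, each carrying class scores (including an extra "no object" class) and a box in normalized center/size form, so turning raw output into table regions takes a small post-processing step. The TypeScript sketch below follows the usual DETR recipe: softmax, drop the "no object" class, apply a score threshold, and convert boxes to pixel corners. The `RawQuery` shape, the `postProcess` name, the label list, and the 0.5 threshold are assumptions for illustration, not Mixpeek's or Hugging Face's API.

```typescript
// Hypothetical raw DETR output for one object query (assumed shape).
interface RawQuery {
  logits: number[]; // class logits; the LAST entry is DETR's "no object" class
  box: [number, number, number, number]; // [cx, cy, w, h], normalized to 0..1
}

interface Detection {
  label: string;
  score: number;
  box: [number, number, number, number]; // [xmin, ymin, xmax, ymax] in pixels
}

// Assumed label order for the detection checkpoint; check the model's config.
const ID2LABEL = ["table", "table rotated"];

// Numerically stable softmax over a logit vector.
function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function postProcess(
  queries: RawQuery[],
  imageWidth: number,
  imageHeight: number,
  threshold = 0.5, // illustrative cutoff, not a documented default for this model
): Detection[] {
  const detections: Detection[] = [];
  for (const q of queries) {
    // Drop the trailing "no object" probability and keep the best real class.
    const probs = softmax(q.logits).slice(0, -1);
    const score = Math.max(...probs);
    if (score < threshold) continue;
    const best = probs.indexOf(score);
    // Convert normalized center/size to absolute corner coordinates.
    const [cx, cy, w, h] = q.box;
    detections.push({
      label: ID2LABEL[best] ?? `class_${best}`,
      score,
      box: [
        (cx - w / 2) * imageWidth,
        (cy - h / 2) * imageHeight,
        (cx + w / 2) * imageWidth,
        (cy + h / 2) * imageHeight,
      ],
    });
  }
  return detections;
}
```

In Python, Hugging Face's DETR image processor offers `post_process_object_detection` for the same step, so hand-rolling it is mainly useful outside that library.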

    On Mixpeek, Table Transformer extracts structured table data from PDFs and scanned documents, enabling queries over tabular content in your document collections.
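Once a table has been parsed into cells, querying it amounts to treating the header row as field names and filtering the remaining rows. The sketch below assumes a hypothetical `Grid` representation (rows of cell strings, header row first) and a hypothetical `toRecords` helper; Mixpeek's actual table JSON schema and query interface are not described on this page and may differ.

```typescript
// Hypothetical extracted table: rows of cell strings, first row is the header.
type Grid = string[][];

// Turn a grid into one record per data row, keyed by the header cells.
function toRecords(grid: Grid): Record<string, string>[] {
  const [header, ...rows] = grid;
  if (!header) return [];
  return rows.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = row[i] ?? ""; // missing cells become empty strings
    });
    return record;
  });
}

const table: Grid = [
  ["Region", "Revenue"],
  ["EMEA", "120"],
  ["APAC", "95"],
];

// Example query: rows whose Revenue exceeds 100.
const highRevenue = toRecords(table).filter((r) => Number(r["Revenue"]) > 100);
```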

    Architecture

    DETR-based architecture with a ResNet-18 backbone, trained on the PubTables-1M dataset. Table detection (locating tables) and table structure recognition (parsing rows and columns) use separate models; this page covers the detection model.
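Because detection and structure recognition are separate models, a typical two-stage pipeline crops each detected table, with a little padding so borders are not cut off, and passes the crop to the structure model. The sketch below shows that hand-off; the `cropRegion` and `extractTables` names, the 10-pixel padding, and the `recognizeStructure` callback standing in for the second model are all hypothetical.

```typescript
type Box = [number, number, number, number]; // [xmin, ymin, xmax, ymax] in pixels

// Expand a detected table box by `padding` pixels on every side and clamp it
// to the image bounds, so the structure model sees the table's outer edges.
function cropRegion(box: Box, imageWidth: number, imageHeight: number, padding = 10): Box {
  const [xmin, ymin, xmax, ymax] = box;
  return [
    Math.max(0, xmin - padding),
    Math.max(0, ymin - padding),
    Math.min(imageWidth, xmax + padding),
    Math.min(imageHeight, ymax + padding),
  ];
}

// Hypothetical two-stage flow: table boxes from the detection model go in,
// and the caller-supplied structure-recognition step runs once per crop.
async function extractTables<T>(
  tableBoxes: Box[],
  imageWidth: number,
  imageHeight: number,
  recognizeStructure: (crop: Box) => Promise<T>,
): Promise<T[]> {
  return Promise.all(
    tableBoxes.map((b) => recognizeStructure(cropRegion(b, imageWidth, imageHeight))),
  );
}
```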

    Mixpeek SDK Integration

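    import { Mixpeek } from "mixpeek";
    
    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a PDF into a collection and run the table extraction feature on it.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/report.pdf" },
      feature_extractors: [{
        name: "table_extraction",
        version: "v1",
        params: {
          // Hugging Face model that locates tables on each page
          model_id: "microsoft/table-transformer-detection"
        }
      }]
    });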

    Capabilities

    • Table region detection in document images
    • Table structure recognition (rows, columns, headers), via the companion structure-recognition model
    • Handles complex table layouts
    • Works with scanned and digital documents
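Structure recognition outputs row and column boxes rather than individual cells, so one simple way to recover a cell grid is to intersect every row with every column. The sketch below assumes pixel boxes in `[xmin, ymin, xmax, ymax]` form; the function name is hypothetical, and it deliberately ignores spanning cells and header labels, which a fuller implementation would need to handle.

```typescript
type Box = [number, number, number, number]; // [xmin, ymin, xmax, ymax]

interface Cell {
  row: number;
  column: number;
  box: Box;
}

// Build a cell grid by intersecting each row box with each column box.
// Rows are ordered top-to-bottom and columns left-to-right before indexing.
function cellsFromRowsAndColumns(rows: Box[], columns: Box[]): Cell[] {
  const sortedRows = [...rows].sort((a, b) => a[1] - b[1]);
  const sortedCols = [...columns].sort((a, b) => a[0] - b[0]);
  const cells: Cell[] = [];
  sortedRows.forEach((r, i) => {
    sortedCols.forEach((c, j) => {
      const box: Box = [
        Math.max(r[0], c[0]),
        Math.max(r[1], c[1]),
        Math.min(r[2], c[2]),
        Math.min(r[3], c[3]),
      ];
      // Skip empty intersections where the row and column do not overlap.
      if (box[0] < box[2] && box[1] < box[3]) cells.push({ row: i, column: j, box });
    });
  });
  return cells;
}
```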

    Use Cases on Mixpeek

    Financial report analysis: extracting data tables
    Scientific paper data extraction
    Government document processing: parsing structured data

    Specification

    Framework: HF
    Organization: microsoft
    Feature: Table Extraction
    Output: table JSON
    Modalities: document
    Retriever: Table Filter
    Parameters: 29M
    License: MIT
    Downloads/month: 3.3M
    Likes: 400

    Research Paper

    PubTables-1M: Towards comprehensive table extraction from unstructured documents (arXiv)

    Build a pipeline with table-transformer-detection

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
