table-transformer-detection

by microsoft

Detect and extract tables from document images

1.7M downloads/month

424 likes

29M params

Identifiers

Model ID

microsoft/table-transformer-detection

Feature URI

mixpeek://document_extractor@v1/microsoft_table_transformer_v1

Overview

Table Transformer uses the DETR architecture, adapted for table detection and structure recognition in document images. Across its detection and structure recognition models, it identifies table regions and their internal structure (rows, columns, headers).
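As a rough illustration of what a DETR-style detector hands back, the sketch below turns raw per-query outputs into pixel-space table boxes. It relies on two properties of DETR: each query carries class logits whose last entry is a "no object" class, and boxes are normalized (cx, cy, w, h). The names RawPrediction, Detection, and postProcess are hypothetical, the 0.9 threshold is an illustrative default, and none of this is Mixpeek's or the model authors' actual implementation.

```typescript
// One raw DETR-style query prediction (hypothetical shape, for illustration).
interface RawPrediction {
  logits: number[];                      // last entry is the "no object" class
  box: [number, number, number, number]; // (cx, cy, w, h), each in [0, 1]
}

interface Detection {
  label: number;                         // index of the best real class
  score: number;                         // softmax probability of that class
  box: [number, number, number, number]; // (x0, y0, x1, y1) in pixels
}

// Numerically stable softmax over one logit vector.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Keep confident queries and convert their boxes to pixel corner coordinates.
// The threshold value is an assumption, not a documented default.
function postProcess(
  preds: RawPrediction[],
  width: number,
  height: number,
  threshold = 0.9,
): Detection[] {
  const detections: Detection[] = [];
  for (const { logits, box } of preds) {
    // Softmax over all classes, then drop the trailing "no object" entry.
    const probs = softmax(logits).slice(0, -1);
    const score = Math.max(...probs);
    if (score < threshold) continue;
    const label = probs.indexOf(score);
    const [cx, cy, w, h] = box;
    detections.push({
      label,
      score,
      box: [
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
      ],
    });
  }
  return detections;
}
```

Queries whose probability mass sits on the "no object" class fall below the threshold and are dropped, so only likely table regions survive.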

On Mixpeek, Table Transformer extracts structured table data from PDFs and scanned documents, enabling queries over tabular content in your document collections.

Architecture

DETR-based architecture with a ResNet-18 backbone, fine-tuned on the PubTables-1M dataset. Separate models handle table detection (locating tables) and table structure recognition (parsing rows and columns).
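Because detection and structure recognition are separate models, a common way to chain them is to crop each detected table region, with a small margin, before passing it to the structure model. The sketch below shows only that cropping arithmetic. The paddedCrop name and the 10-pixel padding are assumptions made for illustration, not a documented part of Table Transformer or Mixpeek.

```typescript
// Pixel-space corner box: (x0, y0, x1, y1).
type Box = [number, number, number, number];

// Expand a detected table box by `padding` pixels on every side and clamp it
// to the image bounds, yielding an integer crop rectangle for a second stage.
// The padding default is a hypothetical value chosen for the example.
function paddedCrop(box: Box, width: number, height: number, padding = 10): Box {
  const [x0, y0, x1, y1] = box;
  return [
    Math.max(0, Math.floor(x0 - padding)),
    Math.max(0, Math.floor(y0 - padding)),
    Math.min(width, Math.ceil(x1 + padding)),
    Math.min(height, Math.ceil(y1 + padding)),
  ];
}
```

The margin keeps border rules and edge cells inside the crop, and the clamping prevents out-of-range coordinates for tables that touch the page edge.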

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "table_extraction",
    version: "v1",
    parameters: { model_id: "microsoft/table-transformer-detection" },
  },
});