PDF to Table Data Converter
Extract tables from PDF documents and convert them into structured formats like JSON arrays, CSV, or Excel. Handles complex table layouts with merged cells, nested headers, multi-page tables, and borderless tables using AI-powered layout detection.
How It Works
Upload a PDF file or provide a URL to the Mixpeek API.
AI-powered layout analysis detects all table regions on each page.
Cell boundaries are identified using a combination of rule detection and machine learning.
Merged cells, nested headers, and multi-page continuation tables are resolved into clean row/column structures.
Tables are returned as structured arrays with headers, rows, and optional type inference per column.
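The resolution steps above can be illustrated with a small, self-contained sketch. This is not Mixpeek's implementation and it uses no Mixpeek API. It assumes a hypothetical cell format of `(row, col, rowspan, colspan, text)` tuples and shows one simple way to handle each step. It expands merged cells into a dense grid, flattens stacked (nested) header rows into single column names, stitches a continuation table by dropping a header row repeated on the next page, and infers a per-column type (`int`, `float`, or `str`) with plain parsing rules. All function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical cell format: (row, col, rowspan, colspan, text), zero-based positions.
Cell = Tuple[int, int, int, int, str]
Grid = List[List[str]]


def expand_merged_cells(cells: Sequence[Cell], n_rows: int, n_cols: int) -> Grid:
    """Build a dense grid, copying each merged cell's text into every slot it spans."""
    grid: Grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, rowspan, colspan, text in cells:
        for r in range(row, row + rowspan):
            for c in range(col, col + colspan):
                grid[r][c] = text
    return grid


def flatten_headers(header_rows: Sequence[Sequence[str]], sep: str = " / ") -> List[str]:
    """Join stacked header rows column by column, skipping blanks and repeated labels."""
    flat: List[str] = []
    for column in zip(*header_rows):
        parts: List[str] = []
        for label in column:
            if label and (not parts or parts[-1] != label):
                parts.append(label)
        flat.append(sep.join(parts))
    return flat


def stitch_pages(pages: Sequence[Grid]) -> Grid:
    """Concatenate continuation tables, dropping a header row repeated on later pages."""
    if not pages:
        return []
    header = pages[0][0] if pages[0] else None
    stitched: Grid = [list(row) for row in pages[0]]
    for page in pages[1:]:
        body = page[1:] if page and page[0] == header else page
        stitched.extend(list(row) for row in body)
    return stitched


def infer_column_type(values: Sequence[str]) -> str:
    """Return 'int', 'float', or 'str' for a column, ignoring empty cells and commas."""
    cleaned = [v.strip().replace(",", "") for v in values if v.strip()]

    def all_parse(cast: Callable[[str], object]) -> bool:
        # True only if every non-empty cell converts without a ValueError.
        try:
            for v in cleaned:
                cast(v)
        except ValueError:
            return False
        return True

    if not cleaned:
        return "str"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


# Example: on page 1, "North" is merged across two rows; page 2 repeats the header.
page1 = expand_merged_cells(
    [(0, 0, 1, 1, "Region"), (0, 1, 1, 1, "Revenue"),
     (1, 0, 2, 1, "North"), (1, 1, 1, 1, "1,200"), (2, 1, 1, 1, "950")],
    n_rows=3, n_cols=2,
)
page2 = [["Region", "Revenue"], ["South", "780.5"]]
table = stitch_pages([page1, page2])
header, rows = table[0], table[1:]
types = {name: infer_column_type([r[i] for r in rows]) for i, name in enumerate(header)}
print(types)  # {'Region': 'str', 'Revenue': 'float'}
```

Copying a merged cell's text into every slot it covers keeps each row self-describing, which suits CSV and spreadsheet exports. An alternative is to keep the text only in the top-left slot and leave the covered slots empty.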
Code Examples
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Extract tables from pages 5-15 as JSON, keeping headers and
# stitching tables that continue across page breaks.
result = client.convert(
    source="https://example.com/financial-report.pdf",
    from_format="pdf",
    to_format="table-data",
    options={
        "stitch_multipage_tables": True,
        "table_output_format": "json",
        "include_headers": True,
        "pages": "5-15",
    },
)

# Print each table's location and size, then preview its first three rows.
for table in result.tables:
    print(f"Table on page {table.page}: {table.num_rows} rows x {table.num_cols} cols")
    for row in table.rows[:3]:
        print(f"  {row}")
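The converter lists CSV among its output formats, so the following is a minimal sketch of serializing one extracted table with Python's standard `csv` module. It assumes each row is a sequence of cell values and that a header row is available as a list of strings. This page only shows the `page`, `num_rows`, `num_cols`, and `rows` attributes on Mixpeek's result objects, so `table_to_csv` (a hypothetical helper) takes plain lists rather than a result object.

```python
import csv
import io
from typing import Any, Optional, Sequence


def table_to_csv(rows: Sequence[Sequence[Any]], header: Optional[Sequence[str]] = None) -> str:
    """Serialize one table to a CSV string, writing the header row first if given."""
    buffer = io.StringIO()
    # Default dialect: comma delimiter, minimal quoting, "\r\n" line endings.
    writer = csv.writer(buffer)
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data standing in for one extracted table's header and rows.
csv_text = table_to_csv([["North", "1,200"], ["South", "780.5"]], header=["Region", "Revenue"])
print(csv_text)
```

Cells that contain the delimiter, such as `1,200`, are quoted automatically. To write straight to a file instead, open it with `newline=""` and pass the file object to `csv.writer`, as the `csv` module documentation recommends.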
Related Converters
PDF to Text
Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.
PDF to Structured Data
Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.
CSV to Embeddings
Convert CSV files into vector embeddings by selecting and combining columns into text representations. Supports header mapping, custom delimiters, and batch processing for large datasets.
PDF to JSON
Convert PDF documents into clean, structured JSON output. Extracts text, tables, form fields, metadata, and document structure into a machine-readable JSON format suitable for API ingestion, database storage, and programmatic processing.
Ready to convert PDF to table data?
Start using the Mixpeek PDF to Table Data converter in minutes. Sign up for a free API key and follow the documentation to get started.
