PDF to JSON Converter
Convert PDF documents into clean, structured JSON output. Extracts text, tables, form fields, metadata, and document structure into a machine-readable JSON format suitable for API ingestion, database storage, and programmatic processing.
How It Works
Upload a PDF file or provide a URL to the Mixpeek API.
The document is classified as digital-native or scanned, with OCR applied as needed.
Layout analysis segments the document into pages, paragraphs, tables, and form fields.
An LLM maps extracted content to your target JSON schema or a default document schema.
Structured JSON is returned with pages, content blocks, tables, and metadata.
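The steps above can be sketched in plain Python to make the output shape concrete. This is a minimal illustration, not the Mixpeek schema: the source only says the JSON contains pages, content blocks, tables, and metadata, so every class name, field name, block label, and the `needs_ocr` threshold below is a hypothetical stand-in.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Block:
    type: str  # assumed labels, e.g. "paragraph", "table", "form_field"
    text: str


@dataclass
class Page:
    number: int
    blocks: list = field(default_factory=list)


@dataclass
class Document:
    metadata: dict = field(default_factory=dict)
    pages: list = field(default_factory=list)


def needs_ocr(embedded_text: str, min_chars: int = 20) -> bool:
    """Rough heuristic (an assumption): a page with almost no embedded
    text is probably a scan and would need OCR."""
    return len(embedded_text.strip()) < min_chars


def to_json(doc: Document) -> str:
    """Serialize the nested dataclasses into a JSON string."""
    return json.dumps(asdict(doc), indent=2)


doc = Document(
    metadata={"title": "Annual Report", "page_count": 1},
    pages=[Page(number=1, blocks=[Block("paragraph", "Revenue grew in 2023.")])],
)
payload = to_json(doc)
print(payload)
```

A real response may nest tables and form fields differently, so treat this only as a way to reason about the page, block, and metadata hierarchy before inspecting actual output.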
Code Examples
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

result = client.convert(
    source="https://example.com/annual-report.pdf",
    from_format="pdf",
    to_format="json",
    options={
        "ocr_fallback": True,
        "extract_tables": True,
        "extract_images": False,
        "pages": "1-20",
    },
)

for page in result.pages:
    print(f"--- Page {page.number} ({len(page.blocks)} blocks) ---")
    for block in page.blocks:
        print(f"  [{block.type}] {block.text[:100]}...")
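Once a result is available, a common next step is to pull out only one kind of block. The sketch below relies on the attributes used in the example above (`result.pages`, `page.number`, `page.blocks`, `block.type`, `block.text`). The helper `blocks_of_type`, the `"table"` and `"paragraph"` type labels, and the `SimpleNamespace` stand-in for a response are assumptions made so the snippet runs without an API key.

```python
from types import SimpleNamespace


def blocks_of_type(result, block_type):
    """Return (page_number, block) pairs whose block.type equals block_type."""
    return [
        (page.number, block)
        for page in result.pages
        for block in page.blocks
        if block.type == block_type
    ]


# Hypothetical stand-in for a conversion result; not real API output.
fake_result = SimpleNamespace(pages=[
    SimpleNamespace(number=1, blocks=[
        SimpleNamespace(type="paragraph", text="Letter to shareholders"),
        SimpleNamespace(type="table", text="Revenue | 2022 | 2023"),
    ]),
    SimpleNamespace(number=2, blocks=[
        SimpleNamespace(type="table", text="Costs | 2022 | 2023"),
    ]),
])

tables = blocks_of_type(fake_result, "table")
for page_number, block in tables:
    print(f"page {page_number}: {block.text}")
```

Passing the real `result` object in place of `fake_result` should work the same way, provided the block type labels match what the API actually returns.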
Try This Conversion
Get started with the Mixpeek API and convert your first file in minutes.
Related Converters
PDF to Text
Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.
PDF to Structured Data
Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.
PDF to Markdown
Convert PDF documents to clean Markdown format, preserving headings, lists, tables, links, and emphasis. Ideal for migrating content into wikis, CMS platforms, and documentation systems.
PDF to Table Data
Extract tables from PDF documents and convert them into structured formats like JSON arrays, CSV, or Excel. Handles complex table layouts with merged cells, nested headers, multi-page tables, and borderless tables using AI-powered layout detection.
Ready to convert PDF to JSON?
Start using the Mixpeek PDF to JSON converter in minutes. Sign up for a free API key and follow the documentation to get started.
