PDF to JSON Converter
Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.
How It Works
Upload a PDF or provide a URL.
Layout analysis identifies form fields, tables, and key-value regions.
An LLM extracts values and maps them to a structured schema.
Tables are converted to row/column JSON arrays.
The complete structured output is returned as JSON.
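The table step above can be illustrated with a minimal sketch. It assumes a table has already been detected and split into a header row plus data rows. The function name table_to_records and the sample cells are hypothetical and are not part of the Mixpeek API. The real converter may emit a different shape.

```python
import json


def table_to_records(header, rows):
    """Turn a header row and data rows into a list of row objects.

    Rows shorter than the header are padded with None so every
    record has the same keys; extra cells beyond the header are dropped.
    """
    records = []
    for row in rows:
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Hypothetical cells, as they might come out of layout analysis.
header = ["description", "amount"]
rows = [["Widget", "19.99"], ["Shipping"]]

records = table_to_records(header, rows)
print(json.dumps(records, indent=2))
```

Keying each cell by its column header keeps rows self-describing, which is usually easier to consume downstream than positional arrays.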
Code Examples
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

result = client.convert(
    source="https://example.com/invoice.pdf",
    from_format="pdf",
    to_format="structured-data",
    options={
        "target_schema": {
            "vendor_name": "string",
            "invoice_date": "date",
            "total_amount": "number",
            "line_items": [{"description": "string", "amount": "number"}],
        }
    },
)

print(result.data)
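Because the values come from an LLM, it can be worth checking the returned data against the target schema before using it. The sketch below is one possible client-side check, not a Mixpeek feature. It assumes the result is a plain dict shaped like the schema and that dates arrive as ISO 8601 strings, neither of which is confirmed here. The helper names and the sample payload are hypothetical.

```python
from datetime import date


def is_iso_date(text):
    """Return True if text parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def find_mismatches(value, schema, path="$"):
    """Return a list of messages describing where value departs from schema."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub_schema in schema.items():
            if key not in value:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(find_mismatches(value[key], sub_schema, f"{path}.{key}"))
        return problems
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        problems = []
        for i, item in enumerate(value):
            # The first schema element describes every item in the array.
            problems.extend(find_mismatches(item, schema[0], f"{path}[{i}]"))
        return problems
    if schema == "string":
        ok = isinstance(value, str)
    elif schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif schema == "date":
        ok = isinstance(value, str) and is_iso_date(value)
    else:
        return [f"{path}: unknown type label {schema!r}"]
    return [] if ok else [f"{path}: expected {schema}"]


target_schema = {
    "vendor_name": "string",
    "invoice_date": "date",
    "total_amount": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}

# Hypothetical payload with one deliberate error: total_amount is a string.
sample = {
    "vendor_name": "Acme Corp",
    "invoice_date": "2024-03-15",
    "total_amount": "1,250.00",
    "line_items": [{"description": "Consulting", "amount": 1250.0}],
}

problems = find_mismatches(sample, target_schema)
print(problems)
```

An empty list means the payload matches the schema. Any entries point at the exact field to review or re-extract.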
Try This Conversion
Get started with the Mixpeek API and convert your first file in minutes.
Related Converters
PDF to Text
Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.
PDF to Embeddings
Convert PDF documents into semantic vector embeddings for search, retrieval, and RAG applications. Pages are chunked intelligently by sections and paragraphs, then embedded using text or multimodal models.
PDF to Markdown
Convert PDF documents to clean Markdown format, preserving headings, lists, tables, links, and emphasis. Ideal for migrating content into wikis, CMS platforms, and documentation systems.
HTML to Structured Data
Extract structured data from web pages using a combination of CSS/XPath selectors and LLM-based extraction. Captures product details, article metadata, contact information, and custom schemas from any website.
Ready to convert PDF to JSON?
Start using the Mixpeek PDF to Structured Data converter in minutes. Sign up for a free API key and follow the documentation to get started.
