HTML to JSON Converter
Extract structured data from web pages using a combination of CSS/XPath selectors and LLM-based extraction. Captures product details, article metadata, contact information, and custom schemas from any website.
How It Works
Provide a URL or upload an HTML file.
Existing structured data (JSON-LD, microdata, RDFa) is extracted first.
An LLM analyzes the page to extract additional structured fields.
Results are merged and validated against your target schema.
Clean JSON output is returned with confidence scores.
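The steps above can be sketched locally with the Python standard library. This is only an illustration of the general idea, not Mixpeek's implementation: the JSONLDExtractor class, the extract_jsonld and merge_fields helpers, the sample page, and the rule that embedded values take precedence over LLM-extracted ones are all assumptions made for this example, and the LLM step is replaced by hard-coded values.

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_jsonld(html):
    """Return every JSON-LD object embedded in the page."""
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def merge_fields(embedded, llm_fields):
    """Assumed merge rule: embedded values win, LLM values fill the gaps."""
    merged = dict(llm_fields)
    merged.update({k: v for k, v in embedded.items() if v is not None})
    return merged


sample_html = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Trail Shoe",
 "offers": {"price": 89.5, "priceCurrency": "USD"}}
</script>
</head><body><p>Rated 4.6 by 212 reviewers</p></body></html>
"""

# Step 2: pull structured data that the page already embeds.
jsonld_items = extract_jsonld(sample_html)
product = jsonld_items[0]
embedded = {
    "product_name": product.get("name"),
    "price": product.get("offers", {}).get("price"),
    "currency": product.get("offers", {}).get("priceCurrency"),
}

# Step 3 stand-in: values an LLM might extract from the visible text.
llm_fields = {"product_name": "Trail Shoe (Men's)", "rating": 4.6, "reviews_count": 212}

# Step 4: merge the two sources into one record.
record = merge_fields(embedded, llm_fields)
print(record)
```

Preferring embedded JSON-LD over model output is a design choice made here because publisher-supplied markup is usually more reliable than inferred values; a real pipeline might weigh the two by confidence instead.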
Code Examples
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

result = client.convert(
    source="https://example.com/product-page",
    from_format="html",
    to_format="structured-data",
    options={
        "target_schema": {
            "product_name": "string",
            "price": "number",
            "currency": "string",
            "rating": "number",
            "reviews_count": "integer",
        }
    },
)

print(result.data)
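Before passing the output downstream, it can help to check it against the same target schema on the client side. The sketch below assumes the returned data is a flat dict keyed by the schema's field names, which this page does not confirm; the validate helper and the TYPE_CHECKS table are hypothetical and not part of the Mixpeek SDK.

```python
# Map the schema's type names to simple Python checks.
# bool is excluded explicitly because it is a subclass of int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(record, target_schema):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, type_name in target_schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not TYPE_CHECKS[type_name](record[field]):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {type_name}, got {actual}")
    return errors


target_schema = {
    "product_name": "string",
    "price": "number",
    "currency": "string",
    "rating": "number",
    "reviews_count": "integer",
}

# Hypothetical records standing in for result.data.
good = {
    "product_name": "Trail Shoe",
    "price": 89.5,
    "currency": "USD",
    "rating": 4.6,
    "reviews_count": 212,
}
bad = {"product_name": "Trail Shoe", "price": "89.50", "currency": "USD", "rating": 4.6}

good_errors = validate(good, target_schema)
bad_errors = validate(bad, target_schema)
print(good_errors)  # []
print(bad_errors)   # ['price: expected number, got str', 'reviews_count: missing']
```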
Try This Conversion
Get started with the Mixpeek API and convert your first file in minutes.
Related Converters
PDF to Structured Data
Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.
JSON to Embeddings
Convert JSON objects and arrays into semantic vector embeddings. Supports nested structures, field selection, and configurable serialization strategies for optimal embedding quality.
HTML to Text
Extract clean, readable text from HTML pages by stripping tags, scripts, and styles while preserving semantic structure. Handles navigation removal, boilerplate detection, and main content extraction.
Ready to convert HTML to JSON?
Start using the Mixpeek HTML to Structured Data converter in minutes. Sign up for a free API key and follow the documentation to get started.
