PDF to Embeddings Converter
Convert PDF documents into semantic vector embeddings for search, retrieval, and RAG applications. Extracted text is split into chunks along section and paragraph boundaries, and each chunk is embedded with a text or multimodal model.
How It Works
1. Upload a PDF or provide a URL.
2. Text is extracted and segmented into semantic chunks.
3. Diagrams and figures are optionally processed with a vision model.
4. Each chunk is embedded using your selected model.
5. Embeddings are returned with chunk text and metadata.
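The chunking step can be illustrated with a minimal sketch. Mixpeek's actual segmentation logic is not described on this page, so this assumes that paragraphs are separated by blank lines and that whitespace-delimited words stand in for model tokens. The `chunk_text` function is hypothetical; it only borrows the `chunk_size` and `chunk_overlap` names from the options in the code example below. Paragraphs are packed whole while they fit, an oversized paragraph is split across chunks, and each new chunk starts with the last `chunk_overlap` words of the previous one.

```python
import re


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Hypothetical paragraph-aware chunker: pack paragraphs into chunks of at
    most `chunk_size` words, repeating the last `chunk_overlap` words of each
    chunk at the start of the next one. Words approximate model tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.split() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []  # words in the chunk being built
    fresh = 0  # words added since the last flush (excludes carried-over overlap)

    def flush() -> None:
        nonlocal current, fresh
        chunks.append(" ".join(current))
        # Carry the tail of the finished chunk into the next one.
        current = current[-chunk_overlap:] if chunk_overlap else []
        fresh = 0

    for words in paragraphs:
        # The paragraph does not fit: close the chunk at the paragraph boundary.
        if fresh and len(current) + len(words) > chunk_size:
            flush()
        # The paragraph is longer than the remaining room: split it across chunks.
        while len(current) + len(words) > chunk_size:
            room = chunk_size - len(current)  # >= 1 because overlap < chunk_size
            current.extend(words[:room])
            fresh += room
            words = words[room:]
            flush()
        current.extend(words)
        fresh += len(words)

    # Emit the last chunk only if it holds more than carried-over overlap.
    if fresh:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    doc = "Intro paragraph here.\n\nA second, longer paragraph of text.\n\nEnd."
    for i, chunk in enumerate(chunk_text(doc, chunk_size=8, chunk_overlap=2)):
        print(i, chunk)
```

Keeping paragraphs intact where possible means a chunk rarely starts mid-thought, while the overlap preserves context across the boundaries that do fall inside a paragraph. This sketch respects only paragraph boundaries; splitting by sections would additionally require heading detection from the PDF layout.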
Code Examples
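```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Fetch the PDF from a URL, split it into chunks, and embed each chunk.
result = client.convert(
    source="https://example.com/whitepaper.pdf",
    from_format="pdf",
    to_format="embeddings",
    options={
        "model": "e5-large-instruct",  # embedding model
        "chunk_size": 512,             # maximum chunk length
        "chunk_overlap": 64,           # content shared by consecutive chunks
        "include_figures": True,       # also process diagrams and figures
    },
)

# Preview the first 80 characters of each chunk.
for chunk in result.chunks:
    print(f"Chunk {chunk.index}: {chunk.text[:80]}...")
```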
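Once the chunks come back, a typical retrieval step ranks them against an embedded query. The sketch below assumes each chunk carries its vector in an `embedding` field; that field name, the `EmbeddedChunk` class, and the `cosine` and `top_k` helpers are hypothetical, since the example above only reads `chunk.index` and `chunk.text`. The query vector must be produced by the same model as the chunk vectors. Ranking is plain cosine similarity in pure Python.

```python
import math
from dataclasses import dataclass


@dataclass
class EmbeddedChunk:
    # Hypothetical stand-in for one item of `result.chunks`; the field names
    # are assumptions, not the documented response schema.
    index: int
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[EmbeddedChunk], k: int = 3) -> list[tuple[float, EmbeddedChunk]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
    chunks = [
        EmbeddedChunk(0, "Introduction", [1.0, 0.0]),
        EmbeddedChunk(1, "Methods", [0.0, 1.0]),
        EmbeddedChunk(2, "Results", [0.7, 0.7]),
    ]
    for score, chunk in top_k([1.0, 0.1], chunks, k=2):
        print(f"{score:.3f}  chunk {chunk.index}: {chunk.text}")
```

This linear scan is fine for a single document; for large collections, a vector database or an approximate nearest-neighbor index would normally replace it.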
Related Converters
PDF to Text
Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.
PDF to Structured Data
Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.
Text to Embeddings
Convert text strings, paragraphs, or documents into dense vector embeddings using state-of-the-art language models. Supports batching, chunking, and multiple model options for optimal retrieval performance.
Multimodal to Embeddings
Generate unified vector embeddings from mixed-modality inputs (text, images, audio, and video combined). Enables cross-modal search where any modality can query any other modality in a single vector space.
Ready to convert PDF to embeddings?
Start using the Mixpeek PDF to Embeddings converter in minutes. Sign up for a free API key and follow the documentation to get started.
