from mixpeek import Mixpeek

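# Authenticate with your API key (placeholder shown here)
client = Mixpeek(api_key="YOUR_API_KEY")

# Create a namespace, then a collection configured with the PDF, table, and OCR extractors
namespace = client.namespaces.create(name="pdf-data")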
collection = client.collections.create(
    namespace_id=namespace.id,
    name="invoices",
    extractors=["pdf-extraction", "table-extraction", "ocr"]
)

# Upload PDFs from an S3 prefix into the collection
client.buckets.upload(
    collection_id=collection.id,
    url="s3://your-bucket/invoices/"
)

# Search the extracted data with a natural-language query
results = client.documents.search(
    namespace_id=namespace.id,
    query="invoices over $10,000 from Q4"
)
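
The query above asks for invoices over $10,000 from Q4. As a rough illustration of what that condition means once invoice fields have been extracted, here is a minimal sketch in plain Python. It does not call Mixpeek: `InvoiceRecord`, `parse_amount`, `in_q4`, and `large_q4_invoices` are hypothetical names, and the record shape and sample values are assumptions made up for this example, not the format the search call returns.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceRecord:
    # Hypothetical shape of one extracted invoice; not a Mixpeek type.
    invoice_id: str
    total: Decimal
    issued: date


def parse_amount(text: str) -> Decimal:
    """Turn a currency string such as "$12,500.00" into a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def in_q4(d: date) -> bool:
    """Calendar Q4 covers October through December."""
    return d.month >= 10


def large_q4_invoices(records, threshold=Decimal("10000")):
    """Keep invoices issued in Q4 whose total exceeds the threshold."""
    return [r for r in records if in_q4(r.issued) and r.total > threshold]


# Made-up sample data for illustration only.
records = [
    InvoiceRecord("INV-001", parse_amount("$12,500.00"), date(2024, 11, 3)),
    InvoiceRecord("INV-002", parse_amount("$8,900.00"), date(2024, 12, 15)),
    InvoiceRecord("INV-003", parse_amount("$15,000.00"), date(2024, 6, 1)),
]
matches = large_q4_invoices(records)
```

Amounts are kept as `Decimal` rather than `float` so that currency comparisons are exact. The sketch assumes calendar quarters; a fiscal Q4 would need a different `in_q4`.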

Feature Extractors

PDF Text Extraction

Extract structured text and layout information from PDFs

645K runs

PDF Table Extraction

Convert tables in PDFs to structured data formats

482K runs

Retriever Stages

Use Cases Using This Recipe

Advanced

8 min

SNF Documentation Intelligence

Automate MDS assessments and clinical documentation for skilled nursing facilities

40% less time on charting

Documentation time reduction

healthcare

Who It's For

SNF operators, MDS coordinators, directors of nursing, and post-acute care organizations managing clinical documentation across skilled nursing facilities

Intermediate

Insurance Claims Document Processing

Extract structured data from claims documents, photos, and correspondence automatically

70% reduction in manual document handling

Adjuster data entry time

insurance

Who It's For

Insurance carriers, claims adjusters, and third-party administrators processing 1,000+ claims monthly across property, casualty, auto, and health lines

Intermediate

Enterprise RAG Search

Ask questions across all your enterprise data and get sourced, verifiable answers

80% faster from question to answer

Information retrieval time

finance

Who It's For

Financial services firms, consulting organizations, legal teams, and enterprise knowledge workers who need to synthesize information across thousands of internal documents, reports, and presentations

Related Recipes & Resources

Explore these related resources to deepen your understanding and discover more powerful features

Glossary

OCR

Optical Character Recognition

Recipe

Document Intelligence Search

Extract and search through PDFs, presentations, and documents. Combines OCR, layout analysis, and semantic search for comprehensive document retrieval.

Recipe

Document RAG Pipeline

Retrieval-augmented generation for document collections. Extracts text, tables, and figures from PDFs using OCR and layout analysis, then retrieves relevant page sections to answer natural language questions with precise page and section citations.

Recipe

Document Classification Pipeline

Classify documents into custom business categories using layout-aware extraction and taxonomy enrichment. Handles invoices, contracts, reports, forms, and correspondence by analyzing both textual content and visual document structure.
