Document Intelligence Search
Extract and search through PDFs, presentations, and documents. Combines OCR, layout analysis, and semantic search for comprehensive document retrieval.
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

namespace = client.namespaces.create(name="doc-search")

collection = client.collections.create(
    namespace_id=namespace.id,
    name="contracts",
    extractors=["pdf-extraction", "text-embedding-v2", "ocr"],
    chunk_strategy="page-based",
)

# Upload documents
client.buckets.upload(
    collection_id=collection.id,
    url="s3://your-bucket/contracts/",
)

# Search with high BM25 weight for exact legal terms
# (`retriever` is not created in this snippet; it is assumed to be an existing
# retriever, with the BM25 weighting set in its configuration.)
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="indemnification clause with liability cap",
)
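The search comment above mentions a high BM25 weight for exact legal terms. As a rough illustration of what such a weighting could mean, the sketch below blends a lexical BM25 score with a semantic similarity score. It is plain Python and does not use or describe the Mixpeek API: `bm25_scores`, `hybrid_rank`, the `bm25_weight` parameter, the sample documents, and the hard-coded `semantic_scores` are all hypothetical, and the min-max normalization is one assumed way of putting the two scores on a common scale.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def min_max(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, docs, semantic_scores, bm25_weight=0.7):
    """Return document indices, best first, by a weighted blend of both scores."""
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max(semantic_scores)
    combined = [
        bm25_weight * lex + (1 - bm25_weight) * sem
        for lex, sem in zip(lexical, semantic)
    ]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


docs = [
    "The indemnification clause sets a liability cap of one million dollars.",
    "Either party may terminate this agreement with thirty days notice.",
    "The supplier shall compensate the buyer for losses, subject to a maximum.",
]
# Made-up similarity values standing in for the output of an embedding model.
semantic_scores = [0.80, 0.10, 0.85]

order = hybrid_rank("indemnification clause with liability cap", docs, semantic_scores)
print([docs[i] for i in order])
```

With a high `bm25_weight`, the document that contains the exact legal terms ranks first even though the paraphrased one has a slightly higher semantic score. Setting `bm25_weight=0.0` reverses that, which is the trade-off the weight controls.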
Feature Extractors
PDF Text Extraction
Extract structured text and layout information from PDFs
Retriever Stages
rerank
Rerank documents using cross-encoder models for accurate relevance
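The general shape of a rerank stage can be sketched as follows: take the candidates from an earlier retrieval step, score each (query, passage) pair jointly, and reorder by that score. This is an illustrative sketch, not the Mixpeek implementation. `rerank`, `overlap_score`, and the sample passages are hypothetical, and `overlap_score` is a toy token-overlap function standing in for a real cross-encoder model so that the example runs without any model download.

```python
import re


def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of query tokens in the passage."""
    q_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    p_tokens = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(query, candidates, score_fn=overlap_score, top_k=3):
    """Score every (query, candidate) pair and return the top_k as (score, text)."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Stable sort: candidates with equal scores keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


candidates = [
    "Payment is due within thirty days of invoice.",
    "Liability under this clause is capped at the fees paid.",
    "The indemnification clause includes a liability cap.",
]

reranked = rerank("indemnification clause with liability cap", candidates, top_k=2)
for score, text in reranked:
    print(f"{score:.2f}  {text}")
```

Passing a different `score_fn` is where an actual cross-encoder would plug in: any callable that takes the query and a passage and returns a relevance score fits. Because each pair is scored jointly, reranking is usually applied only to a small candidate set from a cheaper first stage.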
