Document Intelligence Search

Extract and search through PDFs, presentations, and documents. Combines OCR, layout analysis, and semantic search for comprehensive document retrieval.

text

image

Multi-Tier

3.2K runs

from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

namespace = client.namespaces.create(name="doc-search")
collection = client.collections.create(
    namespace_id=namespace.id,
    name="contracts",
    extractors=["pdf-extraction", "text-embedding-v2", "ocr"],
    chunk_strategy="page-based"
)

# Upload documents
client.buckets.upload(
    collection_id=collection.id,
    url="s3://your-bucket/contracts/"
)

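# Search for exact legal terms.
# NOTE: `retriever` is not defined in this snippet. It is assumed to be an
# existing retriever for this collection, already configured with a high
# BM25 weight so that exact terms outrank loose semantic matches.
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="indemnification clause with liability cap"
)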
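
The snippet above mentions a high BM25 weight but does not show how such a weight affects ranking. The sketch below illustrates the general idea of weighted hybrid scoring in plain Python. It is not Mixpeek's implementation: the function names, the `bm25_weight` parameter, the min-max normalization, and the sample scores are all assumptions made for illustration.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    dense: Dict[str, float],
    bm25_weight: float = 0.7,  # hypothetical knob, not a Mixpeek parameter
) -> Dict[str, float]:
    """Blend normalized keyword and vector scores with a single weight."""
    if not 0.0 <= bm25_weight <= 1.0:
        raise ValueError("bm25_weight must be between 0 and 1")
    bm25_n = min_max_normalize(bm25)
    dense_n = min_max_normalize(dense)
    doc_ids = set(bm25_n) | set(dense_n)
    # A document missing from one result list contributes 0 for that signal.
    return {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * dense_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Return document ids ordered by descending score, ties broken by id."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up scores for three documents.
bm25_example = {"a": 12.0, "b": 3.0, "c": 0.0}   # "a" has the exact terms
dense_example = {"a": 0.2, "b": 0.9, "c": 0.5}   # "b" is closest in meaning

print(rank(hybrid_scores(bm25_example, dense_example, bm25_weight=0.7)))
print(rank(hybrid_scores(bm25_example, dense_example, bm25_weight=0.1)))
```

With the weight at 0.7 the exact-term match `a` ranks first; dropping it to 0.1 lets the semantically closest document `b` take the lead, which is why a high keyword weight may suit queries for precise legal wording.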

Feature Extractors

PDF Text Extraction

Extract structured text and layout information from PDFs

645K runs

Retriever Stages

rerank

Rerank documents using cross-encoder models for accurate relevance
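
As a rough illustration of what a rerank stage does, the sketch below rescores first-stage candidates against the query one pair at a time and reorders them. A real cross-encoder is a neural model that reads the query and document together; here a toy token-overlap scorer stands in for it so the example runs without any model. The names `overlap_score`, `rerank`, and `top_k` are hypothetical and are not part of the Mixpeek API.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair and keep the top_k best."""
    scored = [(score_pair(query, text), text) for text in candidates]
    # Python's sort is stable, so tied candidates keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Candidates as they might arrive from a first-stage search, in arbitrary order.
first_stage = [
    "termination notice period",
    "indemnification clause with liability cap of $1M",
    "liability insurance requirements",
]

for score, text in rerank("indemnification clause with liability cap", first_stage):
    print(f"{score:.2f}  {text}")
```

Swapping `score_pair` for a function that calls an actual cross-encoder model would keep the rest of the flow unchanged, which is the main reason the scorer is passed in as a parameter.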

sort

Use Cases Using This Recipe

Intermediate

Insurance Claims Document Processing

Extract structured data from claims documents, photos, and correspondence automatically

70% reduction in manual document handling

Adjuster data entry time

insurance

Who It's For

Insurance carriers, claims adjusters, and third-party administrators processing 1,000+ claims monthly across property, casualty, auto, and health lines

Beginner

Semantic Search for Knowledge Bases

Find answers by meaning, not keywords, across your entire knowledge repository

85% of queries answered on first search vs. 40% baseline

First-search success rate

education

Who It's For

Knowledge management teams, internal documentation owners, customer support organizations, and EdTech platforms maintaining 10K+ articles, documents, and multimedia resources

Intermediate

Enterprise RAG Search

Ask questions across all your enterprise data and get sourced, verifiable answers

80% faster from question to answer

Information retrieval time

finance

Who It's For

Financial services firms, consulting organizations, legal teams, and enterprise knowledge workers who need to synthesize information across thousands of internal documents, reports, and presentations

Advanced

12 min

Clinical NLP at Scale

Extract structured intelligence from clinical notes, pathology reports, and medical records

94% F1 on medical NER benchmarks

Entity extraction accuracy

healthcare

Who It's For

Healthcare IT teams, clinical informatics departments, and health systems processing thousands of clinical documents daily

Related Recipes & Resources

BYO Embeddings Vector Search

PDF Text Extraction

OCR

Multimodal Hybrid Search Pipeline

Clinical Documentation Structuring

Hybrid BM25 + Dense Vector Search