    Document Intelligence Search

    Extract and search through PDFs, presentations, and documents. Combines OCR, layout analysis, and semantic search for comprehensive document retrieval.

    text
    image
    Multi-Tier
    3.2K runs
    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_API_KEY")

    # Create a namespace and a collection for contract documents
    namespace = client.namespaces.create(name="doc-search")
    collection = client.collections.create(
        namespace_id=namespace.id,
        name="contracts",
        extractors=["pdf-extraction", "text-embedding-v2", "ocr"],
        chunk_strategy="page-based",
    )

    # Upload documents
    client.buckets.upload(
        collection_id=collection.id,
        url="s3://your-bucket/contracts/",
    )

    # Search with a retriever configured for a high BM25 weight (favors exact legal terms).
    # NOTE: `retriever` is not defined in this snippet; it is assumed to have been
    # created or fetched beforehand.
    results = client.retrievers.execute(
        retriever_id=retriever.id,
        query="indemnification clause with liability cap",
    )
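
The recipe's comment mentions a high BM25 weight for exact legal terms but does not show how lexical and semantic scores are combined. The following is a minimal sketch of one common approach, a weighted linear blend, written in plain Python. It is not Mixpeek's implementation: the function `hybrid_score`, the parameter `bm25_weight`, and the example scores are all hypothetical, and both scores are assumed to be normalized to the range 0 to 1.

```python
def hybrid_score(bm25: float, semantic: float, bm25_weight: float = 0.7) -> float:
    """Blend a lexical (BM25) score and a semantic score.

    Both inputs are assumed to be normalized to [0, 1] already.
    """
    if not 0.0 <= bm25_weight <= 1.0:
        raise ValueError("bm25_weight must be between 0 and 1")
    return bm25_weight * bm25 + (1.0 - bm25_weight) * semantic


def rank_candidates(candidates: dict, bm25_weight: float) -> list:
    """Return candidate names ordered from highest to lowest blended score."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(
            candidates[name]["bm25"], candidates[name]["semantic"], bm25_weight
        ),
        reverse=True,
    )


# Hypothetical, hand-picked scores for two candidate pages.
candidates = {
    "page_with_exact_phrase": {"bm25": 0.9, "semantic": 0.5},
    "page_with_paraphrase": {"bm25": 0.2, "semantic": 0.9},
}

lexical_first = rank_candidates(candidates, bm25_weight=0.7)
semantic_first = rank_candidates(candidates, bm25_weight=0.2)
```

With the higher weight, the page containing the exact phrase ranks first; with the lower weight, the paraphrased page overtakes it. That trade-off is why a contract search would lean toward the lexical side.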

    Feature Extractors

    PDF Text Extraction

    Extract structured text and layout information from PDFs

    645K runs

    Retriever Stages

    rerank

    Rerank documents using cross-encoder models for accurate relevance
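
To illustrate what a rerank stage does, here is a minimal sketch in plain Python. It assumes the stage receives candidate texts from an earlier stage, scores each (query, document) pair, and keeps the best ones. The names `rerank` and `overlap_score` are hypothetical, and the token-overlap scorer is only a stand-in so the example runs without a model; a real cross-encoder would replace it and is not shown here.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    docs: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair and return the top_k docs, best first."""
    scored = [(doc, score_pair(query, doc)) for doc in docs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query tokens that also appear in the doc.

    A cross-encoder would instead feed the query and doc jointly to a model.
    """
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Hypothetical candidates returned by an earlier retrieval stage.
docs = [
    "payment is due within thirty days",
    "the indemnification clause sets a liability cap",
    "liability is governed by state law",
]

top = rerank("indemnification clause with liability cap", docs, overlap_score, top_k=2)
```

Passing the scorer as a parameter keeps the ranking logic independent of the model, so swapping in a different scorer only changes the `score_pair` argument.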

    sort