    Enhanced

    Legal Document RAG Pipeline

    A purpose-built RAG pipeline for legal documents. It provides high-precision retrieval, weighting keyword matching heavily so that legal terminology, citations, and clause references are matched exactly.

    text
    Production
    1.7K runs
    Deploy Recipe
    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_API_KEY")

    # Namespace and collection that hold the contract PDFs
    namespace = client.namespaces.create(name="legal-docs")
    collection = client.collections.create(
        namespace_id=namespace.id,
        name="contracts",
        extractors=["pdf-extraction", "text-embedding-v2"],
        chunk_strategy="paragraph",
    )

    # High BM25 weight for exact legal term matching
    retriever = client.retrievers.create(
        namespace_id=namespace.id,
        name="legal-search",
        stages=[
            # Stage 1: hybrid search keeps the 50 best candidates
            {"type": "hybrid_search", "vector_weight": 0.4, "bm25_weight": 0.6, "top_k": 50},
            # Stage 2: rerank those candidates and keep the top 10
            {"type": "rerank", "model": "colbert-v2", "top_k": 10},
        ],
    )

    # Query by clause reference and topic
    results = client.retrievers.execute(
        retriever_id=retriever.id,
        query="Section 14.2 indemnification obligations",
    )
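
To make the two stages in the recipe more concrete, here is a minimal, library-free sketch of how a weighted hybrid search followed by a rerank could work. It is not Mixpeek's implementation: the function names (`min_max_normalize`, `hybrid_search`, `rerank`, `token_overlap`), the min-max normalization step, the sample scores, and the toy overlap scorer standing in for a reranking model are all assumptions made for illustration. Only the weights (0.4 vector, 0.6 BM25) and the stage order come from the recipe.

```python
from typing import Callable, Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_search(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.4,
    bm25_weight: float = 0.6,
    top_k: int = 50,
) -> List[Tuple[str, float]]:
    """Fuse normalized scores with a weighted sum and keep the top_k documents."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    fused = {
        doc_id: vector_weight * v.get(doc_id, 0.0) + bm25_weight * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    # Sort by descending score; break ties by document id for determinism.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def rerank(
    query: str,
    doc_ids: List[str],
    texts: Dict[str, str],
    score_pair: Callable[[str, str], float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rescore each (query, document) pair and keep the top_k documents."""
    scored = [(doc_id, score_pair(query, texts[doc_id])) for doc_id in doc_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def token_overlap(query: str, text: str) -> float:
    """Toy stand-in for a reranking model: fraction of query tokens found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# Hypothetical sample data, for illustration only.
texts = {
    "clause-14.2": "Section 14.2 sets out the indemnification obligations of the supplier",
    "clause-9.1": "Section 9.1 sets out the payment terms",
    "clause-3.4": "Governing law and venue",
}
vector_scores = {"clause-14.2": 0.70, "clause-9.1": 0.90, "clause-3.4": 0.50}
bm25_scores = {"clause-14.2": 12.0, "clause-9.1": 4.0, "clause-3.4": 2.0}

query = "Section 14.2 indemnification obligations"
candidates = hybrid_search(vector_scores, bm25_scores)
final = rerank(query, [doc_id for doc_id, _ in candidates], texts, token_overlap, top_k=2)
print(final)
```

In this sample the embedding scores alone would rank `clause-9.1` first, but the higher BM25 weight promotes `clause-14.2`, which contains the exact clause number. That is the behavior the recipe aims for when queries cite specific sections.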

    Feature Extractors

    PDF Text Extraction

    Extract structured text and layout information from PDFs

    645K runs

    Retriever Stages

    rerank

    Rerank documents using cross-encoder models for accurate relevance

    sort