Legal Document RAG Pipeline

A purpose-built RAG pipeline for legal documents. It pairs hybrid search weighted toward BM25 keyword matching with a reranking stage, so that exact legal terminology, citations, and clause references are retrieved with high precision.

text

Production

1.7K runs

Deploy Recipe

from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

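# Create a namespace to isolate the legal corpus
namespace = client.namespaces.create(name="legal-docs")

# Contracts collection: extract PDF text, embed it, and chunk by paragraph
collection = client.collections.create(
    namespace_id=namespace.id,
    name="contracts",
    extractors=["pdf-extraction", "text-embedding-v2"],
    chunk_strategy="paragraph"
)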

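# Stage 1: hybrid search with a higher BM25 weight for exact legal term matching (50 candidates)
# Stage 2: rerank those candidates and keep the top 10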
retriever = client.retrievers.create(
    namespace_id=namespace.id,
    name="legal-search",
    stages=[
        {"type": "hybrid_search", "vector_weight": 0.4, "bm25_weight": 0.6, "top_k": 50},
        {"type": "rerank", "model": "colbert-v2", "top_k": 10}
    ]
)

# Run a query that combines a clause reference with legal terminology
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="Section 14.2 indemnification obligations"
)
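
The recipe weights BM25 at 0.6 and vector similarity at 0.4, but it does not say how the two score sets are combined. The sketch below shows one common approach, a weighted sum over min-max normalized scores, to illustrate why a higher BM25 weight favors exact term matches. The function names and sample scores are hypothetical and are not part of the Mixpeek SDK.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(vector_scores, bm25_scores, vector_weight=0.4, bm25_weight=0.6, top_k=50):
    """Combine two score sets with a weighted sum and return the top_k (doc_id, score) pairs."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    fused = {
        doc_id: vector_weight * v.get(doc_id, 0.0) + bm25_weight * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    # Highest fused score first; ties broken by doc_id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical scores for three contract chunks (assumed values, for illustration only).
vector_scores = {"clause-14.2": 0.71, "clause-9.1": 0.80, "clause-3.4": 0.55}
bm25_scores = {"clause-14.2": 12.4, "clause-9.1": 3.1, "clause-3.4": 0.0}

ranked = hybrid_fuse(vector_scores, bm25_scores)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In this toy data the chunk with the strongest keyword match ranks first even though another chunk has the higher vector similarity, which is the behavior the 0.6 BM25 weight is meant to encourage.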

Feature Extractors

PDF Text Extraction

Extract structured text and layout information from PDFs

645K runs
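
The collection in this recipe is created with chunk_strategy="paragraph". How Mixpeek detects paragraph boundaries is not described here; the sketch below assumes paragraphs in the extracted text are separated by blank lines, and split_paragraphs is a hypothetical helper rather than an SDK function.

```python
import re


def split_paragraphs(text):
    """Split extracted text into paragraphs on one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse internal line wraps so each paragraph becomes a single line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


# Made-up contract text, standing in for the output of a PDF extractor.
sample = """14.2 Indemnification. The Supplier shall indemnify
the Customer against all third-party claims.

14.3 Limitation of Liability. Neither party is liable
for indirect losses."""

chunks = split_paragraphs(sample)
for chunk in chunks:
    print(chunk)
```

Chunking at paragraph boundaries tends to keep a clause number and its text in the same chunk, which helps queries that cite a clause directly.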

Retriever Stages

rerank

Rerank documents using cross-encoder models for more accurate relevance scoring
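
A rerank stage rescores a candidate set against the query and keeps only the best few, as the recipe does when it narrows 50 hybrid-search hits to 10. The sketch below shows that control flow only: token_overlap is a toy stand-in for a real relevance model, and all names and sample texts are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=10):
    """Rescore every (doc_id, text) candidate against the query and keep the top_k ids."""
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    # Highest score first; ties broken by doc_id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def token_overlap(query, text):
    """Toy stand-in for a relevance model: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# Made-up candidates, standing in for the output of the hybrid search stage.
candidates = [
    ("c1", "termination for convenience upon thirty days notice"),
    ("c2", "section 14.2 indemnification obligations of the supplier"),
    ("c3", "indemnification procedures and notice of claims"),
]

top = rerank("Section 14.2 indemnification obligations", candidates, token_overlap, top_k=2)
print(top)
```

In practice score_fn would call a learned model that reads the query and the passage together, which is slower per document and is why it runs only on the short candidate list.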

sort

Related Recipes & Resources

PDF Text Extraction

Web Scraper

Text Embedding

Named Entity Recognition

Keyword Extraction

Topic Modeling