Legal Document RAG Pipeline
Purpose-built RAG pipeline for legal documents, tuned for high-precision retrieval: strong keyword matching on legal terminology, citations, and clause references, combined with semantic search and reranking.
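Keyword scoring matters here because legal queries often hinge on exact tokens. The following is a minimal Okapi BM25 sketch in plain Python. It is not Mixpeek's implementation. It uses a hypothetical tokenizer that keeps section numbers such as `14.2` and the `§` sign as single tokens, so a query for Section 14.2 does not also match Section 14.1 on the number. The regex and the `k1=1.5`, `b=0.75` defaults are illustrative choices.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: keeps "§" and dotted section numbers like "14.2"
# intact so exact clause references can match as single tokens.
TOKEN_RE = re.compile(r"§|\d+(?:\.\d+)*|[a-z]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


docs = [
    "Section 14.2 Indemnification. The Supplier shall indemnify the Customer.",
    "Section 14.1 Limitation of liability.",
    "Payment terms are net 30 days.",
]
scores = bm25_scores("Section 14.2 indemnification obligations", docs)
print(scores)  # first document scores highest; the payment clause scores 0.0
```

Dense embeddings may place neighboring clauses close together. BM25 instead rewards the exact `14.2` token, which is why the retriever below weights it heavily.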
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Namespace and collection for contract PDFs, chunked by paragraph
namespace = client.namespaces.create(name="legal-docs")

collection = client.collections.create(
    namespace_id=namespace.id,
    name="contracts",
    extractors=["pdf-extraction", "text-embedding-v2"],
    chunk_strategy="paragraph",
)

# High BM25 weight for exact legal term matching:
# hybrid search returns 50 candidates, then the rerank stage keeps the top 10
retriever = client.retrievers.create(
    namespace_id=namespace.id,
    name="legal-search",
    stages=[
        {"type": "hybrid_search", "vector_weight": 0.4, "bm25_weight": 0.6, "top_k": 50},
        {"type": "rerank", "model": "colbert-v2", "top_k": 10},
    ],
)

results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="Section 14.2 indemnification obligations",
)
```
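This recipe does not say how the `hybrid_search` stage combines its two signals. The following sketch assumes one common approach: min-max normalize each score set, then take a weighted sum using the `vector_weight=0.4` and `bm25_weight=0.6` values from the configuration above. Reciprocal rank fusion is another common choice. The function names and sample scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    vector_weight: float = 0.4,
    bm25_weight: float = 0.6,
    top_k: int = 50,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; returns the top_k (doc_id, score) pairs."""
    vec = min_max(vector_scores)
    kw = min_max(bm25_scores)
    # A document missing from one result list contributes 0 for that signal.
    fused = {
        doc_id: vector_weight * vec.get(doc_id, 0.0) + bm25_weight * kw.get(doc_id, 0.0)
        for doc_id in vec.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


vector_scores = {"clause-14.1": 0.91, "clause-14.2": 0.88, "clause-9.3": 0.70}
bm25_scores = {"clause-14.2": 7.2, "clause-14.1": 2.1}
ranked = hybrid_fuse(vector_scores, bm25_scores)
print(ranked)  # clause-14.2 (about 0.94), then clause-14.1 (0.4), then clause-9.3 (0.0)
```

Here the vector signal alone would rank `clause-14.1` first. The heavier BM25 weight lifts the exact-match clause above it.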
Feature Extractors
PDF Text Extraction
Extract structured text and layout information from PDFs
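The collection above pairs PDF extraction with `chunk_strategy="paragraph"`. The recipe does not describe how that strategy splits text. The following is a hypothetical approximation for text that has already been extracted. It splits on blank lines and carries short fragments forward, such as a clause heading like `14.2 Indemnification`, so that each heading stays in the same chunk as its body. The `min_chars=40` threshold is arbitrary.

```python
import re


def paragraph_chunks(text: str, min_chars: int = 40) -> list[str]:
    """Split on blank lines; glue short fragments (e.g. clause headings) to the next paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    pending = ""
    for para in paragraphs:
        combined = f"{pending}\n{para}" if pending else para
        if len(combined) < min_chars:
            # Too short to stand alone: carry it into the next paragraph.
            pending = combined
        else:
            chunks.append(combined)
            pending = ""
    if pending:
        # Trailing short fragment with nothing after it.
        chunks.append(pending)
    return chunks


text = (
    "14.2 Indemnification\n\n"
    "The Supplier shall indemnify and hold harmless the Customer against all third-party claims.\n\n"
    "14.3 Survival\n\n"
    "This Section 14 survives termination of this Agreement."
)
for chunk in paragraph_chunks(text):
    print(repr(chunk))  # two chunks, each starting with its clause heading
```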
Retriever Stages
rerank
Rerank candidate documents with cross-encoder models for more accurate relevance ordering
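The retriever above uses a two-stage, retrieve-then-rerank pattern: `hybrid_search` returns 50 candidates, and `rerank` rescores them and keeps 10. A real cross-encoder scores each (query, document) pair jointly with a transformer, for example `CrossEncoder.predict` in the sentence-transformers library. The recipe's `colbert-v2` is a late-interaction model rather than a full cross-encoder, but the stage contract is the same. The following sketch takes the scorer as a parameter. So that it runs without a model, it uses a hypothetical term-overlap function as a stand-in.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Rescore first-stage candidates with a (query, document) scorer and keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranking model: fraction of query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


candidates = [
    "Payment is due in 30 days.",
    "Under Section 14.2 the Supplier shall indemnify the Customer.",
    "Section 14.1 caps liability.",
]
top = rerank("section 14.2 indemnify", candidates, toy_overlap_score, top_k=2)
print(top)  # the Section 14.2 clause scores 1.0; the Section 14.1 clause is second
```

Because the scorer is a parameter, swapping the toy function for a real model changes only the `score_fn` argument.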
