Hybrid BM25 + Dense Vector Search

Use MVS hybrid search to combine BM25 keyword matching with dense vector similarity. Get the precision of exact keyword matches and the recall of semantic understanding in a single query.

Example query: "FastAPI Pydantic v2 validation patterns"

Why This Matters

Pure vector search can miss exact keyword matches, and pure keyword search misses semantic meaning. Hybrid search gives you both, which is critical for technical content, product catalogs, and any domain with specific terminology.
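To make the keyword half of that trade-off concrete, here is a minimal, self-contained BM25 scorer. It is an illustrative sketch, not the MVS implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the Lucene-style non-negative IDF are all assumptions, and `tokenize` and `bm25_scores` are hypothetical helper names.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption; real engines use analyzers)
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "FastAPI uses Pydantic v2 for data validation and serialization",
    "Request schemas are checked automatically by the web framework",
]
scores = bm25_scores("pydantic validation", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f} | {doc}")
```

The second document describes the same idea in different words, so it shares no terms with the query and scores zero. That is the gap the dense half of a hybrid query is meant to close.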

from openai import OpenAI
from mixpeek import Mixpeek

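# Client for generating embeddings; the name avoids shadowing the `openai` package
openai_client = OpenAI(api_key="your-openai-key")
mvs = Mixpeek(api_key="your-mvs-key")

NAMESPACE = "my-namespace"

def embed(text: str) -> list[float]:
    """Return the dense embedding for a single string."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding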

# Upsert documents with BOTH dense embeddings and text content
documents = [
    {"text": "FastAPI uses Pydantic v2 for data validation and serialization", "topic": "python"},
    {"text": "Express.js middleware handles request/response transformations", "topic": "node"},
    {"text": "FastAPI supports async/await natively with Starlette ASGI", "topic": "python"},
    {"text": "Django ORM provides database abstraction with QuerySet API", "topic": "python"},
]

for doc in documents:
    mvs.namespaces.documents.upsert(
        namespace=NAMESPACE,
        documents=[{
            "dense_embedding": embed(doc["text"]),
            "content": doc["text"],  # BM25 indexes this field
            "metadata": {"topic": doc["topic"]}
        }]
    )

# Hybrid search: BM25 keyword matching + dense vector similarity
query_text = "FastAPI Pydantic validation"
results = mvs.namespaces.documents.search(
    namespace=NAMESPACE,
    query={
        "dense_embedding": embed(query_text),
        "text": query_text  # BM25 component
    },
    hybrid={
        "enabled": True,
        "alpha": 0.6  # 0.0 = pure BM25, 1.0 = pure dense; 0.6 leans slightly toward dense
    },
    top_k=5
)

for doc in results:
    print(f"{doc['score']:.3f} | {doc['metadata'].get('topic', '')} | {doc['content'][:80]}")
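The recipe does not say how MVS fuses the two scores internally. One common way to read an `alpha` weight is as a convex combination of min-max normalized BM25 and dense scores, and the sketch below assumes exactly that. The helpers `min_max`, `fuse`, and `rank` are hypothetical, and the raw scores are made-up values for three documents.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(bm25: list[float], dense: list[float], alpha: float) -> list[float]:
    """Blend per-document scores: alpha weights dense, (1 - alpha) weights BM25."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    return [alpha * d + (1 - alpha) * k for k, d in zip(bm25_n, dense_n)]


def rank(scores: list[float]) -> list[int]:
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)


# Made-up scores: document 0 wins on keywords, document 1 wins on semantics
bm25_raw = [7.2, 0.0, 3.1]
dense_raw = [0.62, 0.80, 0.71]

for alpha in (0.0, 0.6, 1.0):
    fused = fuse(bm25_raw, dense_raw, alpha)
    print(alpha, rank(fused), [round(s, 3) for s in fused])
```

Under this assumption, moving `alpha` toward 0.0 lets the keyword winner rise and moving it toward 1.0 favors the semantic winner, so it is worth sweeping a few values against queries you care about before settling on one.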

Retriever Stages

limit

Truncate results to a maximum count with optional offset for pagination

reduce
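The `limit` stage is described above as truncating results to a maximum count with an optional offset. In plain Python that behavior amounts to a slice over the ranked list. This is a sketch of the concept only, not the stage's actual configuration; the function name and parameters are hypothetical.

```python
def limit(results: list, max_count: int, offset: int = 0) -> list:
    """Keep at most max_count results, skipping the first offset entries."""
    if max_count < 0 or offset < 0:
        raise ValueError("max_count and offset must be non-negative")
    return results[offset:offset + max_count]


# Seven ranked results, paged three at a time
ranked = [f"doc-{i}" for i in range(1, 8)]
page_1 = limit(ranked, 3)
page_2 = limit(ranked, 3, offset=3)
page_3 = limit(ranked, 3, offset=6)
print(page_1, page_2, page_3)
```

The last page is simply shorter when fewer results remain, so a caller can detect the end of pagination by comparing the page length with `max_count`.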

Documentation

MVS Overview

Hybrid Search

BM25 Configuration

Related Recipes & Resources

Explore these related resources to deepen your understanding and discover more powerful features

Recipe

Document Intelligence Search

Extract and search through PDFs, presentations, and documents. Combines OCR, layout analysis, and semantic search for comprehensive document retrieval.

Recipe

BYO Embeddings Vector Search

Bring pre-computed embeddings from any provider (OpenAI, Cohere, Together, etc.) and upsert them directly into MVS for instant vector search. No feature extractors, no pipelines -- just embeddings in, results out.

Recipe

RAG with MVS Standalone

Complete RAG pipeline using MVS for retrieval and OpenAI for generation. Chunk your documents, embed them with any provider, store in MVS, retrieve relevant context, and generate answers -- no managed feature extractors needed.

Extractor

Web Scraper

Extract structured data from webpages while maintaining semantic context and relationships

Extractor

Text Embedding

Extract semantic embeddings from documents, transcripts and text content

Extractor

Named Entity Recognition

Identify and extract named entities like people, organizations, and locations
