    Hybrid BM25 + Dense Vector Search

    Use MVS hybrid search to combine BM25 keyword matching with dense vector similarity. Get the precision of exact keyword matches and the recall of semantic understanding in a single query.

    text
    Single Tier
    18.7K runs

    "FastAPI Pydantic v2 validation patterns"

    Why This Matters

    Pure vector search can miss exact keyword matches, and pure keyword search misses semantic meaning. Hybrid search covers both, which is critical for technical content, product catalogs, and any domain with specific terminology.

    from openai import OpenAI
    from mixpeek import Mixpeek

    openai = OpenAI(api_key="your-openai-key")
    mvs = Mixpeek(api_key="your-mvs-key")
    NAMESPACE = "my-namespace"

    def embed(text: str) -> list[float]:
        # Return the dense embedding for a single piece of text
        resp = openai.embeddings.create(model="text-embedding-3-small", input=text)
        return resp.data[0].embedding

    # Upsert documents with BOTH dense embeddings and text content
    documents = [
        {"text": "FastAPI uses Pydantic v2 for data validation and serialization", "topic": "python"},
        {"text": "Express.js middleware handles request/response transformations", "topic": "node"},
        {"text": "FastAPI supports async/await natively with Starlette ASGI", "topic": "python"},
        {"text": "Django ORM provides database abstraction with QuerySet API", "topic": "python"},
    ]
    for doc in documents:
        mvs.namespaces.documents.upsert(
            namespace=NAMESPACE,
            documents=[{
                "dense_embedding": embed(doc["text"]),
                "content": doc["text"],  # BM25 indexes this field
                "metadata": {"topic": doc["topic"]}
            }]
        )

    # Hybrid search: BM25 keyword matching + dense vector similarity
    query_text = "FastAPI Pydantic validation"
    results = mvs.namespaces.documents.search(
        namespace=NAMESPACE,
        query={
            "dense_embedding": embed(query_text),
            "text": query_text  # BM25 component
        },
        hybrid={
            "enabled": True,
            "alpha": 0.6  # 0.0 = pure BM25, 1.0 = pure dense; 0.6 leans slightly toward dense
        },
        top_k=5
    )
    for doc in results:
        # Print score, topic, and the first 80 characters of the content
        print(f"{doc['score']:.3f} | {doc['metadata'].get('topic', '')} | {doc['content'][:80]}")
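
    To build intuition for the alpha setting, the sketch below blends two score lists as a weighted sum after min-max normalization. This is only an illustration of one common fusion scheme: the page does not document how MVS combines scores internally, and the names `min_max` and `fuse` and the sample scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and similarity scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    bm25: dict[str, float], dense: dict[str, float], alpha: float
) -> list[tuple[str, float]]:
    """Blend scores: alpha weights the dense side, (1 - alpha) the BM25 side."""
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    doc_ids = set(bm25_n) | set(dense_n)
    fused = {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * bm25_n.get(d, 0.0)
        for d in doc_ids
    }
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores for three documents (assumed values, not MVS output)
bm25_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.0}
dense_scores = {"doc_a": 0.80, "doc_b": 0.90, "doc_c": 0.50}

ranked = fuse(bm25_scores, dense_scores, alpha=0.6)
for doc_id, score in ranked:
    print(f"{score:.3f} | {doc_id}")
```

    With these sample scores, doc_a has the strongest keyword match and doc_b the strongest semantic match. Setting alpha to 1.0 ranks doc_b first, while 0.0 ranks doc_a first, which mirrors the pure-dense and pure-BM25 extremes described in the comment above.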

    Feature Extractors

    Retriever Stages

    limit

    Truncate results to a maximum count with optional offset for pagination

    reduce