    Retrieval
    17 min read
    Updated 2026-06-19

    Query Transformation Pipelines: Turning a Raw Agent Question Into Searchable Intent

    Most retrieval failures happen before the index is ever touched, in the gap between what an agent typed and what the corpus indexed. This guide teaches the front-door algorithms that close that gap: query classification and routing, multi-query fan-out with reciprocal rank fusion, HyDE versus pseudo-relevance feedback, modality routing for multimodal corpora, and how to evaluate the transform itself, then wires the pattern into a Mixpeek retriever.


    The Failure Nobody Looks At



    When retrieval returns the wrong evidence, the reflex is to blame the index, the embedding model, or the reranker. But a large share of misses happen earlier, in the gap between the words the agent produced and the words the corpus was indexed with. The query "did the speaker walk back the Q3 guidance" will not land near a transcript chunk that says "we are revising our full-year outlook downward," even though they mean the same thing. No amount of reranking saves a first stage that never surfaced the right candidate.

    This gap is called the vocabulary mismatch problem, and it predates embeddings: lexical search has fought it for decades and dense search only narrows it. The query side of the pipeline, everything that happens between the raw question and the vector or term that actually hits the index, is the cheapest place to fix retrieval, because it runs once per query instead of once per document. That stage is query transformation, and for an agent it is the difference between a question and a search.

    For an agent reading unstructured content the problem is sharper than in plain text RAG, for three reasons:

  1. The query is often a reasoning trace, not a clean question, so the actual search need is buried in accumulated context.
  2. The corpus is multimodal, so the same query might need to hit a transcript index, a visual-embedding index, or an OCR index, and the agent has to decide which.
  3. The query is frequently compound: "find the demo where the bottle is shown and the narrator mentions free returns" is two conditions over two modalities that no single vector matches well.


    A query transformation pipeline is a small sequence of steps that takes the raw input and produces one or more clean, routed, retrieval-ready queries. Here is the anatomy.

    Step 1: Classify the Query



    Before transforming anything, decide what kind of query this is, because the right transform depends on it. A lightweight classifier, a small LLM call or even a few heuristics, sorts the query into a handful of types:

  1. Lookup ("what is the promo code in this ad") wants exact-match and high precision; favor lexical or filtered search, transform little.
  2. Semantic ("ads with a calm, premium feel") wants dense search and benefits from expansion.
  3. Compound ("blue bottle while the narrator says subscribe") needs decomposition into sub-queries.
  4. Navigational or filter-shaped ("clips from campaign X after May") should become a metadata filter, not a vector search at all.


    The point of classification is to avoid applying an expensive transform where it hurts. Running HyDE on a navigational query that just needs a filter wastes a model call and adds noise. Routing is the first transform, and the cheapest one.
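
    As a rough illustration of the "few heuristics" option, the sketch below sorts a query into the four types with ordered regular expressions. The cue patterns, the label strings, and the function name classify are assumptions made for this example, not a tested rule set; a production classifier would tune or learn them.

```python
import re

# Hypothetical label set, mirroring the four query types described above.
QUERY_TYPES = ("navigational", "compound", "lookup", "semantic")

# Illustrative cue patterns only; real systems would tune or learn these.
_FILTER_CUES = re.compile(
    r"\b(from campaign|after|before|since|between|uploaded)\b", re.I
)
_COMPOUND_CUES = re.compile(r"\b(while|and the|and also|as well as)\b", re.I)
_LOOKUP_CUES = re.compile(
    r"^(what is|what's|which|who|when)\b|\b(code|price|sku|serial)\b", re.I
)


def classify(query: str) -> str:
    """Return one of QUERY_TYPES using cheap, ordered heuristics."""
    if _FILTER_CUES.search(query):
        return "navigational"  # becomes a metadata filter, no vector search
    if _COMPOUND_CUES.search(query):
        return "compound"      # needs decomposition into sub-queries
    if _LOOKUP_CUES.search(query):
        return "lookup"        # exact-match, transform little
    return "semantic"          # default: dense search, may benefit from expansion
```

    The order of the checks is itself a design choice: filter cues are tested first so that a constraint-shaped query never reaches the more expensive branches.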

    Step 2: Extract Intent From the Trace



    When the input is an agent reasoning trace, the search need is a small fraction of the tokens. Embedding the whole trace dilutes it: the dense vector is dominated by accumulated context, and lexical search drowns in stopwords. Extract the current search intent first, then transform only that.

    raw trace:  "I already confirmed the index is HNSW and the data is 10M
                 vectors. Now I need to know the memory footprint of that
                 configuration so I can size the node."
    intent:     "HNSW index memory footprint at 10M vectors"
    


    A 1B-parameter model is plenty for this, and many agent frameworks now emit a structured "current search intent" field alongside the trace so you skip the call entirely. Everything downstream operates on the intent, not the raw trace.
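
    A minimal sketch of the extraction step, assuming the small model is reachable through a plain function that maps a prompt to text. The prompt wording, the complete parameter, and the last-sentence fallback are all assumptions for illustration rather than a recommended template.

```python
from typing import Callable

# Hypothetical prompt; the wording is an illustration, not a tested template.
INTENT_PROMPT = (
    "Below is an agent's reasoning trace. Reply with only the search query "
    "the agent needs answered right now, as a short noun phrase.\n\n"
    "Trace:\n{trace}\n\nSearch intent:"
)


def extract_intent(
    trace: str, complete: Callable[[str], str], max_chars: int = 200
) -> str:
    """Distill a reasoning trace into one short search intent.

    `complete` is any function that sends a prompt to a small model and
    returns its text. If the model returns nothing usable, fall back to
    the trace's last sentence, which usually holds the current need.
    """
    reply = complete(INTENT_PROMPT.format(trace=trace)).strip()
    lines = reply.splitlines()
    # Keep only the first line and drop any wrapping quotes.
    intent = lines[0].strip().strip('"') if lines else ""
    if not intent:
        sentences = [s.strip() for s in trace.split(".") if s.strip()]
        intent = sentences[-1] if sentences else trace.strip()
    return intent[:max_chars]
```

    Passing the model in as a callable keeps the step testable with a stub and lets you skip the call entirely when the framework already supplies an intent field.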

    Step 3: Expand or Rewrite



    Once you have a clean intent, you can close the vocabulary gap. Two families of techniques, with an important tradeoff between them.

    Pseudo-relevance feedback (PRF) is the classic, model-free option. Run the query, take the top few results, harvest the terms or vectors that appear in them, and add them back into a second, expanded query. The assumption is that the first-pass top results are roughly relevant, so their vocabulary is the corpus's way of saying what the query meant. PRF is cheap and grounded in the actual corpus, but it has a known failure mode called query drift: if the first pass returns off-topic results, the expansion amplifies the error.
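
    The term-harvesting variant of PRF can be sketched in a few lines, assuming the first pass has already returned the text of its top results. The stopword list, the tokenizer, and the defaults for top_docs and add_terms are illustrative assumptions; keeping top_docs small is one simple way to limit drift.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use its analyzer's list.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "we", "our", "on", "for", "that", "did",
}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def prf_expand(
    query: str, first_pass_texts: list[str], top_docs: int = 3, add_terms: int = 4
) -> str:
    """Append the most common new terms from the top first-pass results."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for text in first_pass_texts[:top_docs]:
        # Count each term once per document so one long chunk cannot dominate.
        counts.update(set(tokenize(text)) - query_terms)
    # Sort by document frequency, then alphabetically for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [term for term, _ in ranked[:add_terms]]
    return " ".join([query, *expansion]) if expansion else query
```

    With an empty first pass the function returns the query unchanged, so a failed first stage does not inject noise into the second.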

    HyDE (Hypothetical Document Embeddings) flips the direction. Instead of embedding the question, you have a small model write a hypothetical answer to it, then embed that. The intuition is that answers live nearer to real documents in embedding space than questions do, so a fabricated answer is a better probe than the literal query.

    query:        "did the speaker walk back the Q3 guidance"
    HyDE answer:  "During the call, management revised the full-year
                   outlook downward and lowered Q3 revenue guidance,
                   citing softer demand."
    embed THIS, not the question, and search with it.
    


    The factual accuracy of the hypothetical answer matters less than its shape and vocabulary, because you discard the text after embedding it. HyDE shines on semantic queries with a clear answer vocabulary and struggles on queries the small model knows nothing about, where it can hallucinate a misleading probe. The tradeoff: PRF needs no model but trusts the first-pass results; HyDE needs a model call but does not depend on a first pass. Many systems run both and fuse the results, which is the next step.
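
    In code, HyDE is a thin wrapper around two calls. The sketch below assumes the generator and the embedding model are supplied as plain functions; the names hyde_probe, generate, and embed, and the prompt wording, are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

# Hypothetical prompt wording, for illustration only.
HYDE_PROMPT = (
    "Write a short passage that plausibly answers the question.\n\n"
    "Question: {query}\n\nPassage:"
)


def hyde_probe(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
) -> Vector:
    """Embed a hypothetical answer instead of the question.

    Falls back to embedding the query itself if generation comes back empty,
    so a failed model call degrades to ordinary dense search.
    """
    hypothetical = generate(HYDE_PROMPT.format(query=query)).strip()
    return embed(hypothetical or query)
```

    The returned vector is used exactly like a normal query embedding; the generated passage itself is never shown to the agent.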

    Step 4: Fan Out and Fuse With Reciprocal Rank Fusion



    A single phrasing of a query is one sample of a noisy distribution. Generate several paraphrases, or several decomposed sub-queries, run them in parallel, and merge the ranked lists. This multi-query fan-out is robust precisely because the variations disagree: an item that ranks well across several phrasings is more likely truly relevant than one that spikes on a single lucky phrasing.

    The merge step needs a method that does not depend on each list's raw scores being comparable, because a lexical list and a dense list produce scores on totally different scales. Reciprocal Rank Fusion (RRF) solves this by throwing away the scores and using only the rank position:

    rrf_score(d) = sum over each result list L of
                     1 / (k + rank of d in L)

    k is a small constant, commonly 60, that dampens the influence of the very top ranks.


    Each list contributes a vote that shrinks as the document's rank in that list grows, and a list that does not contain the document contributes nothing. A document ranked first in three of five lists beats one ranked first in one list and absent from the rest. RRF is parameter-light, scale-free, and is exactly the same machinery that fuses a dense list with a lexical (BM25) list in hybrid search, which is why it doubles as the merge step for multi-query fan-out and for cross-modality fusion. The same fusion that combines paraphrases combines modalities.
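
    The formula translates almost line for line into code. This is a minimal sketch that assumes each result list is a sequence of document ids ordered best first and that ranks are 1-based; the function name rrf_fuse and the tie-breaking rule are choices made for the example.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def rrf_fuse(
    ranked_lists: Iterable[Sequence[Hashable]], k: int = 60
) -> list[tuple[Hashable, float]]:
    """Fuse ranked lists of document ids with reciprocal rank fusion.

    Ranks are 1-based, so the top item of a list contributes 1 / (k + 1).
    A document missing from a list contributes nothing for that list.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:  # count only a document's best rank per list
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id text for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
```

    With k = 60, a document ranked first in three lists scores 3/61, about 0.049, while a document ranked first in one list and absent elsewhere scores 1/61, about 0.016, which is the behavior described above.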

    Step 5: Route to the Right Modality



    For a multimodal corpus, the most consequential transform is deciding which index the query should hit. A query about spoken claims belongs on the transcript index; a query about visual style belongs on the image-embedding index; a query about a price on screen belongs on the OCR index. Sending every query to every index is wasteful and noisy; sending it to the wrong one returns confident garbage.

    Modality routing can be a classifier ("this query is about audio content") or, for compound queries, a decomposition that sends each sub-query to its natural modality and then fuses with RRF:

    compound query: "the demo where the bottle is shown and the
                     narrator says free returns"

    sub-query A -> visual index    : "bottle shown in demo"
    sub-query B -> transcript index: "narrator says free returns"

    fuse the two ranked lists with RRF; the clip that ranks well on BOTH rises to the top.


    This is the agent-perception version of query transformation: the transform does not just clean the words, it picks the sense organ. A clip that satisfies both conditions appears in both lists and wins the fusion; a clip that only shows the bottle, or only mentions returns, ranks lower because it only earns one set of votes.
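
    The decompose, route, and fuse flow can be sketched with toy indexes. Everything here is an assumption for illustration: the modality labels, the routed_search name, and the lambda stand-ins that return fixed rankings in place of real index calls. The decomposition itself is taken as given.

```python
from collections import defaultdict
from typing import Callable

# Each search function takes a sub-query and returns clip ids, best first.
SearchFn = Callable[[str], list[str]]


def routed_search(
    sub_queries: list[tuple[str, str]],  # (modality, sub-query text)
    indexes: dict[str, SearchFn],        # modality -> search function
    k: int = 60,
) -> list[str]:
    """Send each sub-query to its modality's index and fuse the lists with RRF."""
    scores: dict[str, float] = defaultdict(float)
    for modality, text in sub_queries:
        if modality not in indexes:
            raise KeyError(f"no index registered for modality {modality!r}")
        for rank, clip_id in enumerate(indexes[modality](text), start=1):
            scores[clip_id] += 1.0 / (k + rank)
    # Best fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda clip_id: (-scores[clip_id], clip_id))


# Toy stand-ins for real indexes, returning fixed rankings.
indexes = {
    "visual": lambda q: ["clip_7", "clip_2", "clip_9"],
    "transcript": lambda q: ["clip_4", "clip_7", "clip_1"],
}
plan = [
    ("visual", "bottle shown in demo"),
    ("transcript", "narrator says free returns"),
]
print(routed_search(plan, indexes))
```

    In this toy run clip_7 appears in both lists and comes out first, ahead of clip_4, which tops the transcript list but never shows up in the visual one.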

    Choosing Transforms



    Query type             | Best transform                              | Why
    Navigational / filter  | Route to metadata filter, no vector search  | Exact constraints, vector search only adds noise
    Lookup / exact         | Light lexical, minimal rewrite              | Precision matters, expansion hurts
    Semantic               | HyDE or expansion                           | Vocabulary gap is the bottleneck
    Compound / multimodal  | Decompose, route per sub-query, RRF         | One vector cannot satisfy several conditions
    Reasoning trace        | Intent extraction first, then the above     | Search need is buried in context

    Two rules cut across all of them. First, transform conditionally, not always: a pipeline that runs HyDE and five-way fan-out on every query burns latency and money on queries that needed a filter. Classify first, then spend. Second, a transform is config that changes what the agent sees, so version it and measure it like an index change, not a prompt tweak.
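
    One way to make "classify first, then spend" and "version it" concrete is to keep the per-type plan in a small, immutable config object. The field names, the fan-out widths, and the version string below are hypothetical values chosen for illustration, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformPlan:
    use_vector_search: bool
    expand: str      # "none", "hyde", or "paraphrase"
    fan_out: int     # number of probes sent to the index
    decompose: bool  # split into per-modality sub-queries first


# Hypothetical version label; bump it whenever any plan below changes.
TRANSFORM_CONFIG_VERSION = "v1"

# Illustrative plans, one per classifier label.
PLANS = {
    "navigational": TransformPlan(False, "none", 0, False),
    "lookup": TransformPlan(True, "none", 1, False),
    "semantic": TransformPlan(True, "hyde", 3, False),
    "compound": TransformPlan(True, "none", 2, True),
}


def plan_for(qtype: str) -> TransformPlan:
    """Look up the plan for a label; unknown labels get the cheapest vector plan."""
    return PLANS.get(qtype, PLANS["lookup"])
```

    Logging TRANSFORM_CONFIG_VERSION next to each query makes it possible to attribute a recall change to a specific plan change later.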

    Evaluating the Transform Itself



    The trap is to evaluate only end-to-end answer quality, which hides where the win or loss came from. Isolate the query stage and measure it directly:

  1. Recall at the candidate stage, before any reranking. Query transformation's whole job is to get the right item into the candidate pool; if recall did not move, the transform did nothing, no matter what the final answer looks like.
  2. Per-query-type breakdown. A transform that helps semantic queries often hurts lookups. Average metrics hide this; slice by the classifier's labels.
  3. Latency and cost added. HyDE adds a model call; fan-out multiplies retrievals. Track the p95 and the spend, because the durable failure mode of query transformation is paying for transforms that did not earn their keep.
  4. Drift on PRF. Specifically check that expansion did not pull results off topic on hard queries; compare expanded recall against unexpanded on a held-out set.


    Walk each transform from off to on and keep only the ones where candidate recall improves more than latency degrades for the query types they target.
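
    Candidate-stage recall sliced by query type is straightforward to compute from logged runs. The sketch below assumes each run is a dict with the hypothetical keys qtype, candidates, and relevant; the key names and the default k of 200 are assumptions for the example.

```python
from collections import defaultdict


def recall_at_k(candidates: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k candidates."""
    if not relevant:
        return 0.0
    return len(set(candidates[:k]) & relevant) / len(relevant)


def recall_by_type(runs: list[dict], k: int = 200) -> dict[str, float]:
    """Average candidate-stage recall@k per query type.

    Each run is a dict with hypothetical keys: "qtype" (classifier label),
    "candidates" (ids before reranking), and "relevant" (labeled ids).
    """
    per_type: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        score = recall_at_k(run["candidates"], set(run["relevant"]), k)
        per_type[run["qtype"]].append(score)
    return {qtype: sum(vals) / len(vals) for qtype, vals in per_type.items()}
```

    Running recall_by_type once on the logs with a transform off and once with it on, over the same labeled queries, gives the per-type comparison; latency and cost still need to be tracked separately.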

    Doing This in Mixpeek



    In Mixpeek the query transformation lives in front of the retriever stages. You route by classifying the query, optionally rewrite or fan it out, and let the retriever fuse the results, including a hybrid dense-plus-lexical (BM25) first stage whose lists are merged with RRF, the same fusion that merges your paraphrases or your per-modality sub-queries.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="mxp_sk_...")

    # A retriever whose first stage already fuses dense + lexical with RRF.
    retriever = client.retrievers.create(
        namespace_id="ns_video",
        collection_ids=["col_transcripts", "col_keyframes"],
        retriever_name="agent-search-fused",
        stages=[
            # Hybrid first stage: dense + BM25, merged by reciprocal rank fusion.
            {
                "stage_name": "hybrid_search",
                "parameters": {"fusion": "rrf", "rrf_k": 60, "top_k": 200},
            },
            {"stage_name": "rerank", "parameters": {"top_k": 20}},
        ],
    )

    # Application-side query transformation: classify, then route + fan out.
    def transform_and_search(raw_query: str):
        qtype = classify(raw_query)  # your small classifier
        if qtype == "navigational":
            # Skip vector search; push constraints into a filter instead.
            return client.retrievers.execute(
                retriever_id=retriever.retriever_id,
                inputs={"filters": to_filter(raw_query)},
                top_k=20,
            )
        # Semantic / compound: fan out into paraphrases or sub-queries,
        # run each, and let the retriever's RRF do the merge.
        queries = expand_or_decompose(raw_query)  # HyDE, paraphrase, or split
        return client.retrievers.execute(
            retriever_id=retriever.retriever_id,
            inputs={"text": queries},  # multiple probes, one fused result
            top_k=20,
        )


    Treat the classifier, the expansion strategy, and the fan-out width as versioned config, because each one changes the candidate pool an agent reasons over. Measure candidate-stage recall per query type before and after a change. For the dense half of the hybrid stage, pick an embedding model from the Models page that matches the modality each sub-query is routed to.

    Further Reading



  1. Multi-Stage Retrieval -- the staged pipeline these transformed queries feed into
  2. Agentic Retrieval -- why an agent's queries are reasoning traces, not clean questions
  3. Adaptive Indexing for Agentic Search -- the index-side complement to query-side routing
  4. Multi-Index Search Architecture -- how the per-modality indexes this guide routes to are laid out
  5. Evaluating Multimodal Retrieval -- candidate-stage recall and the metrics that isolate the query stage
