Query Transformation Pipelines: Turning a Raw Agent Question Into Searchable Intent

The Failure Nobody Looks At

When retrieval returns the wrong evidence, the reflex is to blame the index, the embedding model, or the reranker. But a large share of misses happen earlier, in the gap between the words the agent produced and the words the corpus was indexed with. The query "did the speaker walk back the Q3 guidance" will not land near a transcript chunk that says "we are revising our full-year outlook downward," even though they mean the same thing. No amount of reranking saves a first stage that never surfaced the right candidate.

This gap is called the vocabulary mismatch problem, and it predates embeddings: lexical search has fought it for decades, and dense search only narrows it. The query side of the pipeline, everything that happens between the raw question and the vector or terms that actually hit the index, is the cheapest place to fix retrieval, because it runs once per query and changing it does not require re-embedding or re-indexing the corpus. That stage is query transformation, and for an agent it is the difference between a question and a search.

For an agent reading unstructured content, the problem is sharper than in plain-text RAG, for three reasons:

The query is often a reasoning trace, not a clean question, so the actual search need is buried in accumulated context.

The corpus is multimodal, so the same query might need to hit a transcript index, a visual-embedding index, or an OCR index, and the agent has to decide which.

The query is frequently compound: "find the demo where the bottle is shown and the narrator mentions free returns" is two conditions over two modalities that no single vector matches well.

A query transformation pipeline is a small sequence of steps that takes the raw input and produces one or more clean, routed, retrieval-ready queries. Here is the anatomy.

Step 1: Classify the Query

Before transforming anything, decide what kind of query this is, because the right transform depends on it. A lightweight classifier, a small LLM call or even a few heuristics, sorts the query into a handful of types:

Lookup ("what is the promo code in this ad") wants exact-match and high precision; favor lexical or filtered search, transform little.

Semantic ("ads with a calm, premium feel") wants dense search and benefits from expansion.

Compound ("blue bottle while the narrator says subscribe") needs decomposition into sub-queries.

Navigational or filter-shaped ("clips from campaign X after May") should become a metadata filter, not a vector search at all.

The point of classification is to avoid applying an expensive transform where it hurts. Running HyDE on a navigational query that just needs a filter wastes a model call and adds noise. Routing is the first transform, and the cheapest one.
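The heuristic end of that spectrum can be sketched in a few lines. This is a minimal illustration, not a production classifier: the cue patterns and the `classify` function are assumptions made up for this example, and a real system would tune them against its own query logs or replace them with a small model call. Cues this crude will misfire (a question like "what happened after the demo" would be sent to the filter path), which is one reason to measure routing per query type.

```python
import re

# Hypothetical cue patterns, chosen only to cover the example queries above.
FILTER_CUES = re.compile(r"\b(from campaign|after|before|since|between)\b", re.I)
COMPOUND_CUES = re.compile(r"\b(while|and the)\b", re.I)
LOOKUP_CUES = re.compile(r"^(what is|what's|which|who is|when is)\b", re.I)


def classify(query: str) -> str:
    """Return one of: navigational, compound, lookup, semantic."""
    q = query.strip()
    if FILTER_CUES.search(q):
        return "navigational"  # becomes a metadata filter, no vector search
    if COMPOUND_CUES.search(q):
        return "compound"      # needs decomposition into sub-queries
    if LOOKUP_CUES.search(q):
        return "lookup"        # exact-match, transform little
    return "semantic"          # default: dense search, may benefit from expansion
```

The checks run cheapest-outcome first, so a query that looks like a filter never reaches the more expensive branches.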

Step 2: Extract Intent From the Trace

When the input is an agent reasoning trace, the search need is a small fraction of the tokens. Embedding the whole trace dilutes it: the dense vector is dominated by accumulated context, and lexical search matches on terms from steps the agent has already finished. Extract the current search intent first, then transform only that.

raw trace:  "I already confirmed the index is HNSW and the data is 10M
             vectors. Now I need to know the memory footprint of that
             configuration so I can size the node."
intent:     "HNSW index memory footprint at 10M vectors"

A small model, on the order of 1B parameters, is usually enough for this, and some agent frameworks emit a structured "current search intent" field alongside the trace so you can skip the call entirely. Everything downstream operates on the intent, not the raw trace.
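A minimal sketch of that step, assuming the model is reachable through a plain callable. The names `extract_intent`, `INTENT_PROMPT`, `complete`, and `structured_intent` are hypothetical and introduced only for illustration; `complete` stands in for whatever small-model client you use.

```python
from typing import Callable, Optional

# Hypothetical prompt; adjust the wording for the model you actually call.
INTENT_PROMPT = (
    "Below is an agent's reasoning trace. Reply with only the search query "
    "the agent needs next, as one short phrase.\n\n"
    "Trace:\n{trace}\n\nSearch query:"
)


def extract_intent(
    trace: str,
    complete: Callable[[str], str],
    structured_intent: Optional[str] = None,
) -> str:
    """Prefer a framework-supplied intent field; otherwise ask a small model."""
    if structured_intent and structured_intent.strip():
        return structured_intent.strip()  # no model call needed
    reply = complete(INTENT_PROMPT.format(trace=trace))
    # Keep the first non-empty line, with surrounding quotes removed.
    cleaned = (line.strip().strip('"').strip() for line in reply.splitlines())
    lines = [line for line in cleaned if line]
    # If the model returns nothing usable, fall back to the raw trace.
    return lines[0] if lines else trace.strip()
```

Passing the model as a callable keeps the function testable with a stub and makes it easy to swap models without touching the rest of the pipeline.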

Step 3: Expand or Rewrite

Once you have a clean intent, you can close the vocabulary gap. Two families of techniques, with an important tradeoff between them.

Pseudo-relevance feedback (PRF) is the classic, model-free option. Run the query, take the top few results, harvest the terms or vectors that appear in them, and add them back into a second, expanded query. The assumption is that the first-pass top results are roughly relevant, so their vocabulary is the corpus's way of saying what the query meant. PRF is cheap and grounded in the actual corpus, but it has a known failure mode called query drift: if the first pass returns off-topic results, the expansion amplifies the error.
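The lexical form of PRF can be sketched over a toy in-memory corpus. Everything here is an assumption made for illustration: `overlap_search` is a stand-in term-overlap scorer rather than BM25, and `prf_expand`, the stopword list, and the default of two feedback documents and three expansion terms are arbitrary choices, not recommended settings.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use its analyzer's list.
STOPWORDS = {"a", "an", "and", "are", "did", "in", "is", "of", "on", "our", "the", "to", "we"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def overlap_search(terms, docs, top_k):
    """Toy first-stage retriever: rank docs by how many tokens match the query terms."""
    query_terms = set(terms)
    scored = []
    for doc_id, text in docs.items():
        score = sum(1 for t in tokenize(text) if t in query_terms)
        if score:
            scored.append((-score, doc_id))  # negative score sorts best-first
    return [doc_id for _, doc_id in sorted(scored)[:top_k]]


def prf_expand(query, docs, feedback_docs=2, expansion_terms=3):
    """Add the most frequent new terms from the first-pass top results."""
    terms = tokenize(query)
    feedback = overlap_search(terms, docs, feedback_docs)
    counts = Counter(
        t for doc_id in feedback for t in tokenize(docs[doc_id]) if t not in terms
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return terms + [term for term, _ in ranked[:expansion_terms]]
```

Running `overlap_search(prf_expand(query, docs), docs, top_k)` is the second, expanded pass. Query drift is visible in this sketch too: whatever the first pass returns decides which terms get added, relevant or not.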

HyDE (Hypothetical Document Embeddings) flips the direction. Instead of embedding the question, you have a small model write a hypothetical answer to it, then embed that. The intuition is that answers live nearer to real documents in embedding space than questions do, so a fabricated answer is a better probe than the literal query.

query:        "did the speaker walk back the Q3 guidance"
HyDE answer:  "During the call, management revised the full-year
               outlook downward and lowered Q3 revenue guidance,
               citing softer demand."
embed THIS, not the question, and search with it.

The factual accuracy of the hypothetical answer does not matter; only its shape and vocabulary do, because you discard it after embedding. HyDE shines on semantic queries with a clear answer vocabulary and struggles on queries the small model knows nothing about (it can hallucinate a misleading probe). PRF needs no model but trusts the first pass; HyDE needs a model but does not depend on a first pass at all. Many systems run both and fuse, which is the next step.
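The HyDE flow reduces to "generate, then embed the generation". A minimal sketch follows, assuming both the generator and the embedder are supplied as callables; `hyde_probe`, `HYDE_PROMPT`, `generate`, and `embed` are hypothetical names for this example, not any particular library's API.

```python
from typing import Callable, List

# Hypothetical prompt; the goal is answer-shaped text, not a correct answer.
HYDE_PROMPT = (
    "Write a short passage that plausibly answers the question.\n"
    "Question: {question}\nPassage:"
)


def hyde_probe(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Return the embedding of a hypothetical answer instead of the question."""
    hypothetical = generate(HYDE_PROMPT.format(question=query)).strip()
    if not hypothetical:
        # If the model produces nothing, fall back to embedding the literal query.
        return embed(query)
    # The hypothetical text is only a probe; it is discarded after embedding.
    return embed(hypothetical)
```

The returned vector is used exactly where the query embedding would have been; nothing else in the retrieval stage changes.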

Step 4: Fan Out and Fuse With Reciprocal Rank Fusion

A single phrasing of a query is one sample of a noisy distribution. Generate several paraphrases, or several decomposed sub-queries, run them in parallel, and merge the ranked lists. This multi-query fan-out is robust precisely because the variations disagree: an item that ranks well across several phrasings is more likely truly relevant than one that spikes on a single lucky phrasing.
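The parallel part of fan-out can be sketched with the standard library's thread pool, assuming `search` is an I/O-bound call to your retriever. The `fan_out` name and the default of four workers are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def fan_out(
    queries: Sequence[str],
    search: Callable[[str], List[str]],
    max_workers: int = 4,
) -> List[List[str]]:
    """Run every query variant concurrently; return one ranked list per variant."""
    if not queries:
        return []
    workers = min(max_workers, len(queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result i belongs to queries[i].
        return list(pool.map(search, queries))
```

The output is a list of ranked lists, one per phrasing, ready to be merged by a rank-based fusion step.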

The merge step needs a method that does not depend on each list's raw scores being comparable, because a lexical list and a dense list produce scores on totally different scales. Reciprocal Rank Fusion (RRF) solves this by throwing away the scores and using only the rank position:

rrf_score(d) = sum over each result list L of
                 1 / (k + rank of d in L)

k is a small constant, commonly 60, that dampens the
influence of the very top ranks. Ranks start at 1, and a
list that does not contain d contributes nothing.

Each list contributes a vote that shrinks as the document's rank in that list gets worse. A document ranked first in three of five lists beats one ranked first in one list and absent from the rest. RRF is parameter-light and scale-free, and it is exactly the same machinery that fuses a dense list with a lexical (BM25) list in hybrid search, which is why it doubles as the merge step for multi-query fan-out and for cross-modality fusion. The same fusion that combines paraphrases combines modalities.
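The formula translates directly into code. The sketch below assumes each input list holds unique document ids ordered best-first; `rrf_fuse` is a name chosen for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists by reciprocal rank; return (doc_id, score) pairs, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; a list that lacks a document simply adds nothing for it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With k = 60, a document ranked first in three lists scores 3/61 ≈ 0.049, while a document ranked first in a single list scores 1/61 ≈ 0.016, so agreement across lists outweighs one strong placement.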

Step 5: Route to the Right Modality

For a multimodal corpus, the most consequential transform is deciding which index the query should hit. A query about spoken claims belongs on the transcript index; a query about visual style belongs on the image-embedding index; a query about a price on screen belongs on the OCR index. Sending every query to every index is wasteful and noisy; sending it to the wrong one returns confident garbage.

Modality routing can be a classifier ("this query is about audio content") or, for compound queries, a decomposition that sends each sub-query to its natural modality and then fuses with RRF:

compound query: "the demo where the bottle is shown and the
                 narrator says free returns"

sub-query A -> visual index   : "bottle shown in demo"
sub-query B -> transcript index: "narrator says free returns"

fuse the two ranked lists with RRF; the clip that ranks
well on BOTH rises to the top.

This is the agent-perception version of query transformation: the transform does not just clean the words, it picks the sense organ. A clip that satisfies both conditions appears in both lists and wins the fusion; a clip that only shows the bottle, or only mentions returns, ranks lower because it only earns one set of votes.
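The decompose-route-fuse pattern can be sketched as follows, assuming each modality index is exposed as a search callable that returns clip ids best-first. The `route_and_fuse` name, the `(modality, text)` sub-query shape, and the index keys are hypothetical; the fusion is the same reciprocal-rank sum, written inline so the example stands alone.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def route_and_fuse(
    sub_queries: Sequence[Tuple[str, str]],
    indexes: Dict[str, Callable[[str], List[str]]],
    k: int = 60,
    top_k: int = 10,
) -> List[str]:
    """Send each (modality, text) sub-query to its index, then fuse by rank."""
    scores: Dict[str, float] = {}
    for modality, text in sub_queries:
        if modality not in indexes:
            raise ValueError(f"no index registered for modality {modality!r}")
        ranking = indexes[modality](text)
        for rank, clip_id in enumerate(ranking, start=1):
            scores[clip_id] = scores.get(clip_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [clip_id for clip_id, _ in ordered[:top_k]]
```

A clip ranked second in both lists scores 2/62, which beats a clip ranked first in only one list at 1/61, so satisfying both conditions wins even without topping either list.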

Choosing Transforms

Query type	Best transform	Why

Navigational / filter	Route to metadata filter, no vector search	Exact constraints, vector search only adds noise
Lookup / exact	Light lexical, minimal rewrite	Precision matters, expansion hurts
Semantic	HyDE or expansion	Vocabulary gap is the bottleneck
Compound / multimodal	Decompose, route per sub-query, RRF	One vector cannot satisfy several conditions
Reasoning trace	Intent extraction first, then the above	Search need is buried in context

Two rules cut across all of them. First, transform conditionally, not always: a pipeline that runs HyDE and five-way fan-out on every query burns latency and money on queries that needed a filter. Classify first, then spend. Second, a transform is config that changes what the agent sees, so version it and measure it like an index change, not a prompt tweak.
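One way to make both rules concrete is to hold the per-type decisions in a single versioned table. This is a sketch under assumptions: `TransformPlan`, `PLANS`, `PIPELINE_VERSION`, and the fan-out widths are invented for illustration and are not tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformPlan:
    vector_search: bool  # False means route to a metadata filter only
    expansion: str       # "none", "hyde", or "prf"
    fan_out: int         # number of probes to run
    decompose: bool      # split into per-condition sub-queries


# Hypothetical version label; bump it whenever any plan changes.
PIPELINE_VERSION = "v1"

PLANS = {
    "navigational": TransformPlan(vector_search=False, expansion="none", fan_out=1, decompose=False),
    "lookup": TransformPlan(vector_search=True, expansion="none", fan_out=1, decompose=False),
    "semantic": TransformPlan(vector_search=True, expansion="hyde", fan_out=3, decompose=False),
    "compound": TransformPlan(vector_search=True, expansion="none", fan_out=2, decompose=True),
}


def plan_for(query_type: str) -> TransformPlan:
    """Look up the plan for a classifier label."""
    # Unknown labels fall back to the cheapest vector plan, not the most expensive.
    return PLANS.get(query_type, PLANS["lookup"])
```

Logging `PIPELINE_VERSION` with every retrieval makes it possible to attribute a recall change to a specific transform change.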

Evaluating the Transform Itself

The trap is to evaluate only end-to-end answer quality, which hides where the win or loss came from. Isolate the query stage and measure it directly:

Recall at the candidate stage, before any reranking. Query transformation's whole job is to get the right item into the candidate pool; if recall did not move, the transform did nothing, no matter what the final answer looks like.

Per-query-type breakdown. A transform that helps semantic queries often hurts lookups. Average metrics hide this; slice by the classifier's labels.

Latency and cost added. HyDE adds a model call, fan-out multiplies retrievals. Track the p95 and the spend, because the durable failure mode of query transformation is paying for transforms that did not earn their keep.

Drift on PRF. Specifically check that expansion did not pull results off topic on hard queries; compare expanded recall against unexpanded on a held-out set.

Walk each transform from off to on, one at a time, and keep only the ones whose candidate-recall gain justifies the added latency and cost for the query types they target.
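Candidate-stage recall sliced by query type can be computed with a few lines. The sketch assumes each evaluation example is a dict with `qtype`, `candidates` (ids in ranked order, before reranking), and `relevant` (labeled ids); those field names and the function names are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def recall_at_k(candidates: Sequence[str], relevant: Sequence[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top-k candidates."""
    hits = len(set(candidates[:k]) & set(relevant))
    return hits / len(set(relevant))


def recall_by_type(examples: Iterable[dict], k: int = 200) -> Dict[str, float]:
    """Mean candidate-stage recall@k, grouped by the classifier's label."""
    per_type: Dict[str, List[float]] = defaultdict(list)
    for ex in examples:
        if not ex["relevant"]:
            continue  # queries without labels cannot be scored
        per_type[ex["qtype"]].append(recall_at_k(ex["candidates"], ex["relevant"], k))
    return {qtype: sum(vals) / len(vals) for qtype, vals in per_type.items()}
```

Run it once with a transform off and once with it on, over the same held-out queries, and compare the two dictionaries key by key rather than averaging across types.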

Doing This in Mixpeek

In Mixpeek, query transformation lives in front of the retriever stages. You classify the query to route it, optionally rewrite or fan it out, and let the retriever fuse the results. That includes a hybrid dense-plus-lexical (BM25) first stage whose lists are merged with RRF, the same fusion that merges your paraphrases or your per-modality sub-queries.

from mixpeek import Mixpeek

client = Mixpeek(api_key="mxp_sk_...")

# A retriever whose first stage already fuses dense + lexical with RRF.
retriever = client.retrievers.create(
    namespace_id="ns_video",
    collection_ids=["col_transcripts", "col_keyframes"],
    retriever_name="agent-search-fused",
    stages=[
        # Hybrid first stage: dense + BM25, merged by reciprocal rank fusion.
        {
            "stage_name": "hybrid_search",
            "parameters": {"fusion": "rrf", "rrf_k": 60, "top_k": 200},
        },
        {"stage_name": "rerank", "parameters": {"top_k": 20}},
    ],
)

# Application-side query transformation: classify, then route + fan out.
# classify, to_filter, and expand_or_decompose are helpers you supply yourself.
def transform_and_search(raw_query: str):
    qtype = classify(raw_query)                 # your small classifier
    if qtype == "navigational":
        # Skip vector search; push constraints into a filter instead.
        return client.retrievers.execute(
            retriever_id=retriever.retriever_id,
            inputs={"filters": to_filter(raw_query)},
            top_k=20,
        )
    # Semantic / compound: fan out into paraphrases or sub-queries,
    # run each, and let the retriever's RRF do the merge.
    queries = expand_or_decompose(raw_query)    # HyDE, paraphrase, or split
    return client.retrievers.execute(
        retriever_id=retriever.retriever_id,
        inputs={"text": queries},               # multiple probes, one fused result
        top_k=20,
    )

Treat the classifier, the expansion strategy, and the fan-out width as versioned config, because each one changes the candidate pool an agent reasons over. Measure candidate-stage recall per query type before and after a change. On the Models page, pick the embedding model for the dense half of the hybrid stage so that it matches the modality each sub-query is routed to.
