Automatic optimizations
When you execute a retriever, the planner rewrites your stage list before running it. These transformations are automatic; you don’t configure them.

| Optimization | What it does | Why it helps |
|---|---|---|
| Filter push-down | Moves attribute filters ahead of vector search | Shrinks the candidate set before the expensive embedding search runs |
| Stage fusion | Merges adjacent compatible stages into one | Fewer passes over the result set |
| Grouping optimization | Rewrites group/reduce stages to run database-side | Avoids materializing intermediate results |
| Computation push-down | Runs data-plane stages (feature_search, attribute_filter, sort_attribute, aggregate) inside MVS | Eliminates a network round-trip and lets the vector store filter/sort where the data lives |
| Parallel sub-queries | Runs independent operations (search + count, search + facet) concurrently | Lower wall-clock latency |
| Over-fetch hints | Fetches extra candidates when a later stage will filter them out | Preserves recall after post-filtering |
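
The push-down and over-fetch rows are easiest to see on a concrete stage list. The sketch below is a toy illustration under stated assumptions, not the planner's actual algorithm: the stage dicts, the pushable and selectivity keys, and plan_stages are all hypothetical. It moves pushable attribute_filter stages ahead of the first feature_search and, for any filter that must stay behind the search, raises the search's top_k so that roughly the requested number of results survive post-filtering.

```python
import math
from typing import Any

Stage = dict[str, Any]  # hypothetical stage shape, e.g. {"type": "feature_search", "top_k": 50}


def plan_stages(stages: list[Stage], max_overfetch: float = 10.0) -> list[Stage]:
    """Toy planner: push filters ahead of the first search, then over-fetch for filters left behind."""
    stages = [dict(s) for s in stages]  # shallow copies, so the caller's stages are not mutated
    search_idx = next((i for i, s in enumerate(stages) if s["type"] == "feature_search"), None)
    if search_idx is None:
        return stages  # no vector search, nothing to push past

    head, tail = stages[:search_idx], stages[search_idx:]

    def is_pushable(s: Stage) -> bool:
        # Hypothetical flag: filters default to pushable unless marked otherwise.
        return s["type"] == "attribute_filter" and s.get("pushable", True)

    # Filter push-down: pushable filters run before the search and shrink its candidate set.
    pushed = [s for s in tail if is_pushable(s)]
    remaining = [s for s in tail if not is_pushable(s)]  # remaining[0] is the feature_search

    # Over-fetch hint: filters that stay after the search discard candidates,
    # so inflate top_k by 1 / (combined selectivity), capped at max_overfetch.
    keep_fraction = 1.0
    for s in remaining[1:]:
        if s["type"] == "attribute_filter":
            keep_fraction *= s.get("selectivity", 1.0)
    factor = min(1.0 / keep_fraction, max_overfetch) if keep_fraction > 0 else max_overfetch
    search = remaining[0]
    search["top_k"] = math.ceil(search["top_k"] * factor)

    return head + pushed + remaining


stages = [
    {"type": "feature_search", "top_k": 50},
    {"type": "attribute_filter", "field": "lang", "pushable": True},
    {"type": "attribute_filter", "field": "score_bucket", "pushable": False, "selectivity": 0.25},
    {"type": "sort_attribute", "field": "date"},
]
optimized = plan_stages(stages)
print([s["type"] for s in optimized])
# ['attribute_filter', 'feature_search', 'attribute_filter', 'sort_attribute']
print(optimized[1]["top_k"])  # 200: keeping 25% of 200 leaves about 50 results
```

Capping the factor (max_overfetch here) keeps a very selective post-filter from turning one search into an unbounded scan; the real planner may weigh this trade-off differently.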

For multiple queries, POST /v1/retrievers/{retriever_id}/execute/batch amortizes planning across all queries in the batch.
Inspect the plan with explain
POST /v1/retrievers/{retriever_id}/explain returns the optimized execution plan without running the query — per-stage cost and latency estimates, bottlenecks, and exactly which optimizations were applied. Pass hypothetical inputs to see how the plan changes with different parameters.
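A minimal sketch of calling explain from Python with only the standard library, preparing two hypothetical inputs so you can compare their plans. The base URL, the Bearer auth scheme, the {"inputs": ...} request body, and the query/top_k input names are assumptions for illustration; check the API reference for the real request schema. The requests are built here but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: substitute your deployment's API host


def build_explain_request(retriever_id: str, inputs: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a POST /v1/retrievers/{retriever_id}/explain request.

    The {"inputs": ...} body and the Bearer auth header are assumptions.
    """
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/retrievers/{retriever_id}/explain",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


# Two hypothetical inputs: same query, different top_k, to see how the plan changes.
wide = build_explain_request("ret_123", {"query": "red running shoes", "top_k": 500}, "YOUR_API_KEY")
narrow = build_explain_request("ret_123", {"query": "red running shoes", "top_k": 20}, "YOUR_API_KEY")

# To actually send one (requires network access and a real key):
# with urllib.request.urlopen(wide) as resp:
#     plan = json.load(resp)
print(wide.get_method(), wide.full_url)
```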
How to read it

| Field | What it tells you |
|---|---|
| execution_plan[].estimated_input/output | How each stage narrows the candidate set; a stage that barely reduces it may be unnecessary |
| estimated_efficiency | Stage selectivity; a value close to 1.0 means the stage passes almost everything through |
| estimated_cost_credits / estimated_duration_ms | Estimated cost and latency, so you can budget before running and find the expensive stage |
| bottleneck_stages | The stages dominating latency; optimize these first |
| cache_likely | Whether a stage is likely to hit the cache |
| optimization_details.decisions | Which automatic rewrites fired, and why |
| optimization_suggestions | Concrete, actionable tuning hints |
| warnings | Per-stage red flags (e.g. a high-cost stage or an overly broad top_k) |
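
As an illustration of reading these fields programmatically, the sketch below flags low-selectivity stages, lists bottlenecks, and counts how many stages the planner collapsed. It assumes a response shape in which each execution_plan entry carries a stage_name and an estimated_efficiency; stage_name, summarize_plan, and the exact nesting are hypothetical, and the sample response is made up rather than real API output.

```python
from typing import Any


def summarize_plan(plan: dict[str, Any], low_selectivity: float = 0.9) -> dict[str, Any]:
    """Pull the review-worthy signals out of an explain response.

    Assumed (hypothetical) shape: each execution_plan entry has "stage_name" and
    "estimated_efficiency"; bottleneck_stages is a list of stage names.
    """
    stages = plan.get("execution_plan", [])
    details = plan.get("optimization_details", {})
    return {
        # Efficiency near 1.0 means the stage passes almost everything through.
        "low_selectivity_stages": [
            s["stage_name"] for s in stages if s.get("estimated_efficiency", 0.0) >= low_selectivity
        ],
        "bottlenecks": list(plan.get("bottleneck_stages", [])),
        # How many stages the planner collapsed relative to your original list.
        "stages_collapsed": details.get("original_stage_count", 0) - details.get("optimized_stage_count", 0),
    }


# Made-up explain response for illustration only.
sample_plan = {
    "execution_plan": [
        {"stage_name": "filter_lang", "estimated_input": 100_000, "estimated_output": 20_000, "estimated_efficiency": 0.2},
        {"stage_name": "search", "estimated_input": 20_000, "estimated_output": 100, "estimated_efficiency": 0.005},
        {"stage_name": "filter_in_stock", "estimated_input": 100, "estimated_output": 98, "estimated_efficiency": 0.98},
    ],
    "bottleneck_stages": ["search"],
    "optimization_details": {"original_stage_count": 4, "optimized_stage_count": 3},
}
print(summarize_plan(sample_plan))
# {'low_selectivity_stages': ['filter_in_stock'], 'bottlenecks': ['search'], 'stages_collapsed': 1}
```

Here filter_in_stock keeps 98 of 100 candidates, so it is a candidate for removal, while the search stage is where tuning top_k would pay off first.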

The execution_plan reflects the optimized pipeline, not your original stage list. Compare optimization_details.original_stage_count with optimized_stage_count to see how much the planner collapsed.

Execution-plan variant
POST /v1/retrievers/{retriever_id}/execute/explain returns the same plan in a MongoDB-explain-style shape if you prefer that format. Both are read-only and never execute the query.
Typical workflow
1. Explain before you ship. Run explain with representative inputs to see estimated cost, bottlenecks, and applied optimizations.
2. Act on bottlenecks and suggestions. Reduce top_k on high-cost searches, add a selective attribute_filter (the optimizer pushes it down), or drop low-selectivity stages.
3. Verify in production. Use retriever analytics to confirm that real latency and cache-hit rates match the estimates.
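
One way to make the first step a habit is a release gate that fails when the estimate exceeds a budget. This is a sketch under assumptions: budget_problems is a hypothetical helper, it assumes top-level estimated_cost_credits and estimated_duration_ms totals and a warnings list of printable items, and the two sample responses are invented.

```python
from typing import Any


def budget_problems(plan: dict[str, Any], max_credits: float, max_duration_ms: float) -> list[str]:
    """Return reasons an explain response should block a release (empty list means OK).

    Assumes top-level estimated_cost_credits / estimated_duration_ms totals and a
    warnings list; these locations are hypothetical, so adapt them to the real response.
    """
    problems = []
    credits = plan.get("estimated_cost_credits", 0.0)
    duration = plan.get("estimated_duration_ms", 0.0)
    if credits > max_credits:
        problems.append(f"estimated cost {credits} credits exceeds budget of {max_credits}")
    if duration > max_duration_ms:
        problems.append(f"estimated duration {duration} ms exceeds budget of {max_duration_ms} ms")
    # Surface every planner warning so a reviewer sees it before shipping.
    problems.extend(f"warning: {w}" for w in plan.get("warnings", []))
    return problems


# Made-up explain responses, not real API output.
over_budget = {"estimated_cost_credits": 12.5, "estimated_duration_ms": 180,
               "warnings": ["overly broad top_k on feature_search"]}
within_budget = {"estimated_cost_credits": 2.0, "estimated_duration_ms": 45}

for name, plan in [("over_budget", over_budget), ("within_budget", within_budget)]:
    print(name, budget_problems(plan, max_credits=10, max_duration_ms=250))
```

In a CI job you might call explain with representative inputs and assert that budget_problems returns an empty list, then rely on retriever analytics to check the estimate against production.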
Related
- Multi-Stage Retrieval — how stages compose
- Feature Search — the most common bottleneck stage
- Evaluations — measure quality alongside cost
- Best Practices — caching and cost optimization

