Mixpeek optimizes every retriever before it runs — reordering, fusing, and pushing work down into the vector store — then lets you inspect exactly what it did with the explain endpoint. You write the pipeline you find readable; the optimizer makes it fast.

Automatic optimizations

When you execute a retriever, the planner rewrites your stage list before execution. These transformations are automatic — you don’t configure them:
| Optimization | What it does | Why it helps |
| --- | --- | --- |
| Filter push-down | Moves attribute filters ahead of vector search | Shrinks the candidate set before the expensive embedding search runs |
| Stage fusion | Merges adjacent compatible stages into one | Fewer passes over the result set |
| Grouping optimization | Rewrites group/reduce stages to run database-side | Avoids materializing intermediate results |
| Computation push-down | Runs data-plane stages (feature_search, attribute_filter, sort_attribute, aggregate) inside MVS | Eliminates a network round-trip and lets the vector store filter/sort where the data lives |
| Parallel sub-queries | Runs independent operations (search + count, search + facet) concurrently | Lower wall-clock latency |
| Over-fetch hints | Fetches extra candidates when a later stage will filter them out | Preserves recall after post-filtering |
Because the optimizer pushes filters down for you, write filters wherever they read most clearly — you don’t need to hand-order stages for performance. Use explain to confirm what was pushed.
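To make the idea of filter push-down concrete, here is a toy sketch of the rewrite. It is not Mixpeek's planner: the function name push_down_filters and the dictionary shape of a stage are assumptions made for illustration, and a real optimizer must also verify that moving a filter preserves the pipeline's semantics.

```python
from typing import Dict, List

Stage = Dict[str, str]


def push_down_filters(stages: List[Stage]) -> List[Stage]:
    """Toy rewrite: hoist attribute_filter stages ahead of the first feature_search.

    Hypothetical illustration only. Stages before the first search keep their
    order; filters found at or after it are moved directly in front of it.
    """
    search_positions = [
        i for i, s in enumerate(stages) if s["stage_name"] == "feature_search"
    ]
    if not search_positions:
        return list(stages)  # nothing to push filters ahead of

    first_search = search_positions[0]
    head = stages[:first_search]
    tail = stages[first_search:]
    pushed = [s for s in tail if s["stage_name"] == "attribute_filter"]
    rest = [s for s in tail if s["stage_name"] != "attribute_filter"]
    return head + pushed + rest


# A pipeline written in "readable" order: search, sort, then filter.
written = [
    {"stage_name": "feature_search"},
    {"stage_name": "sort_attribute"},
    {"stage_name": "attribute_filter"},
]
print([s["stage_name"] for s in push_down_filters(written)])
# prints ['attribute_filter', 'feature_search', 'sort_attribute']
```

The input list is left unmodified, so the original and rewritten stage orders can be compared side by side, much as explain reports both the original and optimized stage counts.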
The retriever is also fetched and optimized once per request, then reused across a batch — so POST /v1/retrievers/{id}/execute/batch amortizes planning across all queries.

Inspect the plan with explain

POST /v1/retrievers/{retriever_id}/explain returns the optimized execution plan without running the query — per-stage cost and latency estimates, bottlenecks, and exactly which optimizations were applied. Pass hypothetical inputs to see how the plan changes with different parameters.
curl -sS -X POST "$MP_API_URL/v1/retrievers/{retriever_id}/explain" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{ "inputs": { "query": "people discussing electric vehicles" } }'
Example response
{
  "retriever_id": "ret_abc123",
  "execution_plan": [
    {
      "stage_index": 0,
      "stage_name": "attribute_filter",
      "stage_type": "filter",
      "estimated_input": 10000,
      "estimated_output": 5000,
      "estimated_efficiency": 0.5,
      "estimated_cost_credits": 0.01,
      "estimated_duration_ms": 20,
      "cache_likely": true,
      "optimization_notes": ["Pushed down from stage 2"],
      "warnings": []
    },
    {
      "stage_index": 1,
      "stage_name": "feature_search",
      "stage_type": "filter",
      "estimated_input": 5000,
      "estimated_output": 100,
      "estimated_efficiency": 0.02,
      "estimated_cost_credits": 0.5,
      "estimated_duration_ms": 200,
      "cache_likely": false,
      "optimization_notes": [],
      "warnings": ["High cost stage - consider reducing top_k"]
    }
  ],
  "estimated_cost": { "total_credits": 0.51, "total_duration_ms": 220 },
  "bottleneck_stages": ["feature_search"],
  "optimization_applied": true,
  "optimization_details": {
    "original_stage_count": 3,
    "optimized_stage_count": 2,
    "stage_reduction_pct": 33.3,
    "decisions": [
      {
        "rule_type": "push_down_filters",
        "applied": true,
        "reason": "Moved attribute_filter before feature_search to reduce search scope"
      }
    ]
  },
  "optimization_suggestions": [
    { "type": "reduce_limit", "stage": "feature_search", "message": "Consider reducing top_k to improve latency" }
  ]
}
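
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the same MP_API_URL, MP_API_KEY, and MP_NAMESPACE environment variables as the curl example; the helper names build_explain_request and explain are hypothetical and are not part of any Mixpeek SDK.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def build_explain_request(
    retriever_id: str, inputs: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the explain endpoint."""
    base_url = os.environ["MP_API_URL"].rstrip("/")
    return urllib.request.Request(
        url=f"{base_url}/v1/retrievers/{retriever_id}/explain",
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['MP_API_KEY']}",
            "X-Namespace": os.environ["MP_NAMESPACE"],
            "Content-Type": "application/json",
        },
        method="POST",
    )


def explain(
    retriever_id: str, inputs: Dict[str, Any], timeout: float = 30.0
) -> Dict[str, Any]:
    """Send the explain request and return the decoded JSON plan."""
    request = build_explain_request(retriever_id, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# plan = explain("ret_abc123", {"query": "people discussing electric vehicles"})
```

Separating request construction from sending keeps the URL, headers, and body easy to inspect or unit-test without touching the network.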

How to read it

| Field | Use it to… |
| --- | --- |
| execution_plan[].estimated_input/output | See how each stage narrows the set; a stage that barely reduces the set may be unnecessary |
| estimated_efficiency | Spot low-selectivity stages (close to 1.0 = passes almost everything through) |
| estimated_cost_credits / estimated_duration_ms | Budget before running; find the expensive stage |
| bottleneck_stages | The stages dominating latency; optimize these first |
| cache_likely | Whether a stage will likely hit the cache |
| optimization_details.decisions | Exactly which automatic rewrites fired (and why) |
| optimization_suggestions | Concrete, actionable tuning hints |
| warnings | Per-stage red flags (e.g. high-cost stage, overly broad top_k) |
The execution_plan reflects the optimized pipeline, not your original stage list. Compare optimization_details.original_stage_count vs optimized_stage_count to see how much the planner collapsed.
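
The fields above can be condensed into a short per-stage report. This sketch assumes a response shaped like the example response; the function name summarize_plan and the 0.9 pass-through threshold are illustrative choices, not values defined by the API.

```python
from typing import Any, Dict, List


def summarize_plan(plan: Dict[str, Any], pass_through: float = 0.9) -> List[str]:
    """Return one line per stage, plus one line per applied optimization.

    pass_through is an assumed cut-off: a stage whose output/input ratio is
    at or above it is flagged as low selectivity.
    """
    lines: List[str] = []
    bottlenecks = set(plan.get("bottleneck_stages", []))

    for stage in plan.get("execution_plan", []):
        n_in = stage["estimated_input"]
        n_out = stage["estimated_output"]
        ratio = n_out / n_in if n_in else 0.0  # fraction of the set passed on

        flags: List[str] = []
        if ratio >= pass_through:
            flags.append("low selectivity")
        if stage["stage_name"] in bottlenecks:
            flags.append("bottleneck")
        flags.extend(stage.get("warnings", []))

        line = f"{stage['stage_index']}: {stage['stage_name']} {n_in} -> {n_out} ({ratio:.0%})"
        if flags:
            line += " [" + "; ".join(flags) + "]"
        lines.append(line)

    decisions = plan.get("optimization_details", {}).get("decisions", [])
    for decision in decisions:
        if decision.get("applied"):
            lines.append(f"applied {decision['rule_type']}: {decision['reason']}")
    return lines
```

Calling print("\n".join(summarize_plan(plan))) on a decoded explain response gives a quick view of where the candidate set shrinks and which rewrites fired.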

Execution-plan variant

POST /v1/retrievers/{retriever_id}/execute/explain returns the same plan in a MongoDB-explain-style shape, if you prefer that format. Both endpoints are read-only and never execute the query.

Typical workflow

1. Explain before you ship. Run explain with representative inputs to see estimated cost, bottlenecks, and applied optimizations.
2. Act on bottlenecks and suggestions. Reduce top_k on high-cost searches, add a selective attribute_filter (the optimizer pushes it down), or drop low-selectivity stages.
3. Verify in production. Use retriever analytics to confirm real latency and cache-hit rates match the estimate.
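
The first step of this workflow can be automated as a pre-deployment check. The sketch below assumes a decoded explain response containing the estimated_cost object shown in the example response; the function name budget_violations and the budget values are hypothetical, so choose limits that fit your own workload.

```python
from typing import Any, Dict, List


def budget_violations(
    plan: Dict[str, Any], max_credits: float, max_duration_ms: float
) -> List[str]:
    """Compare a plan's estimated totals against caller-chosen budgets."""
    cost = plan["estimated_cost"]
    bottlenecks = ", ".join(plan.get("bottleneck_stages", [])) or "none reported"
    problems: List[str] = []

    if cost["total_credits"] > max_credits:
        problems.append(
            f"estimated credits {cost['total_credits']} exceed budget {max_credits}"
        )
    if cost["total_duration_ms"] > max_duration_ms:
        problems.append(
            f"estimated duration {cost['total_duration_ms']} ms exceeds budget "
            f"{max_duration_ms} ms (bottlenecks: {bottlenecks})"
        )
    return problems


# Example: fail a CI step when the plan is over budget.
# problems = budget_violations(plan, max_credits=1.0, max_duration_ms=500)
# if problems:
#     raise SystemExit("\n".join(problems))
```

Because these numbers are estimates, a check like this complements the production verification step rather than replacing it.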