text: 0.7, image: 0.3, the system learns from clicks, purchases, and feedback to discover the optimal blend for every user — using Thompson Sampling (a multi-armed bandit algorithm) with hierarchical fallback from personal to demographic to global priors.
Auto-Tune closes the gap between “search works” and “search works for this user” without building a separate recommendation system.
How It Works
User searches
A query arrives with a user_id. The system looks up that user’s learned fusion weights — or falls back to segment-level or global weights if the user is new.
Results ranked by fusion weights
The feature search stage runs each embedding search and fuses results using the personalized weights. Early on, weights are exploratory (high variance). As data accumulates, they stabilize around what works for this user.
User interacts
The user clicks, purchases, skips, or provides feedback. Each interaction is recorded with the document ID, position, and the feature URI that produced the match.
Weights updated
Positive interactions (clicks, purchases) increase the weight of the feature that surfaced the result. Negative signals (skips, negative feedback) decrease it. Different interaction types carry different reward magnitudes — a purchase is a stronger signal than a click.
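The loop described above can be sketched with one Beta distribution per feature. This is a minimal illustration of Thompson Sampling under stated assumptions, not the system's actual implementation: the `FeatureArm` class, the `REWARD_MAP` values, and the `text`/`image` feature names are hypothetical, and the exploration bonus, temporal decay, and weight bounds are left out for brevity.

```python
import random


class FeatureArm:
    """Beta posterior over how well one feature's matches are received."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha  # accumulated positive reward (plus uniform prior)
        self.beta = beta    # accumulated negative reward (plus uniform prior)

    def sample(self, rng):
        # Draw a plausible "success rate" for this feature.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Assumption: positive reward magnitudes add to alpha,
        # negative magnitudes add to beta.
        if reward >= 0:
            self.alpha += reward
        else:
            self.beta += -reward


# Hypothetical reward magnitudes: a purchase outweighs a click.
REWARD_MAP = {"click": 1.0, "purchase": 5.0, "skip": -0.5}


def sample_fusion_weights(arms, rng):
    """Sample each arm once and normalize the draws into fusion weights."""
    draws = {uri: arm.sample(rng) for uri, arm in arms.items()}
    total = sum(draws.values())
    if total == 0:
        # Degenerate draw: fall back to equal weights.
        return {uri: 1.0 / len(arms) for uri in arms}
    return {uri: d / total for uri, d in draws.items()}


def record_interaction(arms, feature_uri, interaction_type):
    """Credit or penalize the feature that surfaced the result."""
    arms[feature_uri].update(REWARD_MAP[interaction_type])
```

With fresh arms every feature starts at Beta(1, 1), so sampled weights vary widely from query to query. As rewards accumulate, each distribution narrows and the sampled weights settle, which matches the shift from exploratory to stable weights described above.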
Quick Start
1. Create a retriever with learned fusion
2. Execute with a user ID
The response includes execution_id and learned_fusion_context metadata for interaction tracking.
3. Post interactions — results automatically improve
The next search for user_456 returns results personalized to that user's feature preferences. If this user consistently engages with text-matched results over image-matched ones, the text feature weight increases for their queries.
Key Concepts
Reward Signals
Configure how different interaction types (clicks, purchases, feedback) influence learned fusion weights. Customize the reward map, handle negative signals, and tune temporal decay.
Rollout & Safety
Safely deploy learned fusion with traffic splitting, shadow mode, kill switches, per-user opt-out, and weight bounds. Includes a recommended rollout plan.
Learned Fusion Deep Dive
How Thompson Sampling works under the hood — Beta distributions, exploration vs. exploitation, and the math behind weight convergence.
Evaluations
Measure whether learned fusion actually improves retrieval quality. Compare NDCG, precision, and recall against static fusion baselines.
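As a rough illustration of the comparison mentioned under Evaluations, NDCG for a single ranked result list can be computed as below. This is a generic sketch of the standard metric, not the evaluation tooling itself; the function names are hypothetical, and the ideal ranking is derived only from the relevance labels of the returned results.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideally ordered list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Running `ndcg_at_k` on the same queries under static fusion and under learned fusion, then averaging per arm, gives one number per treatment to compare.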
Configuration Reference
The learning_config object is set inside the feature search stage parameters alongside fusion: "learned":
| Field | Type | Default | Description |
|---|---|---|---|
| context_features | string[] | ["INPUT.user_id"] | Input fields used for personal-level weight learning. Each value references an INPUT.* field from the retriever’s input_schema. |
| demographic_features | string[] | [] | Input fields for segment-level fallback (e.g., "INPUT.user_segment"). Used when a user has insufficient personal history. |
| reward_signal | string | "click" | Deprecated. Single interaction type used as the learning signal. Use reward_map instead. |
| reward_map | object | See Reward Signals | Maps interaction types to reward magnitudes. Positive values reinforce; negative values penalize. |
| min_interactions | integer | 5 | Minimum interactions before using personal-level weights. Below this threshold, the system falls back to demographic or global weights. |
| exploration_bonus | float | 1.0 | Initial multiplier for distribution variance. Higher values increase exploration (more weight variability). |
| exploration_decay | float | 0.99 | Per-interaction decay applied to the exploration bonus. Gradually shifts from exploration to exploitation. |
| exploration_floor | float | 0.1 | Minimum exploration bonus. Prevents the system from fully exploiting — there is always some chance of trying alternative weights. |
| decay_factor | float | 0.995 | Per-day exponential decay applied to older interactions. 1.0 disables decay (interactions never fade). |
| decay_window_days | integer | 365 | Interactions older than this are ignored entirely. |
| min_weight | float | 0.05 | Minimum weight any feature can receive after sampling. Prevents a feature from being silenced. |
| max_weight | float | 0.95 | Maximum weight any feature can receive after sampling. Prevents one feature from completely dominating. |
| rollout_pct | float | 100.0 | Percentage of requests (0-100) that use learned weights. The rest use static fusion. Uses deterministic bucketing so a user does not flip-flop between treatments. |
| shadow_mode | boolean | false | When true, learned weights are computed and logged but static fusion results are served. Use this to evaluate learned fusion before going live. |
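To make the numeric fields in the table above more concrete, the sketch below shows one plausible reading of how they could combine. The default values are copied from the table; the helper functions, the SHA-256 bucketing scheme, and the choice not to renormalize after clamping are assumptions for illustration only, not the documented implementation.

```python
import hashlib

# Defaults copied from the configuration table.
DEFAULTS = {
    "min_interactions": 5,
    "exploration_bonus": 1.0,
    "exploration_decay": 0.99,
    "exploration_floor": 0.1,
    "decay_factor": 0.995,
    "decay_window_days": 365,
    "min_weight": 0.05,
    "max_weight": 0.95,
    "rollout_pct": 100.0,
}


def exploration_bonus(n_interactions, cfg=DEFAULTS):
    """Bonus after n interactions: multiplicative decay, never below the floor."""
    decayed = cfg["exploration_bonus"] * cfg["exploration_decay"] ** n_interactions
    return max(cfg["exploration_floor"], decayed)


def interaction_weight(age_days, cfg=DEFAULTS):
    """Per-day exponential decay; interactions outside the window count as zero."""
    if age_days > cfg["decay_window_days"]:
        return 0.0
    return cfg["decay_factor"] ** age_days


def clamp_weights(weights, cfg=DEFAULTS):
    """Bound each sampled weight to [min_weight, max_weight] (no renormalization here)."""
    return {k: min(cfg["max_weight"], max(cfg["min_weight"], w))
            for k, w in weights.items()}


def in_rollout(user_id, cfg=DEFAULTS):
    """Deterministic bucketing: the same user always lands in the same bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100.0  # range [0, 100)
    return bucket < cfg["rollout_pct"]
```

Under this reading and the default values, the exploration bonus reaches its floor of 0.1 after roughly 230 interactions, and an interaction that is 100 days old counts for about 0.61 of a fresh one.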
Hierarchical Fallback
Not every user has enough interaction history for personalized weights. The system uses a four-level fallback:
| Level | Context | Min Interactions | When Used |
|---|---|---|---|
| Personal | Individual user | Configurable (default 5) | User has enough interactions for reliable personal weights |
| Demographic | User segment | 1 | User is new, but their segment (e.g., “enterprise”, “consumer”) has data |
| Global | All users | 1 | No segment data available; uses aggregate behavior across all users |
| Prior | Uniform | 0 | No interactions at all; falls back to equal weights (equivalent to RRF) |
The context_features field controls personal-level resolution (typically ["INPUT.user_id"]). The demographic_features field enables segment-level fallback (e.g., ["INPUT.plan_tier"]).
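The fallback order in the table can be read as a simple cascade. The following is a minimal sketch of that decision, assuming interaction counts are already available for each level; `resolve_level` and `uniform_prior` are hypothetical names, not part of the API.

```python
def resolve_level(personal_n, segment_n, global_n, min_interactions=5):
    """Pick the most specific level that has enough interaction data."""
    if personal_n >= min_interactions:
        return "personal"
    if segment_n >= 1:
        return "demographic"
    if global_n >= 1:
        return "global"
    return "prior"


def uniform_prior(feature_uris):
    """Equal weights across features, used when no interactions exist at all."""
    return {uri: 1.0 / len(feature_uris) for uri in feature_uris}
```

For example, a user with 3 recorded interactions in a segment that already has data resolves to the demographic level under the default threshold of 5, and moves to the personal level once 5 interactions are recorded.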
Related
- Interactions — capturing the user behavior that powers learning
- Feature Search stage — where fusion and learning_config are configured
- Fusion Strategies — comparison of all 5 fusion strategies

