When a user interacts with a search result, that interaction carries a reward value that adjusts the learned fusion weights. A purchase is a stronger signal than a click; negative feedback is a penalty. The reward map controls these magnitudes.
Default Reward Map
If you do not provide a custom reward_map in learning_config, the system uses these defaults:
| Interaction Type | Default Reward | Signal Strength | Description |
|---|---|---|---|
| click | 1.0 | Moderate positive | User clicked a result |
| long_view | 1.0 | Moderate positive | Sustained engagement (dwell time > 30s) |
| purchase | 3.0 | Strong positive | Conversion event |
| add_to_cart | 2.0 | Positive | Intent to purchase |
| bookmark | 1.5 | Positive | User saved for later |
| share | 1.5 | Positive | User shared the result |
| positive_feedback | 2.0 | Strong positive | Explicit thumbs up |
| negative_feedback | -2.0 | Strong negative | Explicit thumbs down |
| skip | -0.5 | Weak negative | Result was shown but ignored |
| return_to_results | -0.5 | Weak negative | User bounced back quickly |
Interaction types not listed in the reward map contribute a reward of 0.0 — they are recorded but do not influence fusion weights.
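The lookup rule above can be sketched in a few lines of Python. This is an illustration only, not the system's implementation: the names DEFAULT_REWARD_MAP and reward_for are hypothetical, and the values are copied from the defaults table.

```python
from typing import Mapping

# Values copied from the default reward table; the constant name is hypothetical.
DEFAULT_REWARD_MAP = {
    "click": 1.0,
    "long_view": 1.0,
    "purchase": 3.0,
    "add_to_cart": 2.0,
    "bookmark": 1.5,
    "share": 1.5,
    "positive_feedback": 2.0,
    "negative_feedback": -2.0,
    "skip": -0.5,
    "return_to_results": -0.5,
}


def reward_for(interaction_type: str,
               reward_map: Mapping[str, float] = DEFAULT_REWARD_MAP) -> float:
    """Return the mapped reward, or 0.0 for types the map does not list."""
    return reward_map.get(interaction_type, 0.0)


purchase_reward = reward_for("purchase")  # listed type
hover_reward = reward_for("hover")        # unlisted type, contributes nothing
```

An unlisted type such as the made-up "hover" resolves to 0.0, which matches the "recorded but does not influence fusion weights" behavior described above.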
Custom Reward Maps
Override the defaults by setting reward_map in learning_config:
{
  "fusion": "learned",
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "click": 1.0,
      "purchase": 5.0,
      "add_to_cart": 2.5,
      "positive_feedback": 3.0,
      "negative_feedback": -3.0,
      "skip": -1.0
    }
  }
}
When you provide a custom reward_map, it replaces the defaults entirely. Only interaction types present in your map will influence fusion weights. Include every type you want to count.
The reward value is computed at interaction-write time and stored as reward_value in the interaction metadata. This means changing the reward_map only affects future interactions — previously recorded interactions retain their original reward values.
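A minimal Python sketch of these two rules (replace rather than merge, and reward stamped at write time) might look like the following. The function names and the record layout other than reward_value and position are assumptions made for illustration, and DEFAULTS is an abbreviated subset of the default map.

```python
# Abbreviated subset of the defaults, for illustration only.
DEFAULTS = {"click": 1.0, "purchase": 3.0, "skip": -0.5}


def effective_reward_map(learning_config: dict) -> dict:
    """A custom reward_map replaces the defaults entirely; nothing is merged."""
    custom = learning_config.get("reward_map")
    return dict(DEFAULTS) if custom is None else dict(custom)


def build_interaction(interaction_type: str, position: int,
                      learning_config: dict) -> dict:
    """Stamp reward_value when the interaction is written (hypothetical record shape)."""
    reward_map = effective_reward_map(learning_config)
    return {
        "interaction_type": interaction_type,
        "position": position,
        "metadata": {"reward_value": reward_map.get(interaction_type, 0.0)},
    }


# Written under the default map.
old = build_interaction("purchase", 2, {})

# Written after switching to a custom map that lists only "purchase".
custom_config = {"reward_map": {"purchase": 5.0}}
new = build_interaction("purchase", 2, custom_config)
dropped = build_interaction("click", 0, custom_config)
```

Here `old` keeps the reward it was written with, `new` picks up the custom value, and `dropped` gets 0.0 because "click" is absent from the custom map even though it has a default.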
Negative Signals
Negative rewards (negative_feedback, skip, return_to_results) penalize the feature that surfaced the result. Mechanically, a negative reward increments the Beta distribution’s beta parameter, making it less likely that the associated feature receives high weight in future queries:
positive reward → alpha += reward → feature weight trends up
negative reward → beta += abs(reward) → feature weight trends down
Negative signals should generally have smaller absolute values than positive signals. A single negative_feedback: -5.0 carries as much weight as five click: 1.0 interactions combined, which can cause rapid weight swings. Start conservative and tune based on evaluation results.
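The update rule can be sketched as a small Python class. This is an illustration under stated assumptions: the uniform Beta(1, 1) starting point and the BetaWeight name are not from the system, and the mean alpha / (alpha + beta) stands in for however the feature weight is actually derived.

```python
from dataclasses import dataclass


@dataclass
class BetaWeight:
    # Assumed uniform prior Beta(1, 1); the real starting values may differ.
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, reward: float) -> None:
        """Positive rewards grow alpha; negative rewards grow beta by abs(reward)."""
        if reward > 0:
            self.alpha += reward
        elif reward < 0:
            self.beta += abs(reward)

    @property
    def mean(self) -> float:
        """Expected value of the Beta distribution."""
        return self.alpha / (self.alpha + self.beta)


w = BetaWeight()
for _ in range(5):
    w.update(1.0)          # five clicks at 1.0 each
after_clicks = w.mean      # alpha=6, beta=1

w.update(-5.0)             # one penalty of -5.0
after_penalty = w.mean     # alpha=6, beta=6
```

Under these assumptions, one -5.0 penalty pulls the mean from 6/7 back to 0.5, undoing the effect of five clicks in a single step. That is the kind of swing the guidance on conservative negative values is meant to avoid.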
Position Bias
Results shown at position 0 get clicked more often than results at position 10, regardless of relevance. This is position bias — a well-known problem in learning-to-rank systems.
Auto-Tune records the position field on every interaction. The bandit aggregation corrects for position bias by giving more weight to interactions at deeper positions: a click at position 8 is a stronger signal than a click at position 0, because the user scrolled past many results to find it.
Always include position when posting interactions. Without it, position bias correction cannot be applied, and top-ranked results will receive disproportionate reward regardless of actual relevance.
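The exact correction is not specified here, so the following Python sketch uses an assumed weighting purely to illustrate the idea: log2(position + 2), which gives position 0 a weight of 1.0 and grows slowly with depth. The function names and the formula are hypothetical, and the sketch covers only the positive-signal case described above.

```python
import math
from typing import Optional


def position_weight(position: int) -> float:
    """Assumed correction: log2(position + 2), so position 0 has weight 1.0."""
    if position < 0:
        raise ValueError("position must be >= 0")
    return math.log2(position + 2)


def debiased_reward(reward: float, position: Optional[int]) -> float:
    """Scale a positive reward by the position weight when position is known."""
    if position is None:
        # No position recorded: no correction can be applied.
        return reward
    return reward * position_weight(position)


top_click = debiased_reward(1.0, 0)       # click on the first result
deep_click = debiased_reward(1.0, 8)      # click after scrolling past eight results
unknown_click = debiased_reward(1.0, None)
```

With this assumed weighting, the click at position 8 counts for more than the click at position 0, while an interaction posted without a position falls back to its raw reward.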
Temporal Decay
User preferences change over time. Auto-Tune applies exponential decay to older interactions so that recent behavior matters more:
effective_reward = reward * (decay_factor ^ days_ago)
| decay_factor | After 30 days | After 90 days | After 180 days | After 365 days |
|---|---|---|---|---|
| 1.0 (no decay) | 100% | 100% | 100% | 100% |
| 0.999 | 97% | 91% | 84% | 69% |
| 0.995 | 86% | 64% | 41% | 16% |
| 0.990 | 74% | 40% | 16% | 3% |
| 0.980 | 55% | 16% | 3% | ~0% |
Configure via learning_config:
{
  "learning_config": {
    "decay_factor": 0.995,
    "decay_window_days": 365
  }
}
Interactions older than decay_window_days are ignored entirely (not just decayed to near-zero, but excluded from the aggregation query).
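The decay formula and the window cutoff can be combined in a short Python sketch. The function name is hypothetical, and returning 0.0 for out-of-window interactions is a simplification: per the text, the system excludes them from the aggregation query rather than scoring them as zero.

```python
def effective_reward(reward: float, days_ago: float,
                     decay_factor: float, decay_window_days: float) -> float:
    """Apply reward * decay_factor ** days_ago, ignoring interactions past the window."""
    if days_ago > decay_window_days:
        # Older than the window: modeled here as contributing nothing.
        return 0.0
    return reward * decay_factor ** days_ago


# A click (1.0) from 30 days ago with decay_factor 0.995 keeps roughly 86% of its value.
recent = effective_reward(1.0, 30, 0.995, 365)

# A purchase (3.0) from 400 days ago falls outside a 365-day window.
expired = effective_reward(3.0, 400, 0.995, 365)

# With decay_factor 1.0 the reward is unchanged inside the window.
undecayed = effective_reward(2.0, 10, 1.0, 365)
```

Because the cutoff is a hard boundary, an interaction one day inside the window still contributes its decayed value, while one day outside it contributes nothing.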
Examples
E-commerce: purchases matter most
{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "purchase": 5.0,
      "add_to_cart": 2.0,
      "click": 1.0,
      "bookmark": 1.5,
      "negative_feedback": -2.0,
      "return_to_results": -1.0
    },
    "decay_factor": 0.995,
    "min_interactions": 3
  }
}
A user who purchases products found via text search will see their text feature weight increase faster than that of a user who only clicks.
Content platform: engagement over clicks
{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "long_view": 2.0,
      "share": 3.0,
      "bookmark": 1.5,
      "click": 0.5,
      "skip": -1.0,
      "dwell_time_short": -1.5
    },
    "decay_factor": 0.990,
    "min_interactions": 5
  }
}
Clicks are downweighted relative to deep engagement (long views, shares). Short dwell times are penalized more heavily than skips (-1.5 versus -1.0): a click that bounces is worse than no click at all.
Internal search: explicit feedback only
{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "positive_feedback": 3.0,
      "negative_feedback": -3.0
    },
    "decay_factor": 1.0,
    "min_interactions": 10
  }
}
Only explicit thumbs up/down influence weights. Clicks and views are ignored. No temporal decay — in an internal tool, preferences tend to be stable. Higher min_interactions threshold because explicit feedback is sparse.