When a user interacts with a search result, that interaction carries a reward value that adjusts the learned fusion weights. A purchase is a stronger signal than a click; negative feedback is a penalty. The reward map controls these magnitudes.

Default Reward Map

If you do not provide a custom reward_map in learning_config, the system uses these defaults:
| Interaction Type | Default Reward | Signal Strength | Description |
| --- | --- | --- | --- |
| click | 1.0 | Moderate positive | User clicked a result |
| long_view | 1.0 | Moderate positive | Sustained engagement (dwell time > 30s) |
| purchase | 3.0 | Strong positive | Conversion event |
| add_to_cart | 2.0 | Positive | Intent to purchase |
| bookmark | 1.5 | Positive | User saved for later |
| share | 1.5 | Positive | User shared the result |
| positive_feedback | 2.0 | Strong positive | Explicit thumbs up |
| negative_feedback | -2.0 | Strong negative | Explicit thumbs down |
| skip | -0.5 | Weak negative | Result was shown but ignored |
| return_to_results | -0.5 | Weak negative | User bounced back quickly |
Interaction types not listed in the reward map contribute a reward of 0.0 — they are recorded but do not influence fusion weights.
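The lookup rule can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the names DEFAULT_REWARD_MAP and reward_for are hypothetical, and "hover" stands in for any interaction type that is not in the map. Only the reward values come from the defaults listed above.

```python
from typing import Dict, Optional

# Default rewards as documented above. The constant and function names are
# illustrative assumptions, not part of the product API.
DEFAULT_REWARD_MAP: Dict[str, float] = {
    "click": 1.0,
    "long_view": 1.0,
    "purchase": 3.0,
    "add_to_cart": 2.0,
    "bookmark": 1.5,
    "share": 1.5,
    "positive_feedback": 2.0,
    "negative_feedback": -2.0,
    "skip": -0.5,
    "return_to_results": -0.5,
}


def reward_for(interaction_type: str,
               reward_map: Optional[Dict[str, float]] = None) -> float:
    """Return the mapped reward, or 0.0 for types missing from the active map."""
    active = DEFAULT_REWARD_MAP if reward_map is None else reward_map
    return active.get(interaction_type, 0.0)


print(reward_for("purchase"))  # 3.0
print(reward_for("hover"))     # 0.0 (hypothetical unlisted type)
```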

Custom Reward Maps

Override the defaults by setting reward_map in learning_config:
{
  "fusion": "learned",
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "click": 1.0,
      "purchase": 5.0,
      "add_to_cart": 2.5,
      "positive_feedback": 3.0,
      "negative_feedback": -3.0,
      "skip": -1.0
    }
  }
}
When you provide a custom reward_map, it replaces the defaults entirely. Only interaction types present in your map will influence fusion weights. Include every type you want to count.
The reward value is computed at interaction-write time and stored as reward_value in the interaction metadata. This means changing the reward_map only affects future interactions — previously recorded interactions retain their original reward values.
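The two rules above (a custom map replaces the defaults, and the reward is frozen when the interaction is written) can be illustrated with a small sketch. The record layout and the build_interaction helper are assumptions made for illustration; the source only states that the value is stored as reward_value in the interaction metadata.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_interaction(interaction_type: str, position: int,
                      reward_map: Dict[str, float]) -> Dict[str, Any]:
    """Build a hypothetical interaction record with the reward fixed at write time."""
    return {
        "interaction_type": interaction_type,
        "position": position,
        "created_at": datetime.now(timezone.utc).isoformat(),
        # Looked up once, here. Later changes to reward_map do not touch it.
        "metadata": {"reward_value": reward_map.get(interaction_type, 0.0)},
    }


map_v1 = {"click": 1.0, "purchase": 5.0}
earlier = build_interaction("purchase", 0, map_v1)

map_v2 = {"click": 1.0, "purchase": 8.0}  # reward_map changed afterwards
later = build_interaction("purchase", 0, map_v2)

# bookmark has a default reward, but it is absent from the custom map,
# so under replacement semantics it contributes 0.0.
ignored = build_interaction("bookmark", 2, map_v2)

print(earlier["metadata"]["reward_value"])  # 5.0
print(later["metadata"]["reward_value"])    # 8.0
print(ignored["metadata"]["reward_value"])  # 0.0
```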

Negative Signals

Negative rewards (negative_feedback, skip, return_to_results) penalize the feature that surfaced the result. Mechanically, a negative reward increments the Beta distribution’s beta parameter, making it less likely that the associated feature receives high weight in future queries:
positive reward → alpha += reward       → feature weight trends up
negative reward → beta  += abs(reward)  → feature weight trends down
Negative signals should generally have smaller absolute values than positive signals. A single negative_feedback: -5.0 would cancel out five click: 1.0 interactions, which can cause rapid weight swings. Start conservative and tune based on evaluation results.
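The update rule can be sketched as a small Python class. Two things here are assumptions rather than documented behavior: the uniform Beta(1, 1) starting point, and the use of the distribution mean alpha / (alpha + beta) as a stand-in for the feature weight (a bandit may instead sample from the distribution). The class name FeatureArm is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureArm:
    """Beta-distributed belief about one feature (illustrative only)."""
    alpha: float = 1.0  # assumed prior
    beta: float = 1.0   # assumed prior

    def update(self, reward: float) -> None:
        # Positive rewards grow alpha; negative rewards grow beta by |reward|.
        if reward > 0:
            self.alpha += reward
        elif reward < 0:
            self.beta += abs(reward)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


arm = FeatureArm()
for _ in range(5):
    arm.update(1.0)       # five clicks at 1.0 each
after_clicks = arm.mean   # alpha=6, beta=1

arm.update(-5.0)          # one large negative signal
after_penalty = arm.mean  # alpha=6, beta=6

print(round(after_clicks, 3))   # 0.857
print(round(after_penalty, 3))  # 0.5
```

Under these assumptions, one -5.0 penalty pulls the expected weight from about 0.86 back to its starting value of 0.5, which is the kind of swing that smaller negative magnitudes avoid.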

Position Bias

Results shown at position 0 get clicked more often than results at position 10, regardless of relevance. This is position bias, a well-known problem in learning-to-rank systems. Auto-Tune records the position field on every interaction. The bandit aggregation corrects for the bias by giving more weight to interactions further down the list: a click at position 8 is a stronger signal than a click at position 0, because the user scrolled past many results to find it.
Always include position when posting interactions. Without it, position bias correction cannot be applied, and top-ranked results will receive disproportionate reward regardless of actual relevance.
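The exact weighting function is not specified here, so the following is only a sketch of the idea. It assumes a logarithmic examination model borrowed from the DCG discount, in which a result at position p is examined with probability proportional to 1 / log2(p + 2), and weights each reward by the inverse of that propensity. The function names are hypothetical.

```python
import math
from typing import Optional


def position_weight(position: int) -> float:
    """Assumed inverse-propensity weight: log2(position + 2)."""
    if position < 0:
        raise ValueError("position must be >= 0")
    return math.log2(position + 2)


def debiased_reward(reward: float, position: Optional[int]) -> float:
    """Scale a reward by its position weight; pass through if position is missing."""
    if position is None:
        return reward  # no correction possible without a position
    return reward * position_weight(position)


print(debiased_reward(1.0, 0))            # 1.0
print(round(debiased_reward(1.0, 8), 2))  # 3.32
```

With this assumed model, a click at position 8 counts a little over three times as much as a click at position 0, and an interaction posted without position falls back to its raw reward.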

Temporal Decay

User preferences change over time. Auto-Tune applies exponential decay to older interactions so that recent behavior matters more:
effective_reward = reward * (decay_factor ^ days_ago)
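| decay_factor | After 30 days | After 90 days | After 180 days | After 365 days |
| --- | --- | --- | --- | --- |
| 1.0 (no decay) | 100% | 100% | 100% | 100% |
| 0.999 | 97% | 91% | 84% | 69% |
| 0.995 | 86% | 64% | 41% | 16% |
| 0.990 | 74% | 40% | 16% | 3% |
| 0.980 | 55% | 16% | 3% | ~0% |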
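To check a decay_factor that is not listed, the retained fraction can be recomputed directly from the formula. The helper name retained_fraction is an illustrative choice.

```python
def retained_fraction(decay_factor: float, days_ago: int) -> float:
    """Fraction of the original reward that remains after days_ago days."""
    return decay_factor ** days_ago


horizons = (30, 90, 180, 365)
for factor in (1.0, 0.999, 0.995, 0.990, 0.980):
    # Format each fraction as a whole-number percentage.
    cells = [f"{retained_fraction(factor, d):.0%}" for d in horizons]
    print(f"{factor:<6}", *cells, sep="  ")
```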
Configure via learning_config:
{
  "learning_config": {
    "decay_factor": 0.995,
    "decay_window_days": 365
  }
}
Interactions older than decay_window_days are ignored entirely (not just decayed to near-zero, but excluded from the aggregation query).
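Decay and the window cutoff combine as follows. This is a sketch under stated assumptions: interactions are represented as hypothetical (reward, days_ago) pairs, and "older than" is read as days_ago strictly greater than decay_window_days.

```python
from typing import Iterable, List, Tuple


def effective_rewards(interactions: Iterable[Tuple[float, float]],
                      decay_factor: float = 0.995,
                      decay_window_days: int = 365) -> List[float]:
    """Apply exponential decay and drop interactions outside the window."""
    kept = []
    for reward, days_ago in interactions:
        if days_ago > decay_window_days:
            continue  # excluded entirely, not merely decayed toward zero
        kept.append(reward * decay_factor ** days_ago)
    return kept


history = [(3.0, 0), (1.0, 30), (3.0, 400)]  # (reward, days_ago)
result = effective_rewards(history)
print(len(result))  # 2: the 400-day-old interaction is dropped
print(result[0])    # 3.0: no decay on the same day
```

Under this reading, the window applies even when decay_factor is 1.0, so old interactions would still drop out of the aggregation after decay_window_days.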

Examples

E-commerce: purchases matter most

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "purchase": 5.0,
      "add_to_cart": 2.0,
      "click": 1.0,
      "bookmark": 1.5,
      "negative_feedback": -2.0,
      "return_to_results": -1.0
    },
    "decay_factor": 0.995,
    "min_interactions": 3
  }
}
A user who purchases products found via text search will see their text feature weight increase faster than a user who only clicks.

Content platform: engagement over clicks

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "long_view": 2.0,
      "share": 3.0,
      "bookmark": 1.5,
      "click": 0.5,
      "skip": -1.0,
      "dwell_time_short": -1.5
    },
    "decay_factor": 0.990,
    "min_interactions": 5
  }
}
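Clicks are downweighted relative to deep engagement (long views, shares). Short dwell times carry the heaviest penalty (-1.5, versus -1.0 for a skip): a click that bounces is worse than no click at all. dwell_time_short is not in the default reward map, so it carries weight only when a custom map lists it.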

Internal search: explicit feedback only

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "positive_feedback": 3.0,
      "negative_feedback": -3.0
    },
    "decay_factor": 1.0,
    "min_interactions": 10
  }
}
Only explicit thumbs up/down influence weights. Clicks and views are ignored. No temporal decay — in an internal tool, preferences tend to be stable. Higher min_interactions threshold because explicit feedback is sparse.