Reward Signals

When a user interacts with a search result, that interaction carries a reward value that adjusts the learned fusion weights. A purchase is a stronger signal than a click; negative feedback is a penalty. The reward map controls these magnitudes.

Default Reward Map

If you do not provide a custom reward_map in learning_config, the system uses these defaults:

Interaction Type	Default Reward	Signal Strength	Description
`impression`	`0.0`	Neutral	Result was rendered on screen (passive signal)
`view`	`0.0`	Neutral	User viewed a result (not in default reward map)
`click`	`1.0`	Moderate positive	User clicked a result
`dwell`	`0.5`	Weak positive	User lingered on a result
`long_view`	`1.0`	Moderate positive	Sustained engagement (dwell time > 30s)
`purchase`	`3.0`	Strong positive	Conversion event
`add_to_cart`	`2.0`	Positive	Intent to purchase
`wishlist`	`0.0`	Neutral	User added to wishlist (not in default reward map)
`bookmark`	`1.5`	Positive	User saved for later
`share`	`1.5`	Positive	User shared the result
`positive_feedback`	`2.0`	Strong positive	Explicit thumbs up
`negative_feedback`	`-2.0`	Strong negative	Explicit thumbs down
`query_refinement`	`0.0`	Neutral	User modified their search (not in default reward map)
`zero_results`	`0.0`	Neutral	Query yielded no results (not in default reward map)
`filter_toggle`	`0.0`	Neutral	User modified filters (not in default reward map)
`skip`	`-0.5`	Weak negative	Result was shown but ignored
`return_to_results`	`-0.5`	Weak negative	User bounced back quickly

Interaction types not listed in the reward map contribute a reward of 0.0 — they are recorded but do not influence fusion weights.

Custom Reward Maps

Override the defaults by setting reward_map in learning_config:

{
  "fusion": "learned",
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "click": 1.0,
      "purchase": 5.0,
      "add_to_cart": 2.5,
      "positive_feedback": 3.0,
      "negative_feedback": -3.0,
      "skip": -1.0
    }
  }
}

When you provide a custom reward_map, it replaces the defaults entirely. Only interaction types present in your map will influence fusion weights. Include every type you want to count.
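Because a custom map replaces the defaults rather than merging with them, one way to change only a few values is to build the complete map on the client before sending it. The sketch below assumes the default values from the table above; `DEFAULT_REWARD_MAP` and `build_reward_map` are hypothetical names for illustration, not part of the SDK.

```python
# Default rewards copied from the table above. Types the table marks as
# "not in default reward map" are left out; they resolve to 0.0 anyway.
DEFAULT_REWARD_MAP = {
    "impression": 0.0,
    "click": 1.0,
    "dwell": 0.5,
    "long_view": 1.0,
    "purchase": 3.0,
    "add_to_cart": 2.0,
    "bookmark": 1.5,
    "share": 1.5,
    "positive_feedback": 2.0,
    "negative_feedback": -2.0,
    "skip": -0.5,
    "return_to_results": -0.5,
}


def build_reward_map(overrides):
    """Return a complete reward map: the defaults plus the caller's overrides."""
    merged = dict(DEFAULT_REWARD_MAP)  # copy so the defaults stay untouched
    merged.update(overrides)
    return merged


# Hypothetical helper in use: only purchase and skip differ from the defaults.
learning_config = {
    "context_features": ["INPUT.user_id"],
    "reward_map": build_reward_map({"purchase": 5.0, "skip": -1.0}),
}
```

With this approach, types you did not override (for example `click` or `bookmark`) keep counting at their default values instead of silently dropping to 0.0.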

The reward value is computed at interaction-write time and stored as reward_value in the interaction metadata. Changing the reward_map therefore only affects future interactions; previously recorded interactions retain their original reward values. When an interaction has multiple types, the reward with the largest absolute value is used (not summed). For example, with the default map, ['click', 'purchase'] yields 3.0 (the purchase reward), not 4.0.
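The largest-absolute-value rule for multi-type interactions can be sketched as follows. This is an illustration of the rule as described, not the service's code: `resolve_reward` is a hypothetical name, and the behavior on an exact tie in magnitude (for example 2.0 versus -2.0) is an assumption, since the text does not specify it.

```python
def resolve_reward(interaction_types, reward_map):
    """Pick the reward with the largest absolute value among the given types."""
    # Types missing from the map contribute 0.0.
    rewards = [reward_map.get(t, 0.0) for t in interaction_types]
    # Assumption: on an exact tie in magnitude, the first type listed wins.
    return max(rewards, key=abs, default=0.0)


rewards = {"click": 1.0, "purchase": 3.0, "skip": -0.5}
print(resolve_reward(["click", "purchase"], rewards))  # 3.0, not 4.0
print(resolve_reward(["click", "skip"], rewards))      # 1.0
print(resolve_reward(["view"], rewards))               # 0.0
```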

Negative Signals

Negative rewards (negative_feedback, skip, return_to_results) penalize the feature that surfaced the result. Mechanically, a negative reward increments the Beta distribution’s beta parameter, making it less likely that the associated feature receives high weight in future queries:

positive reward → alpha += reward       → feature weight trends up
negative reward → beta  += abs(reward)  → feature weight trends down

Negative signals should generally have smaller absolute values than positive signals. A single negative_feedback: -5.0 would cancel out five click: 1.0 interactions, which can cause rapid weight swings. Start conservative and tune based on evaluation results.
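To make the alpha/beta bookkeeping concrete, here is a minimal sketch of the update rule. It assumes a uniform Beta(1, 1) prior and uses the posterior mean alpha / (alpha + beta) as a stand-in for the feature weight; both are assumptions for illustration, and `FeatureBelief` is a hypothetical name. How the service actually turns the distribution into a weight is not described here.

```python
from dataclasses import dataclass


@dataclass
class FeatureBelief:
    # Assumption: uniform Beta(1, 1) prior; the real prior is not documented here.
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, reward: float) -> None:
        """Apply one reward: positive grows alpha, negative grows beta."""
        if reward > 0:
            self.alpha += reward
        elif reward < 0:
            self.beta += abs(reward)

    @property
    def mean(self) -> float:
        """Posterior mean, used here as a proxy for the feature weight."""
        return self.alpha / (self.alpha + self.beta)


belief = FeatureBelief()
for _ in range(5):
    belief.update(1.0)        # five clicks
print(round(belief.mean, 3))  # 0.857
belief.update(-5.0)           # one heavy negative_feedback
print(round(belief.mean, 3))  # 0.5
```

Under these assumptions, one -5.0 penalty pulls the estimate from 0.857 back to the neutral 0.5, undoing five clicks in a single step.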

Position Bias

Results shown at position 0 get clicked more often than results at position 10, regardless of relevance. This is position bias — a well-known problem in learning-to-rank systems. Auto-Tune records the position field on every interaction for analytics and audit purposes. However, the current reward computation does not weight interactions by position — a click at position 8 receives the same reward value as a click at position 0. Position bias correction is a planned enhancement but is not yet implemented.

Include position when posting interactions. Although position does not currently affect reward computation, it is stored alongside each interaction for future position-aware modeling and for your own analytics. Recording it now means you won’t need to backfill when position-based weighting is added.

Temporal Decay

User preferences change over time. Auto-Tune applies exponential decay to older interactions so that recent behavior matters more:

effective_reward = reward * (decay_factor ^ days_ago)

`decay_factor`	After 30 days	After 90 days	After 180 days	After 365 days
`1.0` (no decay)	100%	100%	100%	100%
`0.999`	97%	91%	84%	69%
`0.995`	86%	64%	41%	16%
`0.990`	74%	40%	16%	3%
`0.980`	55%	16%	3%	~0%
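The table rows follow directly from the formula, and a half-life figure can make a decay_factor easier to reason about. The sketch below is plain arithmetic on the formula above; `decay_multiplier` and `half_life_days` are hypothetical helper names, not SDK functions.

```python
import math


def decay_multiplier(decay_factor: float, days_ago: float) -> float:
    """Fraction of the original reward that remains after days_ago days."""
    return decay_factor ** days_ago


def half_life_days(decay_factor: float) -> float:
    """Days until an interaction counts for half its original reward."""
    if not 0.0 < decay_factor < 1.0:
        raise ValueError("half-life is only defined for 0 < decay_factor < 1")
    return math.log(0.5) / math.log(decay_factor)


# A default purchase (3.0) recorded 90 days ago with decay_factor = 0.995
print(round(3.0 * decay_multiplier(0.995, 90), 2))  # 1.91
print(round(half_life_days(0.995)))                 # 138
```

So with decay_factor 0.995, an interaction loses half its influence after roughly 138 days.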

Configure via learning_config:

{
  "learning_config": {
    "decay_factor": 0.995,
    "decay_window_days": 365
  }
}

Interactions older than decay_window_days are ignored entirely (not just decayed to near-zero, but excluded from the aggregation query).
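The difference between decaying and excluding can be sketched with a small aggregation function. This is an illustration under stated assumptions, not the service's aggregation query: `aggregate_reward` is a hypothetical name, age is measured in fractional days, an interaction exactly at the window boundary is kept, and the parameter defaults are simply the values from the config example above.

```python
from datetime import datetime, timedelta, timezone


def aggregate_reward(interactions, now, decay_factor=0.995, decay_window_days=365):
    """Sum decayed rewards, skipping interactions older than the window.

    interactions: iterable of (reward_value, occurred_at) pairs, where
    occurred_at is a timezone-aware datetime.
    """
    total = 0.0
    for reward_value, occurred_at in interactions:
        days_ago = (now - occurred_at).total_seconds() / 86400
        # Assumption: an interaction exactly at the window boundary is kept.
        if days_ago > decay_window_days:
            continue  # excluded entirely, not merely decayed
        total += reward_value * decay_factor ** days_ago
    return total


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
history = [
    (3.0, now - timedelta(days=10)),   # recent purchase: counts almost fully
    (3.0, now - timedelta(days=400)),  # outside the 365-day window: ignored
]
print(round(aggregate_reward(history, now), 2))  # 2.85
```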

Backfilling historical interactions

By default the server timestamps each interaction at the moment it’s recorded. If you’re migrating existing click/purchase logs into Mixpeek, pass occurred_at (ISO 8601) so temporal decay weights each interaction by its true age instead of treating everything as brand-new:

REST

POST /v1/retrievers/interactions
{
  "feature_id": "doc_123",
  "interaction_type": ["purchase"],
  "position": 0,
  "user_id": "user_42",
  "feature_uri": "mixpeek://text_extractor@v1/...",
  "occurred_at": "2026-01-15T10:30:00Z"
}

Python SDK

client.retrievers.create_interaction(
    feature_id="doc_123",
    interaction_type=["purchase"],
    user_id="user_42",
    feature_uri="mixpeek://text_extractor@v1/...",
    occurred_at="2026-01-15T10:30:00Z",  # backfill with the real timestamp
)

Omit occurred_at for live interactions — the server stamps “now”. A naive datetime is interpreted as UTC, and a future value is clamped to now. Backfilled events (with occurred_at) bypass the real-time within-session cache so historical data can’t pollute live, in-session adaptation.
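When preparing a backfill, it can help to preview what timestamp will end up recorded. The sketch below mirrors the rules stated above (missing means now, naive means UTC, future is clamped to now) on the client side; `normalize_occurred_at` is a hypothetical helper for illustration, not the server's implementation or an SDK function.

```python
from datetime import datetime, timezone


def normalize_occurred_at(occurred_at, now=None):
    """Mirror the documented rules: naive means UTC, future clamps to now."""
    now = now or datetime.now(timezone.utc)
    if occurred_at is None:
        return now  # live interaction: stamped "now"
    if occurred_at.tzinfo is None:
        occurred_at = occurred_at.replace(tzinfo=timezone.utc)
    return min(occurred_at, now)


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
print(normalize_occurred_at(datetime(2026, 1, 15, 10, 30), now).isoformat())
# 2026-01-15T10:30:00+00:00
print(normalize_occurred_at(datetime(2027, 1, 1, tzinfo=timezone.utc), now) == now)
# True
```

If your logs store local times without an offset, convert them to UTC or attach the correct timezone before sending, since a naive value will be read as UTC.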

For large histories, send interactions in bulk (1–1000 per call) instead of one request each:

Python SDK

result = client.retrievers.backfill_interactions([
    {"feature_id": "d1", "interaction_type": ["purchase"], "user_id": "u1",
     "feature_uri": "mixpeek://text_extractor@v1/...",
     "occurred_at": "2026-01-15T10:30:00Z"},
    {"feature_id": "d2", "interaction_type": ["click"], "user_id": "u2",
     "feature_uri": "mixpeek://text_extractor@v1/...",
     "occurred_at": "2026-01-16T11:00:00Z"},
    # ... up to 1000 per call
])
# -> {"created": 2, "failed": 0, "errors": [], "results": [{"index": 0, "interaction_id": "int_abc123", "status": "created"}, ...]}

REST

POST /v1/retrievers/interactions/batch
{ "interactions": [ { "feature_id": "d1", "interaction_type": ["purchase"], "occurred_at": "2026-01-15T10:30:00Z" }, ... ] }

Each row is enriched (reward value, feature/context promotion) exactly like a single create; one bad row doesn’t sink the batch (it’s reported in errors). The response also includes per-item results with the assigned interaction_ids.
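For histories larger than 1000 rows, the calls need to be chunked. The sketch below shows one way to do that and merge the per-batch summaries. `backfill_in_batches` and `fake_send` are hypothetical names; the code assumes the sender returns a dict shaped like the response shown above (`created`, `failed`, `errors`), so adapt the field access if your SDK version returns an object instead.

```python
def backfill_in_batches(interactions, send_batch, batch_size=1000):
    """Send interactions in chunks and merge the per-batch summaries."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    summary = {"created": 0, "failed": 0, "errors": []}
    for start in range(0, len(interactions), batch_size):
        chunk = interactions[start:start + batch_size]
        result = send_batch(chunk)
        summary["created"] += result.get("created", 0)
        summary["failed"] += result.get("failed", 0)
        # Keep the chunk offset so a failing row can be traced back to the input list.
        for error in result.get("errors", []):
            summary["errors"].append({"batch_start": start, "error": error})
    return summary


# Stand-in sender for a dry run; in practice pass
# client.retrievers.backfill_interactions instead.
def fake_send(chunk):
    return {"created": len(chunk), "failed": 0, "errors": []}


rows = [{"feature_id": f"d{i}", "interaction_type": ["click"]} for i in range(2500)]
print(backfill_in_batches(rows, fake_send))
# {'created': 2500, 'failed': 0, 'errors': []}
```

Passing the sender as a parameter keeps the chunking logic testable without network access.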

The private API uses feature_id while published (public) retrievers use document_id — they refer to the same field. Use whichever matches the endpoint you’re calling.

Examples

E-commerce: purchases matter most

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "purchase": 5.0,
      "add_to_cart": 2.0,
      "click": 1.0,
      "bookmark": 1.5,
      "negative_feedback": -2.0,
      "return_to_results": -1.0
    },
    "decay_factor": 0.995,
    "min_interactions": 3
  }
}

A user who purchases products found via text search will see their text feature weight increase faster than a user who only clicks.

Content platform: engagement over clicks

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "long_view": 2.0,
      "share": 3.0,
      "bookmark": 1.5,
      "click": 0.5,
      "skip": -1.5
    },
    "decay_factor": 0.990,
    "min_interactions": 5
  }
}

Clicks are downweighted relative to deep engagement (long views, shares). Skips are penalized more heavily — a result shown but ignored is a stronger negative signal than a mere absence of clicks.

Internal search: explicit feedback only

{
  "learning_config": {
    "context_features": ["INPUT.user_id"],
    "reward_map": {
      "positive_feedback": 3.0,
      "negative_feedback": -3.0
    },
    "decay_factor": 1.0,
    "min_interactions": 10
  }
}

Only explicit thumbs up/down influence weights. Clicks and views are ignored. No temporal decay — in an internal tool, preferences tend to be stable. Higher min_interactions threshold because explicit feedback is sparse.

Related

Auto-Tune: overview of the full feedback loop
Interactions: how to capture user behavior
Rollout & Safety: safely deploying learned fusion
