Learned Fusion

Learned fusion automatically tunes the blend of embedding features for your users. Instead of manually setting weights (text: 0.7, image: 0.3), the system learns from interaction data which features produce results users engage with.

Figure: Thompson Sampling. Beta distributions evolve from uniform to peaked as interactions accumulate.

How It Works

Learned fusion uses Thompson Sampling, a well-studied algorithm for the multi-armed bandit problem. Here’s how it applies to search fusion:

Initialize with uniform priors

Each search feature (e.g., text embeddings, image embeddings) starts with a Beta(1, 1) distribution: a flat density that treats every value between 0 and 1 as equally likely. The system makes no initial assumption about which feature is better.

Sample weights at query time

When a query arrives, the system draws a random weight from each feature’s Beta distribution and normalizes them to sum to 1. Early on, samples are highly variable (exploration). As data accumulates, they stabilize (exploitation).

Execute search with sampled weights

The feature search stage runs each embedding search and fuses results using the sampled weights — functionally identical to weighted fusion, but with dynamically chosen weights.

Capture user interactions

Users interact with results: clicks, purchases, skips. Each interaction is recorded with the document ID, position, and the context key that identifies which weight sample was used.

Update Beta distributions

Positive interactions (clicks, purchases) increment the alpha parameter: alpha = 1 + clicks. Non-engagement increments beta: beta = 1 + (impressions - clicks). This shifts the distribution toward weights that produce engaging results.

Repeat with better weights

Next query: the updated distributions produce weight samples closer to what works. After hundreds of interactions, the system converges on near-optimal weights while still occasionally exploring alternatives.
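
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the `FeatureArm` class, the `sample_weights` helper, and the simulated engagement counts are all hypothetical, and the sketch assumes each impression can be credited to a single feature.

```python
import random


class FeatureArm:
    """One search feature ("arm") with a Beta(alpha, beta) posterior."""

    def __init__(self, prior_alpha: float = 1.0, prior_beta: float = 1.0):
        self.alpha = prior_alpha  # prior + positive interactions
        self.beta = prior_beta    # prior + impressions without engagement

    def sample(self, rng: random.Random) -> float:
        # Draw one plausible value from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, engaged: bool) -> None:
        # A click or purchase raises alpha; a non-engaged impression raises beta.
        if engaged:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def sample_weights(arms: dict[str, FeatureArm], rng: random.Random) -> dict[str, float]:
    """Draw one sample per feature and normalize the draws to sum to 1."""
    draws = {name: arm.sample(rng) for name, arm in arms.items()}
    total = sum(draws.values())
    return {name: value / total for name, value in draws.items()}


rng = random.Random(0)
arms = {"text": FeatureArm(), "image": FeatureArm()}

# Simulated feedback (assumed numbers): 72 of 100 text impressions engaged,
# 28 of 100 image impressions engaged.
for i in range(100):
    arms["text"].update(engaged=i < 72)
    arms["image"].update(engaged=i < 28)

weights = sample_weights(arms, rng)
```

With only the priors in place, `sample_weights` returns widely varying values from call to call. After the simulated feedback, repeated calls cluster around a text-heavy split, which is the exploration-to-exploitation shift described in the steps.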

Thompson Sampling Explained

Think of it like flipping weighted coins. Each feature has its own coin:

At the start, both coins are fair — you have no idea which feature is better, so you flip both and take whatever comes up.
After 50 interactions, the text feature’s coin lands “heads” 65% of the time (users click on text-matched results more). You naturally start weighting text higher, but still try image sometimes.
After 1000 interactions, the text coin lands heads 72% of the time with very little variance. You’re confident in the weights and rarely deviate.

The mathematical version: each “coin” is a Beta(alpha, beta) distribution where alpha counts successes (clicks) and beta counts non-successes (impressions without clicks). Sampling from this distribution gives you a weight that naturally balances exploration and exploitation.
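
To make the "very little variance" point concrete, the mean and standard deviation of a Beta distribution follow from its two parameters. The helper below is a hypothetical sketch (the name `beta_mean_std` and the 720-of-1000 click count are assumptions chosen to match the 72% coin above); the formulas themselves are the standard ones for Beta(alpha, beta).

```python
import math


def beta_mean_std(alpha: float, beta: float) -> tuple[float, float]:
    """Return the mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


# Uniform prior: no interactions yet.
prior_mean, prior_std = beta_mean_std(1, 1)

# 1000 impressions with 720 clicks: alpha = 1 + 720, beta = 1 + 280.
late_mean, late_std = beta_mean_std(1 + 720, 1 + 280)
```

The prior has a mean of 0.5 with a standard deviation near 0.29, so samples land almost anywhere. After 1000 interactions the mean is about 0.72 and the standard deviation drops to roughly 0.014, so samples rarely stray far from the learned value.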

Hierarchical Fallback

Not every user has enough interaction history for personalized weights. The system uses a four-level fallback:

Level	Context	Min Interactions	When Used
Personal	Individual user	5	User has clicked/purchased enough for reliable weights
Demographic	User segment	1	User is new, but their segment has data
Global	All users	1	No segment data; uses aggregate behavior
Prior	Uniform	0	No interactions at all; falls back to equal weights

The user_id in your interaction signals enables personal-level learning. The demographic_features config (e.g., ["INPUT.user_segment"]) enables demographic-level learning.
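
One way to read the table is as a chain of checks from most specific to least specific. The sketch below is an assumption about the control flow, not the actual resolver: `resolve_context_level` and its arguments are hypothetical names, and only the thresholds come from the table.

```python
from typing import Optional

# Minimum interaction counts per level, taken from the table above.
MIN_INTERACTIONS = {"personal": 5, "demographic": 1, "global": 1}


def resolve_context_level(
    personal_count: int,
    demographic_count: Optional[int],
    global_count: int,
) -> str:
    """Pick the most specific level that has enough interaction history.

    demographic_count is None when no demographic_features are configured.
    """
    if personal_count >= MIN_INTERACTIONS["personal"]:
        return "personal"
    if demographic_count is not None and demographic_count >= MIN_INTERACTIONS["demographic"]:
        return "demographic"
    if global_count >= MIN_INTERACTIONS["global"]:
        return "global"
    return "prior"  # uniform prior: equal expected weights
```

For example, a user with 2 interactions in a segment with 40 resolves to the demographic level, and moves to the personal level once they reach 5 interactions.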

End-to-End Walkthrough

1. Create a retriever with learned fusion

{
  "retriever_name": "product-search-learned",
  "stages": [
    {
      "stage_name": "feature_search",
      "stage_type": "filter",
      "config": {
        "stage_id": "feature_search",
        "parameters": {
          "searches": [
            {
              "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
              "query": "{{INPUT.query}}",
              "top_k": 100
            },
            {
              "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
              "query": "{{INPUT.query}}",
              "top_k": 100
            }
          ],
          "fusion": "learned",
          "final_top_k": 25
        }
      }
    }
  ]
}

2. Execute a search

curl -X POST "$MP_API_URL/v1/retrievers/{retriever_id}/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "wireless earbuds noise canceling",
      "user_id": "user_456"
    }
  }'

With zero interactions, this behaves like RRF (uniform weights). The response includes an execution_id you’ll use for interaction tracking.

3. Capture interactions

curl -X POST "$MP_API_URL/v1/retrievers/interactions" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "feature_id": "doc_product_789",
    "interaction_type": ["click", "purchase"],
    "position": 2,
    "metadata": {
      "query": "wireless earbuds noise canceling"
    },
    "user_id": "user_456",
    "session_id": "sess_abc"
  }'

4. Improved results over time

After 100+ interactions, the same search for user_456 returns results with personalized fusion weights. If this user consistently engages with text-matched results over image-matched ones, the text feature weight increases for their queries.

5. Verify convergence

Use analytics to check how weights are evolving:

curl "$MP_API_URL/v1/analytics/retrievers/{retriever_id}/signals?signal_type=learned_weights&hours=168" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"

Response Metadata

When learned fusion is active, the execution response includes a __learned_fusion__ object in each result’s metadata. Use it to verify the system is working and debug weight evolution:

{
  "__learned_fusion__": {
    "context_level": "personal",
    "context_key": "a3f8b2c1d4e5f607",
    "sampled_weights": {
      "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1": 0.72,
      "mixpeek://image_extractor@v1/google_siglip_base_v1": 0.28
    },
    "feature_uris": [
      "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
      "mixpeek://image_extractor@v1/google_siglip_base_v1"
    ],
    "effective_exploration": 0.37,
    "circuit_breaker_triggered": false,
    "weight_resolution_ms": 12.5
  }
}

Field	Description
`context_level`	Which fallback level was used: `personal`, `demographic`, `global`, or `none` (circuit breaker triggered, using uniform weights)
`context_key`	A truncated SHA-256 hash of the context values used to look up interaction history (e.g., `a3f8b2c1d4e5f607`)
`sampled_weights`	The actual weights used for this query, keyed by feature URI
`effective_exploration`	Current exploration multiplier after decay — lower means more exploitation
`circuit_breaker_triggered`	`true` if the weight lookup timed out and fell back to uniform weights
`weight_resolution_ms`	How long the weight lookup took in milliseconds

Check context_level to verify personalization is active. If you see "demographic" or "global" where you expect "personal", the user may not have enough interactions yet (check min_interactions). If you see "none" consistently, the circuit breaker is likely triggering due to ClickHouse latency.
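
A small helper can turn these checks into something you run during debugging. The function below is a hypothetical sketch: `diagnose_learned_fusion` is not part of any SDK, it takes the `__learned_fusion__` object as a plain dict, and the 0.01 tolerance on the weight sum is an assumption to allow for rounded values.

```python
def diagnose_learned_fusion(meta: dict) -> list[str]:
    """Return human-readable warnings for one __learned_fusion__ object."""
    warnings = []
    if meta.get("circuit_breaker_triggered"):
        warnings.append("circuit breaker triggered: uniform weights were used")
    if meta.get("context_level") == "none":
        warnings.append("context_level is 'none': no learned weights applied")
    weights = meta.get("sampled_weights", {})
    # Reported weights may be rounded, so allow a small tolerance.
    if weights and abs(sum(weights.values()) - 1.0) > 0.01:
        warnings.append("sampled_weights do not sum to 1")
    return warnings


# The example object from this section.
example = {
    "context_level": "personal",
    "context_key": "a3f8b2c1d4e5f607",
    "sampled_weights": {
        "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1": 0.72,
        "mixpeek://image_extractor@v1/google_siglip_base_v1": 0.28,
    },
    "effective_exploration": 0.37,
    "circuit_breaker_triggered": False,
    "weight_resolution_ms": 12.5,
}
```

Calling `diagnose_learned_fusion(example)` returns an empty list, meaning nothing looks wrong for that result.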

Configuration Reference

Parameter	Type	Required	Description
`fusion`	string	Yes	Set to `"learned"` to enable Thompson Sampling fusion.
`feature_uri`	string	Yes	Each feature URI defines an “arm” in the bandit. The system learns a separate weight for each.
`user_id`	string	No	Passed at execution time. Enables personal-level weight learning. Without this, the system uses global weights only.

The Thompson Sampler uses these parameters, configurable in learning_config:

Parameter	Default	Range	Description
`prior_alpha`	`1.0`	`>= 0.1`	Beta distribution alpha prior. Higher = initial belief that features are effective
`prior_beta`	`1.0`	`>= 0.1`	Beta distribution beta prior. Higher = initial belief that features are ineffective
`exploration_bonus`	`1.0`	`0.1–10.0`	Multiplier for distribution variance; >1 increases exploration
`min_interactions`	`5`	—	Minimum interactions before using personal context

When to Use Learned vs Static

Scenario	Recommendation	Why
New product, no interaction data	`rrf`	No data to learn from; RRF is a strong default
Domain expert knows feature importance	`weighted`	Manual weights capture expert knowledge immediately
Diverse user base with different preferences	`learned`	Different users may benefit from different feature weights
A/B testing fusion approaches	`rrf` → `learned`	Start with baseline, measure improvement with evaluations
Single search feature	None needed	Fusion only applies when combining multiple features

Session-Level Adaptation

Learned fusion persists weight state in ClickHouse, which has write-then-read latency (seconds to minutes). For within-session adaptation, where a user’s first few clicks should influence their next search immediately, the system uses a Redis session cache.

When a user interacts with a result, the interaction is written to both ClickHouse (durable) and a Redis session cache (ephemeral, 1-hour TTL). On the next search in the same session, the bandit merges the session cache entries into the ClickHouse-backed Beta distributions before sampling:

Read base α/β from ClickHouse (persistent history)
Read session interactions from Redis (current session)
Merge: α += session_successes, β += session_failures
Sample weights from the merged posterior

This gives sub-50ms feedback within a session while ClickHouse handles long-term persistence. Pass session_id on both search and interaction requests to enable this:

# Assumes `client` is an initialized SDK client and `retriever_id` is defined.
results = client.retrievers.execute(
    retriever_id,
    inputs={
        "query": "running shoes",
        "user_id": "user_456",
        "session_id": "sess_abc",
    },
)

Without session_id, the system still learns from interactions. It just won’t reflect them until ClickHouse ingests them, which typically takes a few seconds but can take minutes. Session-level adaptation is optional but recommended for real-time UX.
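
The read, merge, and sample steps in this section can be sketched as follows. This is an illustration under stated assumptions, not the service's code: the function names are hypothetical, and the `base` and `session` dicts stand in for the ClickHouse state and the Redis session cache.

```python
import random


def merge_posterior(base_alpha, base_beta, session_successes, session_failures):
    """Fold current-session counts into the persistent Beta parameters."""
    return base_alpha + session_successes, base_beta + session_failures


def sample_session_weights(base, session, rng):
    """Sample normalized weights from the merged posteriors.

    base:    {feature: (alpha, beta)} from long-term history
    session: {feature: (successes, failures)} from the current session
    """
    draws = {}
    for feature, (alpha, beta) in base.items():
        successes, failures = session.get(feature, (0, 0))
        merged_alpha, merged_beta = merge_posterior(alpha, beta, successes, failures)
        draws[feature] = rng.betavariate(merged_alpha, merged_beta)
    total = sum(draws.values())
    return {feature: draw / total for feature, draw in draws.items()}


base = {"text": (41.0, 21.0), "image": (19.0, 43.0)}  # stand-in for ClickHouse state
session = {"image": (3, 0)}                            # stand-in for the Redis cache
weights = sample_session_weights(base, session, random.Random(7))
```

Here three image clicks in the current session raise the image posterior from Beta(19, 43) to Beta(22, 43) before the next sample is drawn, without waiting for the durable store to catch up.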

Temporal Decay

User preferences change over time. The system applies exponential decay to older interactions so recent behavior matters more:

decayed_reward = reward * (decay_factor ^ days_ago)

With the default decay_factor: 0.995, the decay curve looks like:

Age	Retained Weight	Effect
1 day	99.5%	Essentially full strength
30 days	86% (`0.995^30`)	Still strong
90 days	64% (`0.995^90`)	Noticeably faded
180 days	41% (`0.995^180`)	Weak influence
365 days	16% (`0.995^365`)	Nearly gone
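
The table values can be reproduced directly from the formula. The helpers below are a hypothetical sketch (`retained_weight` and `decayed_reward` are assumed names); the cutoff treats interactions strictly older than decay_window_days as ignored, which is an assumption about the boundary.

```python
def retained_weight(
    days_ago: float,
    decay_factor: float = 0.995,
    decay_window_days: int = 365,
) -> float:
    """Fraction of an interaction's reward that survives after days_ago days."""
    if days_ago > decay_window_days:
        return 0.0  # outside the window: ignored completely
    return decay_factor ** days_ago


def decayed_reward(reward: float, days_ago: float, **kwargs) -> float:
    """decayed_reward = reward * (decay_factor ^ days_ago)"""
    return reward * retained_weight(days_ago, **kwargs)


# Retained weight at the ages listed in the table, rounded to two decimals.
table = {days: round(retained_weight(days), 2) for days in (30, 90, 180, 365)}
```

`table` evaluates to `{30: 0.86, 90: 0.64, 180: 0.41, 365: 0.16}`, matching the rows above.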

Configure decay in the learning_config:

{
  "learning_config": {
    "decay_factor": 0.995,
    "decay_window_days": 365
  }
}

decay_factor (default 0.995) — per-day multiplier. Set to 1.0 to disable decay entirely.
decay_window_days (default 365) — interactions older than this are ignored completely, reducing query cost.

Setting decay_factor too low (e.g., 0.95) causes rapid forgetting — a week-old interaction retains only 70% of its weight. Use values between 0.99 and 0.999 for most use cases.

Weight Clamping

Thompson Sampling can produce extreme weights that effectively silence a feature (e.g., text: 0.99, image: 0.01). Weight clamping prevents this by enforcing minimum and maximum bounds:

{
  "learning_config": {
    "min_weight": 0.05,
    "max_weight": 0.95
  }
}

After sampling from the Beta posteriors and normalizing, each weight is clamped to [min_weight, max_weight] and then re-normalized. This guarantees that every feature contributes at least min_weight to the final fusion, even for users with heavily skewed interaction histories.

Why this matters: Without clamping, a user who clicks only text results could end up with image: 0.01, effectively removing image search from their experience. If their preferences shift later, recovery is slow because the silenced feature produces almost no impressions to learn from.
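
The clamp-then-renormalize step can be sketched as below. The function name `clamp_and_renormalize` is hypothetical, and this is a naive single-pass version written for illustration; the actual implementation may differ.

```python
def clamp_and_renormalize(
    weights: dict[str, float],
    min_weight: float = 0.05,
    max_weight: float = 0.95,
) -> dict[str, float]:
    """Clamp each weight to [min_weight, max_weight], then rescale to sum to 1."""
    clamped = {
        name: min(max(weight, min_weight), max_weight)
        for name, weight in weights.items()
    }
    total = sum(clamped.values())
    return {name: weight / total for name, weight in clamped.items()}


# The extreme example from this section: image is lifted from 0.01 to 0.05.
bounded = clamp_and_renormalize({"text": 0.99, "image": 0.01})
```

In this naive sketch, with three or more features the final re-normalization can leave a weight slightly below min_weight (for example, 0.01, 0.01, 0.98 becomes roughly 0.048, 0.048, 0.905), so a stricter implementation would need an extra pass.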

Exploration Decay

The exploration_bonus parameter controls how much the bandit explores (tries different weight combinations) vs. exploits (uses what it has learned). With a static bonus, the bandit never fully settles on the best weights. Exploration decay reduces the bonus as interactions accumulate:

effective_exploration = max(exploration_floor, exploration_bonus * exploration_decay ^ total_interactions)

Configure it in learning_config:

{
  "learning_config": {
    "exploration_bonus": 1.0,
    "exploration_decay": 0.99,
    "exploration_floor": 0.1
  }
}

exploration_bonus (default 1.0) — initial exploration multiplier. Higher values mean more random early sampling.
exploration_decay (default 0.99) — per-interaction decay rate.
exploration_floor (default 0.1) — minimum exploration. The bandit never fully stops exploring — this prevents it from getting permanently stuck on suboptimal weights if preferences change.

After 100 interactions: 1.0 * 0.99^100 = 0.37. After 500: 1.0 * 0.99^500 = 0.007 (floored to 0.1). The system converges toward exploitation while maintaining a baseline level of exploration.
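
The formula and the worked numbers above translate directly into code. The function below is a sketch with an assumed name (`effective_exploration`, mirroring the response metadata field); the defaults are the documented ones.

```python
def effective_exploration(
    total_interactions: int,
    exploration_bonus: float = 1.0,
    exploration_decay: float = 0.99,
    exploration_floor: float = 0.1,
) -> float:
    """max(exploration_floor, exploration_bonus * exploration_decay ^ total_interactions)"""
    decayed = exploration_bonus * exploration_decay ** total_interactions
    return max(exploration_floor, decayed)


after_100 = round(effective_exploration(100), 2)
after_500 = effective_exploration(500)
```

`after_100` is 0.37 and `after_500` is 0.1, as in the text. With the defaults, the floor takes over at about 230 interactions, since 0.99^230 is the first power to fall below 0.1.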

Multi-Signal Rewards

By default, learned fusion treats click as the only learning signal. The reward_map lets you assign different reward magnitudes to different interaction types:

{
  "learning_config": {
    "reward_map": {
      "click": 1.0,
      "purchase": 3.0,
      "add_to_cart": 2.0,
      "bookmark": 1.5,
      "positive_feedback": 2.0,
      "negative_feedback": -2.0,
      "skip": -1.0
    }
  }
}

Positive values increase the alpha parameter for the associated feature (making it more likely to be weighted higher). Negative values increase the beta parameter (penalizing the feature). A purchase at 3.0 shifts weights three times as much as a click at 1.0.

Per-interaction rewards are capped at max_reward_per_interaction (default 5.0) to prevent a single buggy or malicious interaction batch from dominating the learned weights. See the Reward Signals reference for all 17 supported interaction types and guidance on choosing reward values.
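
The update rule described here can be sketched as a single function. This is a hypothetical illustration: `apply_reward` is an assumed name, unknown interaction types are assumed to be ignored, and the cap is assumed to apply to the absolute magnitude of the reward.

```python
REWARD_MAP = {
    "click": 1.0,
    "purchase": 3.0,
    "add_to_cart": 2.0,
    "bookmark": 1.5,
    "positive_feedback": 2.0,
    "negative_feedback": -2.0,
    "skip": -1.0,
}


def apply_reward(
    alpha: float,
    beta: float,
    interaction_type: str,
    reward_map: dict[str, float] = REWARD_MAP,
    max_reward_per_interaction: float = 5.0,
) -> tuple[float, float]:
    """Return updated (alpha, beta) after one interaction."""
    reward = reward_map.get(interaction_type, 0.0)  # assumption: unknown types are ignored
    magnitude = min(abs(reward), max_reward_per_interaction)
    if reward > 0:
        alpha += magnitude  # positive signal: feature looks more effective
    elif reward < 0:
        beta += magnitude   # negative signal: feature is penalized
    return alpha, beta
```

Starting from Beta(1, 1), a purchase gives Beta(4, 1) and a skip gives Beta(1, 2). A misconfigured reward of 9.0 would be capped at 5.0 before being applied.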

Related

Auto-Tune overview: the top-level guide to the full feedback loop
Reward Signals: configuring which interactions drive learning
Rollout Guide: traffic splitting, shadow mode, kill switch
Fusion Strategies: comparison of all 5 strategies
Interaction Signals: capturing the data that powers learning
Evaluations: measuring learned fusion quality
Feature Search stage: where fusion is configured
