Learned fusion automatically discovers the optimal blend of embedding features for your users. Instead of manually setting weights (text: 0.7, image: 0.3), the system learns from interaction data which features produce results users engage with.
Thompson Sampling: Beta distributions evolve from uniform to peaked as interactions accumulate

How It Works

Learned fusion uses Thompson Sampling, a well-studied algorithm for the multi-armed bandit problem. Here’s how it applies to search fusion:
1. Initialize with uniform priors

Each search feature (e.g., text embeddings, image embeddings) starts with a Beta(1, 1) distribution — a flat line that assigns equal probability to all weight values. This means zero assumptions about which feature is better.
2. Sample weights at query time

When a query arrives, the system draws a random weight from each feature’s Beta distribution and normalizes them to sum to 1. Early on, samples are highly variable (exploration). As data accumulates, they stabilize (exploitation).
3. Execute search with sampled weights

The feature search stage runs each embedding search and fuses results using the sampled weights — functionally identical to weighted fusion, but with dynamically chosen weights.
4. Capture user interactions

Users interact with results: clicks, purchases, skips. Each interaction is recorded with the document ID, position, and the context key that identifies which weight sample was used.
5. Update Beta distributions

Positive interactions (clicks, purchases) increment the alpha parameter: alpha = 1 + clicks. Non-engagement increments beta: beta = 1 + (impressions - clicks). This shifts the distribution toward weights that produce engaging results.
6. Repeat with better weights

Next query: the updated distributions produce weight samples closer to what works. After hundreds of interactions, the system converges on near-optimal weights while still occasionally exploring alternatives.
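
The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's implementation: the function names sample_weights and record_outcome are hypothetical, and crediting a click or non-click to one named feature is an assumption made for the example, since the text does not spell out how outcomes are attributed.

```python
import random

def sample_weights(posteriors, rng):
    """Draw one value per feature from its Beta(alpha, beta) and normalize to sum to 1."""
    draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
    total = sum(draws.values())
    return {name: d / total for name, d in draws.items()}

def record_outcome(posteriors, feature, clicked):
    """Increment alpha on engagement, beta on non-engagement (assumed attribution)."""
    a, b = posteriors[feature]
    posteriors[feature] = (a + 1, b) if clicked else (a, b + 1)

# Step 1: uniform Beta(1, 1) priors for every feature.
posteriors = {"text": (1.0, 1.0), "image": (1.0, 1.0)}
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Steps 2-3: sample weights and hand them to the fusion stage.
weights = sample_weights(posteriors, rng)

# Steps 4-5: fold observed interactions back into the distributions.
record_outcome(posteriors, "text", clicked=True)
record_outcome(posteriors, "image", clicked=False)
```

Because each query draws fresh samples, early weights vary widely, and they tighten only as alpha and beta grow.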

Thompson Sampling Explained

Think of it like flipping weighted coins. Each feature has its own coin:
  • At the start, both coins are fair — you have no idea which feature is better, so you flip both and take whatever comes up.
  • After 50 interactions, the text feature’s coin lands “heads” 65% of the time (users click on text-matched results more). You naturally start weighting text higher, but still try image sometimes.
  • After 1000 interactions, the text coin lands heads 72% of the time with very little variance. You’re confident in the weights and rarely deviate.
The mathematical version: each “coin” is a Beta(alpha, beta) distribution where alpha counts successes (clicks) and beta counts non-successes (impressions without clicks). Sampling from this distribution gives you a weight that naturally balances exploration and exploitation.
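
To make the "coin gets more predictable" intuition concrete, the sketch below computes the mean and standard deviation of a Beta posterior from click and impression counts, using the standard closed-form Beta moments. The helper name beta_mean_std and the example counts (33 of 50, 720 of 1000) are illustrative assumptions chosen to resemble the 65% and 72% figures above.

```python
import math

def beta_mean_std(clicks, impressions, prior_alpha=1.0, prior_beta=1.0):
    """Mean and standard deviation of Beta(prior_alpha + clicks, prior_beta + misses)."""
    alpha = prior_alpha + clicks
    beta = prior_beta + (impressions - clicks)
    n = alpha + beta
    mean = alpha / n
    variance = (alpha * beta) / (n * n * (n + 1))
    return mean, math.sqrt(variance)

early_mean, early_std = beta_mean_std(clicks=33, impressions=50)
late_mean, late_std = beta_mean_std(clicks=720, impressions=1000)

print(f"after 50 impressions:   mean={early_mean:.3f} std={early_std:.3f}")
print(f"after 1000 impressions: mean={late_mean:.3f} std={late_std:.3f}")
```

The standard deviation shrinks roughly with the square root of the number of impressions, which is why samples stop wandering once enough data has accumulated.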

Hierarchical Fallback

Not every user has enough interaction history for personalized weights. The system falls back through three context levels, and uses a uniform prior when none of them has data:
| Level | Context | Min Interactions | When Used |
| --- | --- | --- | --- |
| Personal | Individual user | 5 | User has clicked/purchased enough for reliable weights |
| Demographic | User segment | 1 | User is new, but their segment has data |
| Global | All users | 1 | No segment data; uses aggregate behavior |
| Prior | Uniform | 0 | No interactions at all; falls back to equal weights |
The user_id in your interaction signals enables personal-level learning. The segment field (e.g., “enterprise”, “consumer”, “power-user”) enables demographic-level learning.
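
The fallback order can be sketched as a simple lookup. This is a hypothetical illustration: resolve_context, the counts dictionary, and the "segment:..." and "global" key formats are assumptions (only the "user:..." key format appears in this page), and the thresholds are copied from the Min Interactions column.

```python
from typing import Optional, Tuple

# Thresholds taken from the fallback table; the dict itself is illustrative.
MIN_INTERACTIONS = {"personal": 5, "demographic": 1, "global": 1}

def resolve_context(counts: dict, user_id: Optional[str],
                    segment: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (level, key) for the most specific context with enough interactions."""
    candidates = []
    if user_id:
        candidates.append(("personal", f"user:{user_id}"))
    if segment:
        candidates.append(("demographic", f"segment:{segment}"))  # assumed key format
    candidates.append(("global", "global"))  # assumed key format

    for level, key in candidates:
        if counts.get(key, 0) >= MIN_INTERACTIONS[level]:
            return level, key
    return "prior", None  # no data anywhere: equal weights

counts = {"user:user_456": 7, "segment:enterprise": 40, "global": 900}
level, key = resolve_context(counts, user_id="user_456", segment="enterprise")
```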

End-to-End Walkthrough

1. Create a retriever with learned fusion

{
  "name": "product-search-learned",
  "stages": [
    {
      "stage_type": "filter",
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": "{{INPUT.query}}",
            "top_k": 100
          },
          {
            "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
            "query": "{{INPUT.query}}",
            "top_k": 100
          }
        ],
        "fusion": "learned",
        "final_top_k": 25
      }
    }
  ]
}

2. Execute a search

curl -X POST "$MP_API_URL/v1/retrievers/{retriever_id}/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "query": "wireless earbuds noise canceling"
    },
    "user_id": "user_456"
  }'
With zero interactions, this behaves like RRF (uniform weights). The response includes an execution_id you’ll use for interaction tracking.

3. Capture interactions

curl -X POST "$MP_API_URL/v1/retrievers/interactions" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "feature_id": "doc_product_789",
    "interaction_type": ["click", "purchase"],
    "position": 2,
    "metadata": {
      "query": "wireless earbuds noise canceling"
    },
    "user_id": "user_456",
    "session_id": "sess_abc"
  }'

4. Improved results over time

After 100+ interactions, the same search for user_456 returns results with personalized fusion weights. If this user consistently engages with text-matched results over image-matched ones, the text feature weight increases for their queries.

5. Verify convergence

Use analytics to check how weights are evolving:
curl "$MP_API_URL/v1/analytics/retrievers/{retriever_id}/signals?signal_type=learned_weights&hours=168" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"

Response Metadata

When learned fusion is active, the execution response includes a __learned_fusion__ object in each result’s metadata. Use it to verify the system is working and debug weight evolution:
{
  "__learned_fusion__": {
    "context_level": "personal",
    "context_key": "user:user_456",
    "sampled_weights": {
      "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1": 0.72,
      "mixpeek://image_extractor@v1/google_siglip_base_v1": 0.28
    },
    "feature_uris": [
      "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
      "mixpeek://image_extractor@v1/google_siglip_base_v1"
    ],
    "effective_exploration": 0.37,
    "circuit_breaker_triggered": false,
    "weight_resolution_ms": 12.5
  }
}
| Field | Description |
| --- | --- |
| context_level | Which fallback level was used: personal, demographic, global, or none (circuit breaker triggered, using uniform weights) |
| context_key | The key used to look up interaction history (e.g., user:user_456) |
| sampled_weights | The actual weights used for this query, keyed by feature URI |
| effective_exploration | Current exploration multiplier after decay; lower means more exploitation |
| circuit_breaker_triggered | true if the weight lookup timed out and fell back to uniform weights |
| weight_resolution_ms | How long the weight lookup took in milliseconds |
Check context_level to verify personalization is active. If you see "none" consistently, the user may not have enough interactions (check min_interactions) or the circuit breaker may be triggering due to ClickHouse latency.
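
A small helper can turn that check into something you run against each result. The sketch below is an assumption-laden example: diagnose_learned_fusion is a hypothetical name, and it expects a result's metadata as a plain dict shaped like the example object above.

```python
def diagnose_learned_fusion(result_metadata: dict) -> str:
    """Summarize the __learned_fusion__ block of one result's metadata."""
    info = result_metadata.get("__learned_fusion__")
    if info is None:
        return "no learned fusion metadata (is fusion set to 'learned'?)"
    if info.get("circuit_breaker_triggered"):
        return "circuit breaker triggered: uniform weights were used"
    level = info.get("context_level")
    if level == "personal":
        return "personalized weights active"
    return f"fell back to context_level={level}"

example = {
    "__learned_fusion__": {
        "context_level": "personal",
        "context_key": "user:user_456",
        "circuit_breaker_triggered": False,
        "weight_resolution_ms": 12.5,
    }
}
status = diagnose_learned_fusion(example)
```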

Configuration Reference

  • fusion (string, required): Set to "learned" to enable Thompson Sampling fusion.
  • searches[].feature_uri (string, required): Each feature URI defines an “arm” in the bandit. The system learns a separate weight for each.
  • user_id (string): Passed at execution time. Enables personal-level weight learning. Without this, the system uses global weights only.
The Thompson Sampler uses these internal parameters (not user-configurable):
| Parameter | Default | Description |
| --- | --- | --- |
| prior_alpha | 1.0 | Beta distribution alpha prior (uniform) |
| prior_beta | 1.0 | Beta distribution beta prior (uniform) |
| exploration_bonus | 1.0 | Multiplier for distribution variance; >1 increases exploration |
| min_interactions | 5 | Minimum interactions before using personal context |

When to Use Learned vs Static

| Scenario | Recommendation | Why |
| --- | --- | --- |
| New product, no interaction data | rrf | No data to learn from; RRF is a strong default |
| Domain expert knows feature importance | weighted | Manual weights capture expert knowledge immediately |
| Diverse user base with different preferences | learned | Different users may benefit from different feature weights |
| A/B testing fusion approaches | rrf, then learned | Start with baseline, measure improvement with evaluations |
| Single search feature | None needed | Fusion only applies when combining multiple features |

Session-Level Adaptation

Learned fusion persists weight state in ClickHouse, which has write-then-read latency (seconds to minutes). For within-session adaptation, where a user’s first few clicks should influence their next search immediately, the system uses a Redis session cache.

When a user interacts with a result, the interaction is written to both ClickHouse (durable) and the Redis session cache (ephemeral, 1-hour TTL). On the next search in the same session, the bandit merges the session cache entries into the ClickHouse-backed Beta distributions before sampling:
1. Read base α/β from ClickHouse (persistent history)
2. Read session interactions from Redis (current session)
3. Merge: α += session_successes, β += session_failures
4. Sample weights from the merged posterior
This gives sub-50ms feedback within a session while ClickHouse handles long-term persistence. Pass session_id on both search and interaction requests to enable this:
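# Assumes `client` is an already-initialized SDK client and `retriever_id`
# refers to a retriever configured with "fusion": "learned".
results = client.retrievers.execute(
    retriever_id=retriever_id,
    query={"query": "running shoes"},
    user_id="user_456",
    session_id="sess_abc",  # send the same session_id on interaction requests
)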
Without session_id, the system still learns from interactions — it just won’t reflect them until ClickHouse ingests them (typically a few seconds). Session-level adaptation is optional but recommended for real-time UX.
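
The four merge steps listed earlier in this section can be sketched as follows. This is an illustration only: plain dictionaries stand in for the ClickHouse and Redis reads, and merge_session plus the per-feature (successes, failures) layout of the session data are assumptions.

```python
import random

def merge_session(base, session):
    """Add per-feature session successes/failures onto the persisted alpha/beta."""
    merged = {}
    for feature, (alpha, beta) in base.items():
        successes, failures = session.get(feature, (0, 0))
        merged[feature] = (alpha + successes, beta + failures)
    return merged

# Step 1: base alpha/beta (stand-in for the persistent store).
base = {"text": (41.0, 19.0), "image": (12.0, 30.0)}
# Step 2: interactions seen so far in this session (stand-in for the session cache).
session = {"image": (3, 0)}
# Step 3: merge.
merged = merge_session(base, session)
# Step 4: sample from the merged posterior and normalize.
rng = random.Random(7)
draws = {f: rng.betavariate(a, b) for f, (a, b) in merged.items()}
total = sum(draws.values())
weights = {f: d / total for f, d in draws.items()}
```

Keeping the merge read-only with respect to the base counts means the session data can expire without leaving the durable state in an inconsistent condition.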

Temporal Decay

User preferences change over time. The system applies exponential decay to older interactions so recent behavior matters more:
decayed_reward = reward * (decay_factor ^ days_ago)
With the default decay_factor: 0.995, the decay curve looks like:
| Age | Retained Weight | Effect |
| --- | --- | --- |
| 1 day | 99.5% | Essentially full strength |
| 30 days | 86% (0.995^30) | Still strong |
| 90 days | 64% (0.995^90) | Noticeably faded |
| 180 days | 41% (0.995^180) | Weak influence |
| 365 days | 16% (0.995^365) | Nearly gone |
Configure decay in the learning_config:
{
  "learning_config": {
    "decay_factor": 0.995,
    "decay_window_days": 365
  }
}
  • decay_factor (default 0.995) — per-day multiplier. Set to 1.0 to disable decay entirely.
  • decay_window_days (default 365) — interactions older than this are ignored completely, reducing query cost.
Setting decay_factor too low (e.g., 0.95) causes rapid forgetting — a week-old interaction retains only 70% of its weight. Use values between 0.99 and 0.999 for most use cases.
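
The decay formula and window can be checked with a few lines of Python. The function below is a sketch of the stated formula; the name decayed_reward mirrors the formula's left-hand side, and treating "older than decay_window_days" as a strict greater-than comparison is an assumption.

```python
def decayed_reward(reward, days_ago, decay_factor=0.995, decay_window_days=365):
    """Apply reward * decay_factor ** days_ago, ignoring interactions outside the window."""
    if days_ago > decay_window_days:
        return 0.0  # assumed: anything older than the window contributes nothing
    return reward * (decay_factor ** days_ago)

# Reproduce the retained-weight figures for a reward of 1.0.
for age in (1, 30, 90, 180, 365):
    print(f"{age:>3} days: {decayed_reward(1.0, age):.1%}")

# A more aggressive factor forgets quickly.
week_old = decayed_reward(1.0, 7, decay_factor=0.95)
```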

Weight Clamping

Thompson Sampling can produce extreme weights that effectively silence a feature (e.g., text: 0.99, image: 0.01). Weight clamping prevents this by enforcing minimum and maximum bounds:
{
  "learning_config": {
    "min_weight": 0.05,
    "max_weight": 0.95
  }
}
After sampling from the Beta posteriors and normalizing, each weight is clamped to [min_weight, max_weight] and then re-normalized. This keeps every feature contributing to the final fusion at roughly min_weight or more, even for users with heavily skewed interaction histories.

Why this matters: without clamping, a user who clicks only text results could end up with image: 0.01, which effectively removes image search from their experience. If their preferences shift later, recovery is slow because the silenced feature produces almost no impressions to learn from.
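
A single-pass version of clamp-then-renormalize might look like the sketch below. The function name clamp_weights is hypothetical and this is not necessarily how the service implements the step.

```python
def clamp_weights(weights, min_weight=0.05, max_weight=0.95):
    """Clamp each weight to [min_weight, max_weight], then rescale so they sum to 1."""
    clamped = {f: min(max(w, min_weight), max_weight) for f, w in weights.items()}
    total = sum(clamped.values())
    return {f: w / total for f, w in clamped.items()}

two = clamp_weights({"text": 0.99, "image": 0.01})
three = clamp_weights({"text": 0.98, "image": 0.01, "audio": 0.01})
```

With two features and the bounds shown, the silenced feature is lifted to 0.05. With three or more features, the rescaling in this single-pass sketch can leave a weight slightly under min_weight (about 0.048 in the three-feature call), so a stricter floor would need an iterative or redistributing variant.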

Exploration Decay

The exploration_bonus parameter controls how much the bandit explores (tries different weight combinations) vs. exploits (uses what it has learned). With a static bonus, the bandit never fully settles on the best weights. Exploration decay reduces the bonus as interactions accumulate:
effective_exploration = max(exploration_floor, exploration_bonus * exploration_decay ^ total_interactions)
Configure it in learning_config:
{
  "learning_config": {
    "exploration_bonus": 1.0,
    "exploration_decay": 0.99,
    "exploration_floor": 0.1
  }
}
  • exploration_bonus (default 1.0) — initial exploration multiplier. Higher values mean more random early sampling.
  • exploration_decay (default 0.99) — per-interaction decay rate.
  • exploration_floor (default 0.1) — minimum exploration. The bandit never fully stops exploring — this prevents it from getting permanently stuck on suboptimal weights if preferences change.
After 100 interactions: 1.0 * 0.99^100 = 0.37. After 500: 1.0 * 0.99^500 = 0.007 (floored to 0.1). The system converges toward exploitation while maintaining a baseline level of exploration.
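
The same arithmetic in Python, as a sketch of the formula above. The helper interactions_until_floor is a hypothetical addition that solves for the interaction count at which the floor takes over.

```python
import math

def effective_exploration(total_interactions, exploration_bonus=1.0,
                          exploration_decay=0.99, exploration_floor=0.1):
    """max(floor, bonus * decay ** total_interactions), as in the formula above."""
    return max(exploration_floor,
               exploration_bonus * exploration_decay ** total_interactions)

def interactions_until_floor(exploration_bonus=1.0, exploration_decay=0.99,
                             exploration_floor=0.1):
    """Smallest interaction count at which the decayed bonus drops to the floor."""
    return math.ceil(math.log(exploration_floor / exploration_bonus)
                     / math.log(exploration_decay))

at_100 = effective_exploration(100)
at_500 = effective_exploration(500)
floor_at = interactions_until_floor()
```

With the defaults, the floor takes over after roughly 230 interactions, so any tuning of exploration_decay mostly affects the first few hundred interactions of a context.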

Multi-Signal Rewards

By default, learned fusion treats click as the only learning signal. The reward_map lets you assign different reward magnitudes to different interaction types:
{
  "learning_config": {
    "reward_map": {
      "click": 1.0,
      "purchase": 3.0,
      "add_to_cart": 2.0,
      "bookmark": 1.5,
      "positive_feedback": 2.0,
      "negative_feedback": -2.0,
      "dismiss": -1.0
    }
  }
}
Positive values increase the alpha parameter for the associated feature (making it more likely to be weighted higher). Negative values increase the beta parameter (penalizing the feature). A purchase at 3.0 shifts weights three times as much as a click at 1.0. Per-interaction rewards are also capped at max_reward_per_interaction (default 5.0) to prevent a single buggy or malicious interaction batch from dominating the learned weights. See the Reward Signals reference for all 14 supported interaction types and guidance on choosing reward values.
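
How a reward_map entry might translate into a posterior update is sketched below. This is an illustration under assumptions: apply_reward is a hypothetical name, unknown interaction types are treated as zero reward, and the cap is applied symmetrically to negative rewards, none of which is confirmed by the text above.

```python
REWARD_MAP = {
    "click": 1.0,
    "purchase": 3.0,
    "add_to_cart": 2.0,
    "bookmark": 1.5,
    "positive_feedback": 2.0,
    "negative_feedback": -2.0,
    "dismiss": -1.0,
}

def apply_reward(alpha, beta, interaction_type,
                 reward_map=REWARD_MAP, max_reward_per_interaction=5.0):
    """Positive rewards grow alpha, negative rewards grow beta, both capped in magnitude."""
    reward = reward_map.get(interaction_type, 0.0)  # assumed: unknown types are ignored
    cap = max_reward_per_interaction
    reward = max(-cap, min(cap, reward))            # assumed: symmetric cap
    if reward > 0:
        alpha += reward
    elif reward < 0:
        beta += -reward
    return alpha, beta

after_purchase = apply_reward(1.0, 1.0, "purchase")
after_negative = apply_reward(1.0, 1.0, "negative_feedback")
```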