
This tutorial walks through the full lifecycle: static weights → interaction capture → learned fusion → convergence monitoring. You don’t need interaction data to start — the system gracefully degrades to uniform weights with zero signal.

What You’ll Build

A search retriever that starts with manually tuned weights and progressively learns the optimal blend of features from user behavior. By the end, your retriever adapts per-user (or per-segment) without manual tuning.

Prerequisites: A namespace with at least two collections producing different embedding types (e.g., text + multimodal). See Semantic Search or Video Understanding to set those up first.

1. Start with Static Weights

Begin with weighted fusion. This gives you a deterministic baseline to measure against later.
curl -X POST "$MP_API_URL/v1/retrievers" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -d '{
    "retriever_name": "product-search",
    "stages": [
      {
        "stage_type": "filter",
        "stage_id": "feature_search",
        "parameters": {
          "searches": [
            {
              "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
              "query": "{{INPUT.query}}",
              "top_k": 100
            },
            {
              "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
              "query": "{{INPUT.query}}",
              "top_k": 100
            }
          ],
          "fusion": "weighted",
          "weights": [0.6, 0.4],
          "final_top_k": 25
        }
      }
    ]
  }'
Run searches against this retriever and record the result quality. These static-weight results are your baseline.
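To capture that baseline programmatically, here is a minimal JavaScript sketch that runs a fixed query set through the retriever and keeps each ranked response for later comparison. It uses the same execute endpoint shown in step 5; the query list and the way results are stored are illustrative.
// Capture static-weight baseline results for a fixed query set.
// MP_API_URL, MP_API_KEY, and MP_NAMESPACE come from your environment.
const queries = ['wireless earbuds', 'running shoes', 'standing desk']; // hypothetical

async function captureBaseline(retrieverId) {
  const baseline = {};
  for (const q of queries) {
    const res = await fetch(`${MP_API_URL}/v1/retrievers/${retrieverId}/execute`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${MP_API_KEY}`,
        'X-Namespace': MP_NAMESPACE,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ query: { query: q } })
    });
    baseline[q] = await res.json(); // keep the full ranked response per query
  }
  return baseline;
}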

2. Instrument Your Application with Interaction Signals

Before switching to learned fusion, you need to emit signals. Add interaction tracking wherever users engage with search results.
// Attach click tracking to rendered search results.
// currentQuery, userId, and sessionId come from your application state.
document.querySelectorAll('.search-result').forEach((el, index) => {
  el.addEventListener('click', () => {
    fetch(`${MP_API_URL}/v1/retrievers/interactions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${MP_API_KEY}`,
        'X-Namespace': MP_NAMESPACE,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        feature_id: el.dataset.documentId,
        interaction_type: ['click'],
        position: index,
        metadata: { query: currentQuery },
        user_id: userId,
        session_id: sessionId
      })
    });
  });
});
Always include position. Without it, the system can’t correct for position bias — users click higher-ranked results regardless of relevance. A click at position 8 is a much stronger signal than a click at position 1.
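To build intuition for that correction, here is an illustrative inverse-propensity sketch. The examination probabilities are hypothetical and this is not Mixpeek's internal debiasing model; it only shows why a click at a rarely examined position is stronger evidence.
// Estimated probability that a user even looks at each position (0 = top).
// These numbers are made up for illustration.
const examinationPropensity = [1.0, 0.85, 0.7, 0.55, 0.4, 0.3, 0.22, 0.16, 0.12, 0.09];

function debiasedClickWeight(positionIndex) {
  const p = examinationPropensity[positionIndex] ?? 0.05; // floor for deep results
  return 1 / p; // rarely examined positions get a larger correction
}

console.log(debiasedClickWeight(0)); // 1.0  (click at the top: weak evidence)
console.log(debiasedClickWeight(7)); // 6.25 (click far down the page: strong evidence)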
Which signals to capture depends on your domain:
Domain            | Primary Signals                    | Why
E-commerce        | purchase, add_to_cart, click       | Conversion is the strongest relevance indicator
Media / Video     | long_view, click, share            | Watch completion > click for engagement
Enterprise Search | positive_feedback, click, bookmark | Clicks may be obligatory; explicit feedback is clearer
Content Matching  | click, positive_feedback, skip     | Editor accept/reject on matched content
See the Signal Strength Matrix for the full list of 13 signal types and how they’re weighted.
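For example, an e-commerce checkout flow could emit a purchase signal through the same interactions endpoint used by the click tracker above. This is a sketch; userId, sessionId, and the position you stored at click time come from your application state.
// Emit the strongest e-commerce signal when a search-sourced item is bought.
async function trackPurchase(documentId, position) {
  await fetch(`${MP_API_URL}/v1/retrievers/interactions`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${MP_API_KEY}`,
      'X-Namespace': MP_NAMESPACE,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      feature_id: documentId,
      interaction_type: ['purchase'],
      position,             // still include position, even at conversion time
      user_id: userId,
      session_id: sessionId
    })
  });
}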

3. Switch to Learned Fusion

Once interactions are flowing, update the retriever to use learned fusion. You can do this at any time — even with zero interactions (it falls back to uniform weights).
curl -X PUT "$MP_API_URL/v1/retrievers/{retriever_id}" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -d '{
    "stages": [
      {
        "stage_type": "filter",
        "stage_id": "feature_search",
        "parameters": {
          "searches": [
            {
              "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
              "query": "{{INPUT.query}}",
              "top_k": 100
            },
            {
              "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
              "query": "{{INPUT.query}}",
              "top_k": 100
            }
          ],
          "fusion": "learned",
          "final_top_k": 25
        }
      }
    ]
  }'
No weights field needed — the system samples weights from Beta distributions on every query.

4. Understand Cold Start Behavior

With learned fusion enabled, the system handles sparse data automatically through hierarchical fallback:
Interactions | What Happens                                      | Effective Behavior
0            | Beta(1,1) = uniform prior for all features        | Equivalent to RRF
1–50         | Global weights only (aggregated across all users) | One set of weights for everyone
50–500       | Demographic-level personalization begins          | Weights vary by user segment
500+         | Per-user personalization kicks in                 | Each user gets individually tuned weights
The threshold for trusting personal weights is 5 interactions per user (the min_interactions parameter). Below that, the system falls back up the hierarchy: personal → demographic → global → uniform prior.
If you pass user_id in your search requests, the system tracks personal-level weights automatically. Without user_id, you still get global-level learning — the system learns which features are better overall, just not per-user.
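As a mental model, the fallback resolution might look like the sketch below. It is not the actual implementation; the per-level minimums are illustrative, with min_interactions = 5 governing the personal level as described above.
// Walk the hierarchy from most to least specific; the first level with
// enough data wins. Otherwise fall back to the Beta(1,1) uniform prior.
function resolveWeights(levels, featureCount) {
  for (const level of levels) {
    if (level.weights && level.interactionCount >= level.minInteractions) {
      return level.weights;
    }
  }
  return Array(featureCount).fill(1 / featureCount); // equivalent to RRF
}

const weights = resolveWeights([
  { name: 'personal',    minInteractions: 5,  interactionCount: 2,    weights: null },
  { name: 'demographic', minInteractions: 50, interactionCount: 340,  weights: [0.7, 0.3] },
  { name: 'global',      minInteractions: 1,  interactionCount: 1200, weights: [0.55, 0.45] }
], 2);
console.log(weights); // [0.7, 0.3]: this user's segment has enough data, the user alone doesn't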

5. Execute Searches with User Context

Pass user_id on every search request so the bandit can build per-user weight profiles:
curl -X POST "$MP_API_URL/v1/retrievers/{retriever_id}/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -d '{
    "query": {
      "query": "wireless noise canceling earbuds"
    },
    "user_id": "user_456"
  }'
Behind the scenes, the Thompson Sampler:
  1. Looks up user_456’s interaction history in ClickHouse
  2. Computes Beta(α, β) per feature: α = 1 + clicks, β = 1 + (impressions - clicks)
  3. Samples a weight from each Beta distribution
  4. Normalizes weights to sum to 1
  5. Executes each feature search and fuses results using the sampled weights
If user_456 has consistently clicked text-matched results over image-matched ones, the text feature’s Beta distribution is peaked higher — so sampled weights skew toward text.
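The standalone sketch below mirrors steps 2 through 4: it converts per-feature counts into Beta parameters, draws one sample per feature (via Gamma draws using the Marsaglia–Tsang method), and normalizes. It illustrates the math rather than Mixpeek's internal code, and the interaction counts are made up.
// Box–Muller standard normal.
function randNormal() {
  const u1 = 1 - Math.random(), u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Marsaglia–Tsang Gamma(shape, 1); valid for shape >= 1, which always
// holds here because alpha = 1 + clicks and beta = 1 + non-clicks.
function randGamma(shape) {
  const d = shape - 1 / 3, c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = randNormal(), v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = Math.random();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

function randBeta(alpha, beta) {
  const x = randGamma(alpha), y = randGamma(beta);
  return x / (x + y);
}

// Hypothetical per-feature counts for user_456.
const features = [
  { uri: 'text_extractor',       clicks: 42, impressions: 120 },
  { uri: 'multimodal_extractor', clicks: 11, impressions: 120 }
];

// alpha = 1 + clicks, beta = 1 + (impressions - clicks); sample, then normalize.
const samples = features.map(f => randBeta(1 + f.clicks, 1 + (f.impressions - f.clicks)));
const total = samples.reduce((s, w) => s + w, 0);
console.log(samples.map(w => w / total)); // e.g. [0.78, 0.22], skewed toward text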

6. Monitor Convergence

Check whether the learned weights are stabilizing using the analytics endpoint:
curl "$MP_API_URL/v1/analytics/retrievers/{retriever_id}/signals?signal_type=learned_weights&hours=168" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"
What to look for:
  • Weights stabilizing — variance decreasing over time means the system is converging (see the variance sketch after this list)
  • Feature dominance — if one feature’s weight approaches 1.0, the other features may not be contributing value
  • Per-segment differences — different user segments learning different weights validates that personalization is working
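A quick check for the first point: compare the variance of sampled weights in an early window against a recent one. The snapshot format below is hypothetical; adapt it to the actual response of the signals endpoint above.
function variance(xs) {
  const mean = xs.reduce((s, x) => s + x, 0) / xs.length;
  return xs.reduce((s, x) => s + (x - mean) ** 2, 0) / xs.length;
}

// One sampled weight per query for a single feature, oldest first.
function isConverging(weightSamples) {
  const half = Math.floor(weightSamples.length / 2);
  return variance(weightSamples.slice(half)) < variance(weightSamples.slice(0, half));
}

console.log(isConverging([0.31, 0.72, 0.44, 0.66, 0.58, 0.61, 0.60, 0.59])); // true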

7. Measure Improvement

Use evaluations to compare learned fusion against your static baseline. Create an evaluation with the same queries and judge whether learned fusion produces better-ranked results. The key metrics to track:
Metric               | What It Tells You
CTR at positions 1–3 | Are top results more clickable?
Mean Reciprocal Rank | Is the first relevant result appearing earlier?
Interaction rate     | Are users engaging more overall?
Weight variance      | Is the system still exploring, or has it converged?
Recommended rollout: Run learned fusion on 10% of traffic alongside your static baseline. Compare metrics over 1-2 weeks. If learned fusion wins or ties, ramp to 100%.
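For reference, a small sketch of the first two metrics computed from logged sessions. The session shape is hypothetical, and clicks stand in for relevance judgments.
// CTR at positions 1–3: share of sessions with a click in the top three results.
function ctrTop3(sessions) {
  return sessions.filter(s => s.clickedPositions.some(p => p < 3)).length / sessions.length;
}

// Mean Reciprocal Rank: average of 1 / (rank of first clicked result).
function mrr(sessions) {
  const rr = sessions.map(s => {
    const first = Math.min(...s.clickedPositions); // Infinity if no clicks
    return Number.isFinite(first) ? 1 / (first + 1) : 0;
  });
  return rr.reduce((s, x) => s + x, 0) / rr.length;
}

const sessions = [
  { clickedPositions: [0] },  // clicked the top result
  { clickedPositions: [4] },  // first click at the fifth result
  { clickedPositions: [] }    // abandoned
];
console.log(ctrTop3(sessions)); // 0.33
console.log(mrr(sessions));     // (1 + 0.2 + 0) / 3 = 0.4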

When to Use Each Strategy

Starting Point                              | Recommendation
No interaction data, launching today        | Start with rrf — strong default, no tuning needed
Domain expert knows feature importance      | Start with weighted — encode expert knowledge as initial weights
Have 100+ interactions flowing              | Switch to learned — let the data decide
Multiple user segments with different needs | Use learned with user_id — per-segment and per-user personalization
Need deterministic, reproducible results    | Stay with weighted — learned fusion is stochastic by design

Next Steps

Learned Fusion Reference

Thompson Sampling algorithm details, configuration parameters, and hierarchical fallback mechanics.

Interaction Signals

Signal strategy — which signals to capture, strength matrix, and implementation patterns.

Fusion Strategies

Compare all 5 fusion strategies: RRF, DBSF, Weighted, Max, and Learned.

Evaluations

Set up benchmarks to measure whether learned fusion is improving result quality.