Mixpeek follows a two-tower architecture, a well-known pattern in recommendation systems adapted here for multimodal search.
## Document Tower (Ingestion)
Source files enter through a bucket, trigger one or more collections, and pass through the Ray engine for feature extraction. Each collection produces a different representation (text embeddings, multimodal embeddings, metadata, taxonomy labels), all stored as named vectors on a single Qdrant point. Documents are encoded once at ingest time. Adding a new extractor or updating a taxonomy triggers a re-process on the bucket: documents gain new representations without changing the ingestion path.
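As a rough sketch of what "named vectors on a single Qdrant point" means in practice, here is how a document with two feature representations could be stored with `qdrant-client`. The collection name, vector names, sizes, and payload fields are illustrative assumptions, not Mixpeek's actual schema:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Hypothetical schema: one named vector per feature space, on the same point.
client.create_collection(
    collection_name="documents",  # illustrative name
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "multimodal": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Encode once at ingest: each extractor writes its vector onto the same point.
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector={
                "text": [0.1] * 768,        # stand-in for a real text embedding
                "multimodal": [0.2] * 512,  # stand-in for a real multimodal embedding
            },
            payload={"taxonomy": ["sports"], "source": "bucket/video_0042.mp4"},
        )
    ],
)
```

This layout keeps every representation of a document addressable under one point id, which is what lets the query tower run one vector search per space.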
## Query Tower (Retrieval)

A query arrives, gets encoded, and passes through a multi-stage retriever. The key stage is feature search, which runs a separate vector query per embedding space and fuses the results. The fusion strategy determines how per-feature scores combine into a final ranking:

| Strategy | Behavior |
|---|---|
| `rrf` | Rank-based, no tuning needed |
| `weighted` | Manual weights you set |
| `learned` | Weights sampled from Beta distributions, updated by user behavior |
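To make the fusion step concrete, here is a minimal, self-contained sketch of Reciprocal Rank Fusion, the rank-based idea behind `rrf`. It uses the standard RRF formula (each list contributes 1/(k + rank) per document), not necessarily Mixpeek's exact implementation; the document ids and the constant `k` are illustrative:

```python
from collections import defaultdict

def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine per-feature rankings without any score calibration.

    `rankings` maps a feature space name (e.g. "text", "multimodal")
    to an ordered list of document ids, best first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF contribution
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Each embedding space returns its own ranking; fusion merges them.
fused = rrf_fuse({
    "text": ["doc_a", "doc_b", "doc_c"],
    "multimodal": ["doc_a", "doc_d", "doc_b"],
})
print(fused)  # doc_a leads because it ranks highly in both spaces
```

Because RRF only looks at ranks, it needs no tuning and is robust to embedding spaces whose raw scores live on different scales; `weighted` and `learned` instead blend the scores themselves.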
## Closing the Loop
With learned fusion, the two towers aren't static; they are connected by a feedback loop, sketched in code after this list:

- Results are shown to users
- Interactions (clicks, purchases, skips) are captured and stored in ClickHouse
- Thompson Sampling aggregates interactions into Beta(α, β) distributions per feature — α counts positive signals, β counts non-engagement
- Sampled weights are drawn from those distributions on each query, naturally balancing exploration and exploitation
- Weights converge toward the optimal blend as interactions accumulate
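Here is a minimal sketch of the Thompson Sampling step, assuming per-feature Beta posteriors as described above. The feature names, counts, and attribution model are hypothetical; the sample-then-normalize pattern is the core idea:

```python
import random

# Hypothetical interaction counts: alpha counts positive signals
# (clicks, purchases), beta counts impressions without engagement.
feature_stats = {
    "text":       {"alpha": 42.0, "beta": 18.0},
    "multimodal": {"alpha": 12.0, "beta": 30.0},
    "metadata":   {"alpha": 5.0,  "beta": 5.0},  # few observations: wide, exploratory
}

def sample_fusion_weights(stats):
    """Draw one weight per feature from its Beta(alpha, beta) posterior,
    then normalize so the weights sum to 1 for score blending."""
    draws = {name: random.betavariate(s["alpha"], s["beta"]) for name, s in stats.items()}
    total = sum(draws.values())
    return {name: w / total for name, w in draws.items()}

def record_interaction(stats, feature, engaged: bool):
    """Update a feature's posterior after an interaction is attributed to it."""
    stats[feature]["alpha" if engaged else "beta"] += 1.0

weights = sample_fusion_weights(feature_stats)  # fresh draw on every query
print(weights)  # "text" usually dominates; "metadata" still gets explored
```

Because a fresh draw happens per query, uncertain features occasionally win the draw (exploration), while features with strong evidence dominate on average (exploitation).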
## What Makes This Different
Standard two-tower systems learn a single embedding space. Mixpeek's document tower fans out into N representation spaces (visual, audio, text, multimodal, metadata), and the query tower traverses them in sequence through multi-stage retrieval. Learning happens at the fusion layer (which spaces to weight), not inside the embeddings themselves. This means you can add a new extractor, re-process your data, and the bandit will automatically discover whether the new feature improves results, without retraining any model.
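Reusing the hypothetical `feature_stats` / `sample_fusion_weights` sketch from the feedback-loop section, a newly registered extractor would simply enter the bandit as one more Beta arm with an uninformative prior:

```python
# New feature space starts at Beta(1, 1): maximum uncertainty, real traffic share.
feature_stats["audio"] = {"alpha": 1.0, "beta": 1.0}

weights = sample_fusion_weights(feature_stats)
# Early on, "audio" draws noisy weights (exploration). If interactions
# attributed to it are positive, alpha grows and its weight converges upward;
# if not, beta grows and the bandit quietly sidelines it.
```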
## Related

- Feedback Loop Tutorial — step-by-step setup guide
- Learned Fusion — Thompson Sampling algorithm details
- Interaction Signals — which signals to capture and when
- Fusion Strategies — all 5 strategies compared

