Documentation Index

Fetch the complete documentation index at: https://docs.mixpeek.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Mixpeek follows a two-tower architecture — a well-known pattern in recommendation systems adapted for multimodal search.
Two towers: the document tower writes N representation spaces at ingest, the query tower reads and fuses them at search time, and interaction signals close the feedback loop.

Document Tower (Ingestion)

Source files enter through a bucket, trigger one or more collections, and pass through the Ray engine for feature extraction. Each collection produces a different representation — text embeddings, multimodal embeddings, metadata, taxonomy labels — all stored as named vectors on a single Qdrant point. Documents are encoded once at ingest time. Adding a new extractor or updating a taxonomy triggers a re-process on the bucket — documents get new representations without changing the ingestion path.
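The "named vectors on a single Qdrant point" idea can be sketched as a plain data structure. This is an illustrative example, not Mixpeek's actual schema: the space names (`text`, `multimodal`), dimensions, and payload fields are assumptions.

```python
# Hypothetical sketch of one document encoded once at ingest, with all
# per-space embeddings stored as named vectors on a single point.
# Space names, dimensions, and payload keys are illustrative.

def make_point(doc_id: str, features: dict[str, list[float]], payload: dict) -> dict:
    """Bundle every representation space for one document into one point."""
    return {
        "id": doc_id,
        "vector": features,   # named vectors: space name -> embedding
        "payload": payload,   # metadata, taxonomy labels, etc.
    }

point = make_point(
    "doc-123",
    features={
        "text": [0.1, 0.4, 0.9],        # text embedding space
        "multimodal": [0.3, 0.2, 0.7],  # e.g. a CLIP-style space
    },
    payload={"taxonomy": ["electronics/cameras"], "source": "catalog.csv"},
)
```

Adding a new extractor under this layout just adds another key to `features`; the point's identity and the ingestion path are unchanged.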

Query Tower (Retrieval)

A query arrives, gets encoded, and passes through a multi-stage retriever. The key stage is feature search, which runs a separate vector query per embedding space and fuses the results. The fusion strategy determines how per-feature scores combine into a final ranking:
  - rrf: rank-based, no tuning needed
  - weighted: manual weights you set
  - learned: weights sampled from Beta distributions, updated by user behavior
See Fusion Strategies for the full comparison.
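The first two strategies can be sketched in a few lines. This is a minimal illustration of the general techniques, not Mixpeek's implementation; the function names and the RRF constant `k=60` are assumptions.

```python
# Sketch of rank-based (RRF) and weighted fusion over per-space results.

def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: score(doc) = sum over spaces of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

def weighted_fuse(scores_by_space: dict[str, dict[str, float]],
                  weights: dict[str, float]) -> dict[str, float]:
    """Weighted sum of per-space similarity scores using manual weights."""
    fused: dict[str, float] = {}
    for space, doc_scores in scores_by_space.items():
        w = weights.get(space, 0.0)
        for doc_id, score in doc_scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * score
    return fused

# Each space ranks "a" and "b" oppositely, so RRF scores them equally.
rrf_scores = rrf_fuse({"text": ["a", "b"], "image": ["b", "a"]})
```

Note that RRF only looks at ranks, which is why it needs no tuning; weighted fusion uses the raw per-space scores and is only as good as the weights you set.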

Closing the Loop

With learned fusion, the two towers aren’t static — they’re connected by a feedback loop:
  1. Results are shown to users
  2. Interactions (clicks, purchases, skips) are captured and stored in ClickHouse
  3. Thompson Sampling aggregates interactions into Beta(α, β) distributions per feature — α counts positive signals, β counts non-engagement
  4. Sampled weights are drawn from those distributions on each query, naturally balancing exploration and exploitation
  5. Weights converge toward the optimal blend as interactions accumulate
The system handles cold start through hierarchical fallback: personal weights → demographic segment → global → uniform prior. With zero interactions, learned fusion behaves identically to RRF.
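The sampling and fallback steps above can be sketched with the standard library's Beta sampler. This is a hedged illustration of Thompson Sampling with hierarchical fallback, not Mixpeek's code: the tier structure, counter layout, and normalization are assumptions.

```python
import random

# Sketch: draw one weight per feature space from the first fallback tier
# (personal -> segment -> global) that has data for it, else from the
# uniform Beta(1, 1) prior used at cold start.

def sample_weights(tiers: list[dict[str, tuple[int, int]]],
                   spaces: list[str]) -> dict[str, float]:
    """Thompson-sample a normalized weight per space with tiered fallback."""
    weights: dict[str, float] = {}
    for space in spaces:
        alpha, beta = 1, 1  # uniform prior: zero interactions
        for tier in tiers:
            if space in tier:
                alpha, beta = tier[space]  # alpha = positive, beta = non-engagement
                break
        weights[space] = random.betavariate(alpha, beta)
    total = sum(weights.values()) or 1.0
    return {s: w / total for s, w in weights.items()}

personal: dict[str, tuple[int, int]] = {}  # new user: no interactions yet
global_stats = {"text": (40, 10), "image": (5, 45)}
w = sample_weights([personal, global_stats], ["text", "image"])
```

Because each query draws fresh samples, under-explored spaces with wide Beta distributions occasionally win the draw (exploration), while well-evidenced spaces dominate on average (exploitation).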

What Makes This Different

Standard two-tower systems learn a single embedding space. Mixpeek’s document tower fans out into N representation spaces (visual, audio, text, multimodal, metadata), and the query tower traverses them in sequence through multi-stage retrieval. Learning happens at the fusion layer — which spaces to weight — not inside the embeddings themselves. This means you can add a new extractor, re-process your data, and the bandit will automatically discover whether the new feature improves results — without retraining any model.
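The "add an extractor, no retraining" claim boils down to the fact that learning is just counter updates at the fusion layer. A minimal sketch, assuming per-space (alpha, beta) counters credited from interactions (the function name and counter layout are illustrative):

```python
# Sketch: a newly added feature space starts at the Beta(1, 1) prior and
# learns purely from interaction counters -- no embedding model retrains.

def update(stats: dict[str, tuple[int, int]], space: str, engaged: bool) -> None:
    """Credit a feature space: alpha += 1 on engagement, beta += 1 otherwise."""
    alpha, beta = stats.get(space, (1, 1))  # unseen spaces start at the prior
    stats[space] = (alpha + 1, beta) if engaged else (alpha, beta + 1)

stats: dict[str, tuple[int, int]] = {"text": (40, 10)}
update(stats, "audio", engaged=True)   # first signal for the new extractor
update(stats, "audio", engaged=False)
# stats["audio"] is now (2, 2); its mean weight alpha / (alpha + beta) = 0.5
```

If the new space keeps driving engagement its alpha grows and the bandit weights it up; if not, beta grows and it fades, all without touching the embedding models.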