Mixpeek layers several caches to deliver low-latency responses while guaranteeing consistency. Every layer relies on deterministic signatures so you never serve results from an outdated index.
## Cache Layers
| Layer | Scope | Backing Store | TTL | Purpose |
|---|---|---|---|---|
| Retriever response | Full execution output | Redis | 1 hour (configurable) | Return entire execution payload instantly on repeated queries |
| Stage output | Individual stages (feature_search, rerank) | Redis | 1 hour (configurable per stage) | Reuse expensive stages across similar queries |
| Inference | Embeddings & rerankers | Redis | ~1 hour | Avoid recomputing identical model inferences |
| Document features | Stored vectors/payloads | MVS | Permanent | Reuse ingestion-time features for future queries |
## How It Works
On a cache hit at the retriever level, the entire pipeline is skipped — the response includes a `cached_at` timestamp so you can verify freshness. On a cache hit at the stage level, only that stage is skipped and the rest of the pipeline continues.
## Index Signatures
Each collection stores an `index_signature` in MongoDB. The signature hashes:
- Collection configuration (feature extractor, passthrough fields)
- Document count and vector dimensions
- Timestamp of last ingestion event (with debounce logic)
Cache keys incorporate the `index_signature`, so whenever ingestion updates the collection the signature changes and cached query responses automatically miss.
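A minimal sketch of how such a signature could be derived from the inputs listed above. The exact hashing scheme is internal to Mixpeek; the field names and debounce interval here are illustrative:

```python
import hashlib
import json

def compute_index_signature(config: dict, doc_count: int, vector_dims: int,
                            last_ingestion_ts: float,
                            debounce_seconds: float = 60.0) -> str:
    """Hash collection config, counts, and a debounced ingestion timestamp.

    Bucketing the timestamp (debounce) means rapid ingestion events within
    one interval do not churn the signature on every single write.
    """
    debounced_ts = int(last_ingestion_ts // debounce_seconds)
    payload = json.dumps(
        {"config": config, "docs": doc_count, "dims": vector_dims, "ts": debounced_ts},
        sort_keys=True,  # deterministic key ordering -> deterministic hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because the hash is deterministic, any change to the configuration, document count, dimensions, or (debounced) ingestion time yields a new signature — and therefore a guaranteed cache miss.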
## Response Cache Metadata
Retriever execution responses include cache information:

- `cached_at` (top-level) — Unix timestamp when the full response was cached. Present only on retriever-level cache hits. Compute freshness as `time.time() - cached_at`.
- `cache_hit` (per stage) — whether this stage's result came from the stage cache.
- `cached_at` (per stage) — Unix timestamp when this stage result was cached.
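For example, a small freshness check built on the documented `cached_at` field (the staleness threshold is your choice):

```python
import time

def is_fresh(response: dict, max_age_seconds: float = 300.0) -> bool:
    """Accept a response if it was freshly computed or cached recently.

    `cached_at` is only present on retriever-level cache hits, so its
    absence means the pipeline actually ran.
    """
    cached_at = response.get("cached_at")
    if cached_at is None:
        return True  # no retriever-level cache hit: freshly computed
    return time.time() - cached_at < max_age_seconds
```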
## Bypassing Cache
Force a fresh execution with `skip_cache`:
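A sketch of what that looks like in an execution request. `skip_cache` is the documented flag; the endpoint path, retriever ID, and request shape here are illustrative:

```python
import json
import urllib.request

# Hypothetical execution endpoint and retriever ID; adjust to your actual route.
url = "https://api.mixpeek.com/v1/retrievers/ret_123/execute"

payload = {
    "inputs": {"query": "sunset over mountains"},
    "skip_cache": True,  # bypass all cache layers and force a fresh run
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # uncomment with real credentials
```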
## Stage-Level Controls
Control caching per stage via `cache_behavior` and `cache_ttl_seconds`:
`cache_behavior` options:

- `auto` (default) — cache deterministic operations automatically
- `disabled` — skip caching entirely for this stage
- `aggressive` — cache even non-deterministic operations (use with caution)
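Putting the two knobs together, a per-stage configuration might look like this. Only `cache_behavior` and `cache_ttl_seconds` are the documented fields; the surrounding structure and `stage_name` key are assumed for illustration:

```python
# Illustrative retriever stage configuration (shape is an assumption).
stages = [
    {
        "stage_name": "feature_search",
        "cache_behavior": "auto",      # default: cache deterministic operations
        "cache_ttl_seconds": 3600,     # matches the default 1-hour TTL
    },
    {
        "stage_name": "rerank",
        "cache_behavior": "disabled",  # always recompute this stage
    },
]
```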
## Inference Cache
The Engine caches model calls using a hashed payload of `(model_name, inputs, parameters)`. Use it to:
- Reuse embeddings for identical prompts or documents
- Skip recomputing reranking scores for popular queries
- Short-circuit repeated LLM-based filters with static criteria
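A minimal sketch of a deterministic key over `(model_name, inputs, parameters)`. The Engine's actual hashing and key layout are internal; the `cache:inference:` prefix is an assumption modeled on the `cache:retriever:` namespace mentioned below:

```python
import hashlib
import json

def inference_cache_key(model_name: str, inputs, parameters: dict) -> str:
    """Identical (model, inputs, parameters) triples map to identical keys."""
    blob = json.dumps([model_name, inputs, parameters], sort_keys=True)
    # Prefix is illustrative, mirroring the per-feature Redis namespacing.
    return "cache:inference:" + hashlib.sha256(blob.encode()).hexdigest()
```

Any difference in the model, the input text, or a parameter such as `top_k` produces a different key, so only truly identical inferences are reused.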
## Cache Invalidation

Caches are invalidated automatically on:

| Event | Scope |
|---|---|
| Document ingestion completes | Collection-level (via index signature change) |
| Retriever deleted | All keys for that retriever |
| Collection deleted/updated | All keys for that collection |
| Namespace deleted | All keys in namespace |
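Scoped invalidation amounts to deleting every key under the relevant prefix. A self-contained sketch using an in-memory dict in place of Redis (the key layout is illustrative; in production this would be a cursor-based `SCAN` plus `DEL`):

```python
import fnmatch

# Stand-in for the Redis keyspace; real keys live under prefixes like cache:retriever:...
keyspace = {
    "cache:retriever:ret_1:q_abc": "payload",
    "cache:retriever:ret_1:q_def": "payload",
    "cache:retriever:ret_2:q_abc": "payload",
}

def invalidate(pattern: str) -> int:
    """Delete all keys matching the glob pattern; return how many were removed."""
    doomed = [k for k in keyspace if fnmatch.fnmatch(k, pattern)]
    for k in doomed:
        del keyspace[k]
    return len(doomed)
```

Deleting a retriever would then map to a pattern like `cache:retriever:ret_1:*`, while deleting a namespace would use a broader prefix covering every key in it.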
## Monitoring Cache Performance
- Use `GET /v1/analytics/retrievers/{id}/cache-performance` for hit/miss ratios and latency deltas.
- `stage_statistics` inside retriever responses flag `cache_hit` per stage.
- Redis namespaces per feature (e.g., `cache:retriever:...`) make it easy to inspect keys if needed.
## Best Practices
- Caching is on by default with `cache_behavior: "auto"` — no setup needed.
- Use `skip_cache: true` for debugging or when you need guaranteed-fresh results.
- Disable stage caching for stages with time-sensitive inputs (`now()`, `random()`).
- Use stage caching when reranking or feature search is the bottleneck.
- Leverage inference caching for expensive LLM or GPU workloads — even small hit rates pay off.

