

Mixpeek layers several caches to deliver low-latency responses while guaranteeing consistency. Every layer relies on deterministic signatures so you never serve results from an outdated index.

Cache Layers

Layer | Scope | Backing Store | TTL | Purpose
--- | --- | --- | --- | ---
Retriever response | Full execution output | Redis | 1 hour (configurable) | Return the entire execution payload instantly on repeated queries
Stage output | Individual stages (feature_search, rerank) | Redis | 1 hour (configurable per stage) | Reuse expensive stages across similar queries
Inference | Embeddings & rerankers | Redis | ~1 hour | Avoid recomputing identical model inferences
Document features | Stored vectors/payloads | MVS | Permanent | Reuse ingestion-time features for future queries

How It Works

On a retriever-level cache hit, the entire pipeline is skipped and the response includes a cached_at timestamp so you can verify freshness. On a stage-level cache hit, only that stage is skipped and the rest of the pipeline runs as usual.

Index Signatures

Each collection stores an index_signature in MongoDB. The signature hashes:
  • Collection configuration (feature extractor, passthrough fields)
  • Document count and vector dimensions
  • Timestamp of last ingestion event (with debounce logic)
Retriever cache keys include index_signature, so whenever ingestion updates the collection the signature changes and cached query responses automatically miss.
cache:retriever:quickstart-search:
  hash(
    inputs,
    filters,
    pagination,
    collection_signature="xyz789"
  )
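The key construction above can be sketched in Python. The hashing details below (SHA-256 over a canonical JSON payload) are an illustrative assumption, not Mixpeek's exact implementation; the point is that the collection signature is part of the hash, so any ingestion that changes the signature changes the key and forces a cache miss:

```python
import hashlib
import json

def retriever_cache_key(retriever_id, inputs, filters, pagination, collection_signature):
    """Build a deterministic cache key from the query plus the collection's
    index signature. sort_keys=True makes the hash stable regardless of
    dict ordering, which is what makes the key deterministic."""
    payload = json.dumps(
        {
            "inputs": inputs,
            "filters": filters,
            "pagination": pagination,
            "collection_signature": collection_signature,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"cache:retriever:{retriever_id}:{digest}"

# Same query, new signature after ingestion: the key changes, so the old entry misses.
key_before = retriever_cache_key("quickstart-search", {"query": "smart speaker"}, {}, {"page": 1}, "xyz789")
key_after = retriever_cache_key("quickstart-search", {"query": "smart speaker"}, {}, {"page": 1}, "abc123")
```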

Response Cache Metadata

Retriever execution responses include cache information:
{
  "execution_id": "exec_abc123",
  "status": "completed",
  "cached_at": 1714150000.5,
  "documents": [...],
  "stage_statistics": {
    "stages": {
      "text_search": {
        "cache_hit": true,
        "cached_at": 1714150000.2,
        "duration_ms": 0.5
      },
      "rerank": {
        "cache_hit": false,
        "cached_at": null,
        "duration_ms": 45.3
      }
    }
  }
}
  • cached_at (top-level) — Unix timestamp when the full response was cached. Present only on retriever-level cache hits. Compute freshness: time.time() - cached_at.
  • cache_hit (per stage) — Whether this stage’s result came from stage cache.
  • cached_at (per stage) — Unix timestamp when this stage result was cached.
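Putting these fields together, a client-side freshness check might look like the following (the helper names are illustrative, not part of any SDK):

```python
import time

def cache_freshness(response, max_age_seconds=3600):
    """Return (age_seconds, stale) for a retriever-level cache hit,
    or (None, False) when the response was computed fresh
    (no top-level cached_at)."""
    cached_at = response.get("cached_at")
    if cached_at is None:
        return None, False
    age = time.time() - cached_at
    return age, age > max_age_seconds

def stage_cache_hits(response):
    """List the stages whose results came from the stage cache."""
    stages = response.get("stage_statistics", {}).get("stages", {})
    return [name for name, stats in stages.items() if stats.get("cache_hit")]
```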

Bypassing Cache

Force a fresh execution with skip_cache:
curl -X POST "$MP_API_URL/v1/retrievers/<id>/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": { "query": "smart speaker" }, "skip_cache": true }'

Stage-Level Controls

Control caching per stage via cache_behavior and cache_ttl_seconds:
{
  "stages": [
    {
      "stage_name": "text_search",
      "config": {
        "parameters": {
          "cache_behavior": "auto",
          "cache_ttl_seconds": 600
        }
      }
    },
    {
      "stage_name": "rerank",
      "config": {
        "parameters": {
          "cache_behavior": "disabled"
        }
      }
    }
  ]
}
cache_behavior options:
  • auto (default) — Cache deterministic operations automatically
  • disabled — Skip caching entirely for this stage
  • aggressive — Cache even non-deterministic operations (use with caution)

Inference Cache

The Engine caches model calls using a hashed payload of (model_name, inputs, parameters). Use it to:
  • Reuse embeddings for identical prompts or documents
  • Skip recomputing reranking scores for popular queries
  • Short-circuit repeated LLM-based filters with static criteria
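The keying scheme can be modeled as a hash of the serialized call. The memoization below is an illustrative sketch of the idea, not the Engine's actual code:

```python
import hashlib
import json

_inference_cache = {}

def cached_inference(model_name, inputs, parameters, run_model):
    """Memoize a model call on a hash of (model_name, inputs, parameters),
    mirroring how the inference cache keys identical calls."""
    payload = json.dumps(
        {"model": model_name, "inputs": inputs, "parameters": parameters},
        sort_keys=True,
    )
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if key not in _inference_cache:
        _inference_cache[key] = run_model(model_name, inputs, parameters)
    return _inference_cache[key]

# Identical prompts hit the cache, so the model runs only once.
calls = []
def fake_embed(model, inputs, params):
    calls.append(inputs)
    return [0.1, 0.2, 0.3]

v1 = cached_inference("embed-v1", "smart speaker", {"dim": 3}, fake_embed)
v2 = cached_inference("embed-v1", "smart speaker", {"dim": 3}, fake_embed)
```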

Cache Invalidation

Caches are invalidated automatically on:
Event | Scope
--- | ---
Document ingestion completes | Collection-level (via index signature change)
Retriever deleted | All keys for that retriever
Collection deleted/updated | All keys for that collection
Namespace deleted | All keys in namespace
Manual invalidation is also available:
DELETE /v1/retrievers/{retriever_id}/cache
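Following the same conventions as the curl example above (the MP_API_URL, MP_API_KEY, and MP_NAMESPACE environment variables), the manual-invalidation call can be issued from Python with only the standard library:

```python
import os
import urllib.request

def invalidation_request(retriever_id):
    """Build a DELETE request for the retriever's cache endpoint.
    Send it with urllib.request.urlopen(req) when ready."""
    url = f"{os.environ['MP_API_URL']}/v1/retrievers/{retriever_id}/cache"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {os.environ['MP_API_KEY']}",
            "X-Namespace": os.environ["MP_NAMESPACE"],
        },
    )
```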

Monitoring Cache Performance

  • Use GET /v1/analytics/retrievers/{id}/cache-performance for hit/miss ratios and latency deltas.
  • stage_statistics inside retriever responses flag cache_hit per stage.
  • Redis namespaces per feature (e.g., cache:retriever:...) make it easy to inspect keys if needed.

Best Practices

  • Caching is on by default with cache_behavior: "auto" — no setup needed.
  • Use skip_cache: true for debugging or when you need guaranteed-fresh results.
  • Disable stage caching for stages with time-sensitive inputs (now(), random()).
  • Use stage caching when reranking or feature search is the bottleneck.
  • Leverage inference caching for expensive LLM or GPU workloads — even small hit rates pay off.