Mixpeek layers several caches to deliver low-latency responses while guaranteeing consistency. Every layer relies on deterministic signatures so you never serve results from an outdated index.
## Cache Layers
| Layer | Scope | Backing Store | TTL | Purpose |
|---|---|---|---|---|
| Retriever response | Full execution output | Redis | 1 hour (configurable) | Return entire execution payload instantly on repeated queries |
| Stage output | Individual stages (feature_search, rerank) | Redis | 1 hour (configurable per stage) | Reuse expensive stages across similar queries |
| Inference | Embeddings & rerankers | Redis | ~1 hour | Avoid recomputing identical model inferences |
| Document features | Stored vectors/payloads | MVS | Permanent | Reuse ingestion-time features for future queries |
## How It Works
On a cache hit at the retriever level, the entire pipeline is skipped — the response includes a `cached_at` timestamp so you can verify freshness. On a cache hit at the stage level, only that stage is skipped and the rest of the pipeline continues.
## Index Signatures
Each collection stores an `index_signature` in MongoDB. The signature hashes:
- Collection configuration (feature extractor, passthrough fields)
- Document count and vector dimensions
- Timestamp of last ingestion event (with debounce logic)
Cache keys incorporate the `index_signature`, so whenever ingestion updates the collection the signature changes and cached query responses automatically miss.
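A minimal sketch of how such a signature could be derived from the inputs listed above. The exact hashing scheme is internal to Mixpeek; the field names and debounce interval here are illustrative:

```python
import hashlib
import json

def compute_index_signature(config: dict, doc_count: int, vector_dims: int,
                            last_ingestion_ts: float,
                            debounce_seconds: float = 60.0) -> str:
    """Hash collection config, counts, and a debounced ingestion timestamp.

    Bucketing the timestamp (debounce) means rapid ingestion events within
    one interval do not churn the signature on every single write.
    """
    debounced_ts = int(last_ingestion_ts // debounce_seconds)
    payload = json.dumps(
        {"config": config, "docs": doc_count, "dims": vector_dims, "ts": debounced_ts},
        sort_keys=True,  # deterministic key ordering -> deterministic hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because the hash is deterministic, any change to the configuration, document count, dimensions, or (debounced) ingestion time yields a new signature — and therefore a guaranteed cache miss.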
## Response Cache Metadata
Retriever execution responses include cache information:

- `cached_at` (top-level) — Unix timestamp when the full response was cached. Present only on retriever-level cache hits. Compute freshness as `time.time() - cached_at`.
- `cache_hit` (per stage) — whether this stage's result came from the stage cache.
- `cached_at` (per stage) — Unix timestamp when this stage result was cached.
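For example, a small freshness check built on the documented `cached_at` field (the staleness threshold is your choice):

```python
import time

def is_fresh(response: dict, max_age_seconds: float = 300.0) -> bool:
    """Accept a response if it was freshly computed or cached recently.

    `cached_at` is only present on retriever-level cache hits, so its
    absence means the pipeline actually ran.
    """
    cached_at = response.get("cached_at")
    if cached_at is None:
        return True  # no retriever-level cache hit: freshly computed
    return time.time() - cached_at < max_age_seconds
```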
## Bypassing Cache
Force a fresh execution with `skip_cache`:
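A sketch of what that looks like in an execution request. `skip_cache` is the documented flag; the endpoint path, retriever ID, and request shape here are illustrative:

```python
import json
import urllib.request

# Hypothetical execution endpoint and retriever ID; adjust to your actual route.
url = "https://api.mixpeek.com/v1/retrievers/ret_123/execute"

payload = {
    "inputs": {"query": "sunset over mountains"},
    "skip_cache": True,  # bypass all cache layers and force a fresh run
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # uncomment with real credentials
```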
## Stage-Level Controls
Control caching per stage via `cache_behavior` and `cache_ttl_seconds`:
`cache_behavior` options:

- `auto` (default) — cache deterministic operations automatically
- `disabled` — skip caching entirely for this stage
- `aggressive` — cache even non-deterministic operations (use with caution)
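Putting the two knobs together, a per-stage configuration might look like this. Only `cache_behavior` and `cache_ttl_seconds` are the documented fields; the surrounding structure and `stage_name` key are assumed for illustration:

```python
# Illustrative retriever stage configuration (shape is an assumption).
stages = [
    {
        "stage_name": "feature_search",
        "cache_behavior": "auto",      # default: cache deterministic operations
        "cache_ttl_seconds": 3600,     # matches the default 1-hour TTL
    },
    {
        "stage_name": "rerank",
        "cache_behavior": "disabled",  # always recompute this stage
    },
]
```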
## Inference Cache
The Engine caches model calls using a hashed payload of `(model_name, inputs, parameters)`. Use it to:
- Reuse embeddings for identical prompts or documents
- Skip recomputing reranking scores for popular queries
- Short-circuit repeated LLM-based filters with static criteria
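A minimal sketch of a deterministic key over `(model_name, inputs, parameters)`. The Engine's actual hashing and key layout are internal; the `cache:inference:` prefix is an assumption modeled on the `cache:retriever:` namespace mentioned below:

```python
import hashlib
import json

def inference_cache_key(model_name: str, inputs, parameters: dict) -> str:
    """Identical (model, inputs, parameters) triples map to identical keys."""
    blob = json.dumps([model_name, inputs, parameters], sort_keys=True)
    # Prefix is illustrative, mirroring the per-feature Redis namespacing.
    return "cache:inference:" + hashlib.sha256(blob.encode()).hexdigest()
```

Any difference in the model, the input text, or a parameter such as `top_k` produces a different key, so only truly identical inferences are reused.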
## Cache Invalidation

Caches are invalidated automatically on:

| Event | Scope |
|---|---|
| Document ingestion completes | Collection-level (via index signature change) |
| Retriever deleted | All keys for that retriever |
| Collection deleted/updated | All keys for that collection |
| Namespace deleted | All keys in namespace |
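Scoped invalidation amounts to deleting every key under the relevant prefix. A self-contained sketch using an in-memory dict in place of Redis (the key layout is illustrative; in production this would be a cursor-based `SCAN` plus `DEL`):

```python
import fnmatch

# Stand-in for the Redis keyspace; real keys live under prefixes like cache:retriever:...
keyspace = {
    "cache:retriever:ret_1:q_abc": "payload",
    "cache:retriever:ret_1:q_def": "payload",
    "cache:retriever:ret_2:q_abc": "payload",
}

def invalidate(pattern: str) -> int:
    """Delete all keys matching the glob pattern; return how many were removed."""
    doomed = [k for k in keyspace if fnmatch.fnmatch(k, pattern)]
    for k in doomed:
        del keyspace[k]
    return len(doomed)
```

Deleting a retriever would then map to a pattern like `cache:retriever:ret_1:*`, while deleting a namespace would use a broader prefix covering every key in it.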
## Monitoring Cache Performance
- Use `GET /v1/analytics/retrievers/{id}/cache-performance` for hit/miss ratios and latency deltas.
- `stage_statistics` inside retriever responses flag `cache_hit` per stage.
- Redis namespaces per feature (e.g., `cache:retriever:...`) make it easy to inspect keys if needed.
## Best Practices
- Caching is on by default with `cache_behavior: "auto"` — no setup needed.
- Use `skip_cache: true` for debugging or when you need guaranteed-fresh results.
- Disable stage caching for stages with time-sensitive inputs (`now()`, `random()`).
- Use stage caching when reranking or feature search is the bottleneck.
- Leverage inference caching for expensive LLM or GPU workloads — even small hit rates pay off.

