# Caching & Signatures

> Keep retrieval fast without serving stale data

Mixpeek layers several caches to deliver low-latency responses while guaranteeing consistency. Every layer relies on deterministic signatures so you never serve results from an outdated index.

## Cache Layers

| Layer              | Scope                                          | Backing Store                  | TTL                             | Purpose                                                       |
| ------------------ | ---------------------------------------------- | ------------------------------ | ------------------------------- | ------------------------------------------------------------- |
| Retriever response | Full execution output                          | Redis                          | 1 hour (configurable)           | Return entire execution payload instantly on repeated queries |
| Stage output       | Individual stages (`feature_search`, `rerank`) | Redis                          | 1 hour (configurable per stage) | Reuse expensive stages across similar queries                 |
| Inference          | Embeddings & rerankers                         | Redis                          | \~1 hour                        | Avoid recomputing identical model inferences                  |
| Document features  | Stored vectors/payloads                        | [MVS](https://mixpeek.com/mvs) | Permanent                       | Reuse ingestion-time features for future queries              |

## How It Works

```mermaid theme={null}
graph LR
  Q[Query] --> RC{Retriever Cache?}
  RC -->|HIT| R1[Return Cached Response]
  RC -->|MISS| P[Execute Pipeline]
  P --> SC{Stage Cache?}
  SC -->|HIT| SK[Skip Stage]
  SC -->|MISS| EX[Run Stage]
  EX --> SS[Store Stage Result]
  SK --> N[Next Stage]
  SS --> N
  N --> ST[Store Full Response]
  ST --> R2[Return Fresh Response]
```

On a cache hit at the **retriever level**, the entire pipeline is skipped, and the response includes a `cached_at` timestamp so you can verify freshness. On a cache hit at the **stage level**, only that stage is skipped and the rest of the pipeline continues.
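The flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for Redis, and the `execute` function, its arguments, and the stage key format are hypothetical rather than Mixpeek's implementation.

```python
import time
from typing import Any, Callable

# Hypothetical in-memory stand-ins for the Redis-backed caches.
retriever_cache: dict[str, dict[str, Any]] = {}
stage_cache: dict[str, dict[str, Any]] = {}


def execute(query_key: str, stages: dict[str, Callable[[Any], Any]], query: Any) -> dict[str, Any]:
    """Run a pipeline, checking the retriever cache first, then each stage cache."""
    hit = retriever_cache.get(query_key)
    if hit is not None:
        # Retriever-level hit: the whole pipeline is skipped.
        return hit

    data = query
    stats: dict[str, dict[str, Any]] = {}
    for name, run in stages.items():
        stage_key = f"{name}:{data!r}"  # assumed key format, for illustration only
        entry = stage_cache.get(stage_key)
        if entry is None:
            # Stage miss: run the stage and store its result.
            entry = {"output": run(data), "cached_at": time.time()}
            stage_cache[stage_key] = entry
            stats[name] = {"cache_hit": False, "cached_at": None}
        else:
            # Stage hit: only this stage is skipped.
            stats[name] = {"cache_hit": True, "cached_at": entry["cached_at"]}
        data = entry["output"]

    response = {"documents": data, "stage_statistics": {"stages": stats}}
    # Store the full response; only the cached copy carries a top-level cached_at.
    retriever_cache[query_key] = {**response, "cached_at": time.time()}
    return response
```

Calling `execute` twice with the same `query_key` returns the stored response the second time, while a different `query_key` that feeds identical input into a stage still reuses that stage's cached output.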

## Index Signatures

Each collection stores an `index_signature` in MongoDB. The signature hashes:

* Collection configuration (feature extractor, passthrough fields)
* Document count and vector dimensions
* Timestamp of last ingestion event (with debounce logic)

Retriever cache keys include the `index_signature` (the `collection_signature` field in the key layout below), so whenever ingestion updates the collection, the signature changes and lookups for previously cached responses automatically miss.

```text theme={null}
cache:retriever:quickstart-search:
  hash(
    inputs,
    filters,
    pagination,
    collection_signature="xyz789"
  )
```
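To make the key layout concrete, here is a minimal Python sketch of how a signature and a retriever cache key could be derived. The hashing scheme (SHA-256 over canonical JSON), the function names, and the fixed-window debounce are assumptions for illustration; the actual algorithm is not specified here.

```python
import hashlib
import json
from typing import Any


def stable_hash(value: Any) -> str:
    """SHA-256 over canonical JSON, so equal inputs always give equal digests."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_signature(config: dict, doc_count: int, vector_dims: int,
                    last_ingested_at: float, debounce_seconds: int = 60) -> str:
    # Assumed debounce: round the ingestion timestamp down to a fixed window,
    # so bursts of ingestion events inside one window share a signature.
    bucket = int(last_ingested_at // debounce_seconds)
    return stable_hash([config, doc_count, vector_dims, bucket])


def retriever_cache_key(retriever: str, inputs: dict, filters: dict,
                        pagination: dict, signature: str) -> str:
    digest = stable_hash({
        "inputs": inputs,
        "filters": filters,
        "pagination": pagination,
        "collection_signature": signature,
    })
    return f"cache:retriever:{retriever}:{digest}"
```

Because the signature is part of the hashed payload, any change to the document count or ingestion timestamp produces a different key, and the old entry is simply never looked up again.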

## Response Cache Metadata

Retriever execution responses include cache information:

```json theme={null}
{
  "execution_id": "exec_abc123",
  "status": "completed",
  "cached_at": 1714150000.5,
  "documents": [...],
  "stage_statistics": {
    "stages": {
      "text_search": {
        "cache_hit": true,
        "cached_at": 1714150000.2,
        "duration_ms": 0.5
      },
      "rerank": {
        "cache_hit": false,
        "cached_at": null,
        "duration_ms": 45.3
      }
    }
  }
}
```

* **`cached_at`** (top-level) — Unix timestamp when the full response was cached. Present only on retriever-level cache hits. Compute freshness: `time.time() - cached_at`.
* **`cache_hit`** (per stage) — Whether this stage's result came from stage cache.
* **`cached_at`** (per stage) — Unix timestamp when this stage result was cached.
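A small client-side helper can turn these fields into a freshness check. The sketch below assumes the response shape shown above; `cache_summary` is a hypothetical helper name, not part of any SDK.

```python
import time
from typing import Optional


def cache_summary(response: dict, now: Optional[float] = None) -> dict:
    """Summarize cache metadata from a retriever execution response."""
    now = time.time() if now is None else now
    cached_at = response.get("cached_at")  # absent or null on a fresh execution
    stages = response.get("stage_statistics", {}).get("stages", {})
    return {
        "retriever_hit": cached_at is not None,
        "age_seconds": None if cached_at is None else now - cached_at,
        "stage_hits": sorted(name for name, s in stages.items() if s.get("cache_hit")),
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; in normal use, omit it and the current time is used.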

## Bypassing Cache

Force a fresh execution with `skip_cache`:

```bash theme={null}
curl -X POST "$MP_API_URL/v1/retrievers/<id>/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": { "query": "smart speaker" }, "skip_cache": true }'
```
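The same request can be built from Python using only the standard library. This sketch mirrors the curl call above; `build_execute_request` and its parameters are hypothetical names, and the function only constructs the request without sending it.

```python
import json
import urllib.request


def build_execute_request(api_url: str, api_key: str, namespace: str,
                          retriever_id: str, query: str,
                          skip_cache: bool = False) -> urllib.request.Request:
    """Build (but do not send) a retriever execute request."""
    body = {"inputs": {"query": query}, "skip_cache": skip_cache}
    return urllib.request.Request(
        url=f"{api_url}/v1/retrievers/{retriever_id}/execute",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )
```

Send the result with `urllib.request.urlopen(request)` when you are ready to execute it.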

## Stage-Level Controls

Control caching per stage via `cache_behavior` and `cache_ttl_seconds`:

```json theme={null}
{
  "stages": [
    {
      "stage_name": "text_search",
      "config": {
        "parameters": {
          "cache_behavior": "auto",
          "cache_ttl_seconds": 600
        }
      }
    },
    {
      "stage_name": "rerank",
      "config": {
        "parameters": {
          "cache_behavior": "disabled"
        }
      }
    }
  ]
}
```

**`cache_behavior` options:**

* `auto` (default) — Cache deterministic operations automatically
* `disabled` — Skip caching entirely for this stage
* `aggressive` — Cache even non-deterministic operations (use with caution)
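If you generate retriever configurations programmatically, a small helper can apply these settings and reject unknown values early. This is a sketch that assumes the stage layout shown in the JSON above; `with_stage_cache` is a hypothetical helper, not an SDK function.

```python
import copy
from typing import Optional

CACHE_BEHAVIORS = {"auto", "disabled", "aggressive"}


def with_stage_cache(retriever: dict, stage_name: str, behavior: str,
                     ttl_seconds: Optional[int] = None) -> dict:
    """Return a copy of the retriever config with cache settings applied to one stage."""
    if behavior not in CACHE_BEHAVIORS:
        raise ValueError(f"unknown cache_behavior: {behavior!r}")
    updated = copy.deepcopy(retriever)  # leave the caller's config untouched
    for stage in updated["stages"]:
        if stage["stage_name"] == stage_name:
            params = stage.setdefault("config", {}).setdefault("parameters", {})
            params["cache_behavior"] = behavior
            if ttl_seconds is not None:
                params["cache_ttl_seconds"] = ttl_seconds
            return updated
    raise KeyError(f"no stage named {stage_name!r}")
```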

## Inference Cache

The Engine caches model calls using a hashed payload of `(model_name, inputs, parameters)`. Use it to:

* Reuse embeddings for identical prompts or documents
* Skip recomputing reranking scores for popular queries
* Short-circuit repeated LLM-based filters with static criteria
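The idea of keying model calls on `(model_name, inputs, parameters)` can be sketched as a simple memoizing wrapper. Everything here is an assumption for illustration: the `InferenceCache` class, the `cache:inference:` prefix, and the in-memory store stand in for the Engine's Redis-backed cache, and TTL handling is omitted.

```python
import hashlib
import json
from typing import Any, Callable


class InferenceCache:
    """In-memory stand-in for a Redis-backed inference cache (no TTL)."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model_name: str, inputs: Any, parameters: dict) -> str:
        # Canonical JSON keeps the hash stable regardless of dict key order.
        payload = json.dumps([model_name, inputs, parameters], sort_keys=True)
        return "cache:inference:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, model_name: str, inputs: Any, parameters: dict,
             infer: Callable[[Any, dict], Any]) -> Any:
        k = self.key(model_name, inputs, parameters)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        # Miss: run the model once and remember the result.
        self.misses += 1
        result = self._store[k] = infer(inputs, parameters)
        return result
```

Identical prompts with identical parameters reuse the stored result, while changing any parameter produces a new key and a fresh model call.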

## Cache Invalidation

Caches are invalidated automatically on:

| Event                        | Scope                                         |
| ---------------------------- | --------------------------------------------- |
| Document ingestion completes | Collection-level (via index signature change) |
| Retriever deleted            | All keys for that retriever                   |
| Collection deleted/updated   | All keys for that collection                  |
| Namespace deleted            | All keys in namespace                         |

Manual invalidation is also available:

```bash theme={null}
curl -X DELETE "$MP_API_URL/v1/retrievers/<retriever_id>/cache" \
  -H "Authorization: Bearer $MP_API_KEY" -H "X-Namespace: $MP_NAMESPACE"
```

## Monitoring Cache Performance

* Use **`GET /v1/analytics/retrievers/{id}/cache-performance`** for hit/miss ratios and latency deltas.
* The `stage_statistics` object inside retriever responses flags `cache_hit` per stage.
* Redis keys are namespaced per feature (e.g., `cache:retriever:...`), which makes them easy to inspect if needed.

## Best Practices

* Caching is on by default with `cache_behavior: "auto"` — no setup needed.
* Use `skip_cache: true` for debugging or when you need guaranteed-fresh results.
* Disable stage caching for stages with time-sensitive or non-deterministic inputs (`now()`, `random()`).
* Use stage caching when reranking or feature search is the bottleneck.
* Leverage inference caching for expensive LLM or GPU workloads — even small hit rates pay off.
