# Aggregate

> Compute statistical aggregations and metrics across document results

<Frame>
  <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/retrievers/aggregate.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=13e879d5297474a4835667f5ff221382" alt="Aggregate stage showing statistical computations on search results" width="900" height="320" data-path="assets/retrievers/aggregate.svg" />
</Frame>

The Aggregate stage computes statistical aggregations across your search results, including counts, sums, averages, min/max values, percentiles, histograms, and frequency distributions. This is useful for analytics, faceted search, and understanding result distributions.

<Note>
  **Stage Category**: REDUCE (Aggregates results)

  **Transformation**: N documents → aggregation results + optional documents
</Note>

<Info>
  **Aggregating a whole collection?** The collection-level endpoint
  `POST /v1/collections/{collection_id}/aggregate` runs the same functions over
  every matching document (not just the current pipeline's results) and returns
  **exact** counts at any collection size — `count`, `count_distinct`, `sum`,
  `avg`, `min`, and `max` stream without a row cap. Pair it with the `is_null`
  filter operator for field-presence / data-validity checks, e.g. count how many
  documents are missing an ancestor (`from_collection`) field across 180k+ assets.
</Info>

## When to Use

| Use Case           | Description                       |
| ------------------ | --------------------------------- |
| **Faceted search** | Count documents by category       |
| **Analytics**      | Compute metrics across results    |
| **Price ranges**   | Min/max/avg calculations          |
| **Distributions**  | Understand result characteristics |

## When NOT to Use

| Scenario                | Recommended Alternative |
| ----------------------- | ----------------------- |
| Grouping with full docs | `group_by`              |
| Simple counting         | Use search facets       |
| Per-document stats      | `code_execution`        |
| LLM-based analysis      | `summarize`             |

## Parameters

| Parameter           | Type    | Default    | Description                          |
| ------------------- | ------- | ---------- | ------------------------------------ |
| `aggregations`      | array   | *Required* | List of aggregation operations       |
| `group_by`          | string  | *none*     | Field to group aggregations by       |
| `include_documents` | boolean | `false`    | Include original documents in output |

## Aggregation Types

| Type            | Description                       | Example                |
| --------------- | --------------------------------- | ---------------------- |
| `count`         | Number of documents               | Total results          |
| `sum`           | Sum of field values               | Total revenue          |
| `avg`           | Average value                     | Average price          |
| `min`           | Minimum value                     | Lowest price           |
| `max`           | Maximum value                     | Highest price          |
| `cardinality`   | Unique values count               | Unique authors         |
| `percentile`    | Percentile values                 | P50, P95               |
| `histogram`     | Value distribution                | Price buckets          |
| `stddev`        | Standard deviation (sample)       | Score spread           |
| `variance`      | Variance (sample)                 | Price volatility       |
| `frequency`     | Value frequency distribution      | Top categories         |
| `co_occurrence` | Co-occurrence of two fields       | Brand + category pairs |
| `correlation`   | Pearson correlation of two fields | Price vs. rating       |

## Configuration Examples

<CodeGroup>
  ```json Basic Aggregations theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "total"},
          {"type": "avg", "field": "metadata.price", "name": "avg_price"},
          {"type": "min", "field": "metadata.price", "name": "min_price"},
          {"type": "max", "field": "metadata.price", "name": "max_price"}
        ]
      }
    }
  }
  ```

  ```json Grouped Aggregations theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "count"},
          {"type": "avg", "field": "metadata.rating", "name": "avg_rating"}
        ],
        "group_by": "metadata.category"
      }
    }
  }
  ```

  ```json Faceted Search theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "total"},
          {"type": "cardinality", "field": "metadata.brand", "name": "brand_count"},
          {"type": "histogram", "field": "metadata.price", "interval": 50, "name": "price_ranges"}
        ],
        "include_documents": true
      }
    }
  }
  ```

  ```json Percentile Analysis theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "percentile", "field": "score", "percentiles": [25, 50, 75, 95], "name": "score_distribution"},
          {"type": "avg", "field": "score", "name": "avg_score"}
        ]
      }
    }
  }
  ```

  ```json Multi-Field Aggregations theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "total_docs"},
          {"type": "sum", "field": "metadata.views", "name": "total_views"},
          {"type": "avg", "field": "metadata.engagement_rate", "name": "avg_engagement"},
          {"type": "cardinality", "field": "metadata.author_id", "name": "unique_authors"}
        ]
      }
    }
  }
  ```
</CodeGroup>

## Output Schema

### Without Group By

```json theme={null}
{
  "aggregations": {
    "total": 150,
    "avg_price": 49.99,
    "min_price": 9.99,
    "max_price": 199.99
  },
  "documents": []
}
```

`documents` is an empty array when `include_documents` is `false`.

### With Group By

```json theme={null}
{
  "aggregations": {
    "electronics": {
      "count": 45,
      "avg_rating": 4.2
    },
    "clothing": {
      "count": 62,
      "avg_rating": 4.5
    },
    "books": {
      "count": 43,
      "avg_rating": 4.7
    }
  }
}
```
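
To make the grouped shape concrete, the following Python sketch reproduces it locally over a list of document dicts. It is an illustration only, not Mixpeek's implementation: the helper names `get_path` and `grouped_count_avg` are hypothetical, and dropping documents that lack the group field is an assumption.

```python
from collections import defaultdict

def get_path(doc, path):
    """Resolve a dotted path such as 'metadata.category'; None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def grouped_count_avg(docs, group_by, avg_field):
    """Hypothetical helper: return {group: {"count": n, "avg_rating": mean}}."""
    groups = defaultdict(list)
    for doc in docs:
        key = get_path(doc, group_by)
        if key is not None:  # assumption: documents without the group field are dropped
            groups[key].append(doc)
    result = {}
    for key, members in groups.items():
        # Documents missing the averaged field are skipped for that aggregation only.
        values = [v for v in (get_path(d, avg_field) for d in members) if v is not None]
        result[key] = {
            "count": len(members),
            "avg_rating": sum(values) / len(values) if values else None,
        }
    return result

docs = [
    {"metadata": {"category": "books", "rating": 5.0}},
    {"metadata": {"category": "books", "rating": 4.0}},
    {"metadata": {"category": "clothing", "rating": 4.5}},
]
grouped = grouped_count_avg(docs, "metadata.category", "metadata.rating")
```

Each distinct value of the `group_by` field becomes a top-level key, and every aggregation `name` appears once under each key.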

### Histogram Output

```json theme={null}
{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {"key": "0-50", "count": 45},
        {"key": "50-100", "count": 62},
        {"key": "100-150", "count": 28},
        {"key": "150-200", "count": 15}
      ]
    }
  }
}
```
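
As a rough local model of how fixed-width buckets like these can be derived, the Python sketch below assigns each value to the bucket whose lower bound is the largest multiple of `interval` not exceeding it. The function name `histogram` is hypothetical, and two details are assumptions rather than documented behavior: buckets are lower-inclusive and upper-exclusive, and empty buckets are omitted.

```python
from collections import Counter

def histogram(values, interval):
    """Bucket numeric values into fixed-width ranges keyed like '0-50'."""
    counts = Counter()
    for value in values:
        # Assumption: a value equal to a boundary (e.g. 50.0) falls in the upper bucket.
        lower = int(value // interval) * interval
        counts[lower] += 1
    return {
        "buckets": [
            {"key": f"{lower}-{lower + interval}", "count": counts[lower]}
            for lower in sorted(counts)
        ]
    }

price_ranges = histogram([9.99, 49.99, 50.0, 120.0, 199.99], interval=50)
```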

## Performance

| Metric          | Value                           |
| --------------- | ------------------------------- |
| **Latency**     | 5-50ms                          |
| **Memory**      | O(groups × aggregations)        |
| **Cost**        | Free                            |
| **Scalability** | Efficient for large result sets |

## Common Pipeline Patterns

### Search with Facets

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 1000 }
        ],
        "final_top_k": 1000
      }
    }
  },
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "total"},
          {"type": "histogram", "field": "metadata.category", "name": "categories"},
          {"type": "histogram", "field": "metadata.price", "interval": 25, "name": "price_ranges"}
        ],
        "include_documents": true
      }
    }
  }
]
```

### Analytics Pipeline

```json theme={null}
[
  {
    "stage_name": "structured_filter",
    "stage_type": "filter",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "conditions": {
          "AND": [
            {"field": "metadata.date", "operator": "gte", "value": "2024-01-01"},
            {"field": "metadata.status", "operator": "eq", "value": "published"}
          ]
        }
      }
    }
  },
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "total_published"},
          {"type": "sum", "field": "metadata.views", "name": "total_views"},
          {"type": "avg", "field": "metadata.engagement", "name": "avg_engagement"},
          {"type": "percentile", "field": "metadata.views", "percentiles": [50, 90, 99], "name": "view_distribution"}
        ],
        "group_by": "metadata.author"
      }
    }
  }
]
```

### E-Commerce Product Analytics

```json theme={null}
[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 500 }
        ],
        "final_top_k": 500
      }
    }
  },
  {
    "stage_name": "structured_filter",
    "stage_type": "filter",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "conditions": {
          "field": "metadata.in_stock",
          "operator": "eq",
          "value": true
        }
      }
    }
  },
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "count", "name": "available_products"},
          {"type": "min", "field": "metadata.price", "name": "lowest_price"},
          {"type": "max", "field": "metadata.price", "name": "highest_price"},
          {"type": "avg", "field": "metadata.rating", "name": "avg_rating"},
          {"type": "cardinality", "field": "metadata.brand", "name": "brand_count"}
        ],
        "group_by": "metadata.category",
        "include_documents": true
      }
    }
  }
]
```

## Aggregation Details

### Count

```json theme={null}
{"type": "count", "name": "total"}
```

Counts documents. No field required.

### Sum / Avg / Min / Max

```json theme={null}
{"type": "avg", "field": "metadata.price", "name": "average_price"}
```

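Each of these requires a numeric field. `sum`, `min`, and `max` take the same shape as the `avg` example, with a different `type`.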

### Cardinality

```json theme={null}
{"type": "cardinality", "field": "metadata.author", "name": "unique_authors"}
```

Counts distinct values. The result may be approximate for large sets.

### Percentile

```json theme={null}
{"type": "percentile", "field": "score", "percentiles": [25, 50, 75, 95], "name": "score_percentiles"}
```

Returns the value at each requested percentile.
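
For intuition about what a percentile value represents, here is a small Python sketch that uses linear interpolation between the two closest ranks. The interpolation method is an assumption, since this page does not say which method the stage uses, and the function name `percentile` is hypothetical.

```python
def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between closest ranks."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100  # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

scores = [10, 20, 30, 40, 50]
score_percentiles = {p: percentile(scores, p) for p in [25, 50, 75, 95]}
```

With five evenly spaced scores, the 50th percentile is the middle value and the 95th falls between the two highest scores.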

### Histogram

```json theme={null}
{"type": "histogram", "field": "metadata.price", "interval": 50, "name": "price_buckets"}
```

Groups values into buckets. For numeric fields, `interval` sets the bucket width.

## Statistical Aggregation Examples

<CodeGroup>
  ```json Percentile Analysis theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"function": "percentile", "field": "score", "alias": "median_score", "percentile_value": 50},
          {"function": "percentile", "field": "score", "alias": "p90_score", "percentile_value": 90},
          {"function": "stddev", "field": "score", "alias": "score_spread"},
          {"function": "variance", "field": "score", "alias": "score_variance"}
        ]
      }
    }
  }
  ```

  ```json Frequency Distribution theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"function": "frequency", "field": "category", "alias": "top_categories", "top_k": 10},
          {"function": "frequency", "field": "brand", "alias": "top_brands", "top_k": 5}
        ]
      }
    }
  }
  ```

  ```json Co-occurrence Analysis theme={null}
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"function": "co_occurrence", "field": "brand", "field_b": "category", "alias": "brand_category_pairs"},
          {"function": "correlation", "field": "price", "field_b": "rating", "alias": "price_rating_corr"}
        ]
      }
    }
  }
  ```
</CodeGroup>

### Statistical Output Examples

**Frequency** returns value counts with percentages:

```json theme={null}
{
  "top_categories": [
    {"value": "electronics", "count": 45, "percent": 30.0},
    {"value": "clothing", "count": 38, "percent": 25.3},
    {"value": "books", "count": 22, "percent": 14.7}
  ]
}
```
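
The sample percentages above are consistent with each count being divided by 150 total documents (45 / 150 = 30.0%). The Python sketch below models that calculation locally with `collections.Counter`. The function name `frequency` is hypothetical, and using all counted values as the denominator, rather than only the top `top_k`, is an assumption.

```python
from collections import Counter

def frequency(values, top_k):
    """Top-k values with counts and their share of all counted values."""
    total = len(values)
    return [
        {"value": value, "count": count, "percent": round(count / total * 100, 1)}
        for value, count in Counter(values).most_common(top_k)
    ]

categories = ["electronics"] * 3 + ["clothing"] * 2 + ["books"]
top_categories = frequency(categories, top_k=2)
```

Because the denominator includes values outside the top `top_k`, the returned percentages do not have to sum to 100.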

**Co-occurrence** returns field pair counts:

```json theme={null}
{
  "brand_category_pairs": [
    {"field_a": "Nike", "field_b": "footwear", "count": 12, "percent": 8.0},
    {"field_a": "Nike", "field_b": "apparel", "count": 8, "percent": 5.3}
  ]
}
```
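
A comparable local model for co-occurrence counts each `(field, field_b)` value pair once per document. This Python sketch is illustrative only: the function name `co_occurrence` is hypothetical, it handles scalar field values only, and it assumes documents missing either field are skipped and that `percent` is relative to the number of counted pairs.

```python
from collections import Counter

def co_occurrence(docs, field_a, field_b):
    """Count value pairs across documents that contain both fields."""
    pairs = Counter(
        (doc[field_a], doc[field_b])
        for doc in docs
        if field_a in doc and field_b in doc
    )
    total = sum(pairs.values())
    return [
        {"field_a": a, "field_b": b, "count": n, "percent": round(n / total * 100, 1)}
        for (a, b), n in pairs.most_common()
    ]

docs = [
    {"brand": "Nike", "category": "footwear"},
    {"brand": "Nike", "category": "footwear"},
    {"brand": "Nike", "category": "apparel"},
    {"brand": "Adidas"},  # no category, so no pair is counted
]
brand_category_pairs = co_occurrence(docs, "brand", "category")
```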

**Correlation** returns the Pearson coefficient (-1 to 1):

```json theme={null}
{
  "price_rating_corr": 0.342156
}
```
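
For reference, the Pearson coefficient is the covariance of the two fields divided by the product of their standard deviations. The Python sketch below computes it directly from paired values; the function name `pearson` is hypothetical, and returning `None` when either field is constant is an assumption about an edge case this page does not cover.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of paired values; None when it cannot be computed."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None  # at least two paired values are needed
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # assumption: undefined when either field is constant
    return cov / math.sqrt(var_x * var_y)

prices = [10, 20, 30, 40]
ratings = [1, 2, 3, 4]
price_rating_corr = pearson(prices, ratings)
```

A value near 1 means the two fields rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.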

## Error Handling

| Error                                    | Behavior                           |
| ---------------------------------------- | ---------------------------------- |
| Missing field                            | Skip document for that aggregation |
| Non-numeric field                        | Error for numeric aggregations     |
| Empty results                            | Return zero/empty aggregations     |
| Invalid type                             | Stage fails                        |
| Insufficient data for stddev/correlation | Returns `null` (needs 2+ values)   |
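
Two of these rules, skipping documents that lack the field and returning null when fewer than two values remain, can be modeled locally with Python's `statistics.stdev`, which computes the sample standard deviation. This is a sketch of the documented behavior rather than the stage's actual code, and the function name `sample_stddev` is hypothetical.

```python
import statistics

def sample_stddev(docs, field):
    """Sample standard deviation over documents that contain the field."""
    values = [doc[field] for doc in docs if field in doc]  # missing field: skip document
    if len(values) < 2:
        return None  # insufficient data
    return statistics.stdev(values)

docs = [{"score": 2.0}, {"score": 4.0}, {"title": "no score"}]
spread = sample_stddev(docs, "score")
too_few = sample_stddev([{"score": 2.0}], "score")
```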

## Related

* [Group By](/retrieval/stages/group-by) - Group with full documents
* [Sample](/retrieval/stages/sample) - Statistical sampling
* [Summarize](/retrieval/stages/summarize) - LLM-powered analysis
