# Group By

> Aggregate documents by shared field values into logical groups

<Frame>
  <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/retrievers/group-by.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=bd0f5eff607fb54559e4a282ee779403" alt="Group By stage showing document aggregation by field values" width="900" height="320" data-path="assets/retrievers/group-by.svg" />
</Frame>

The Group By stage aggregates documents that share the same value for a specified field, creating logical groups. This is useful for organizing results by category, author, date, or any other attribute.

<Note>
  **Stage Category**: GROUP (Groups documents)

  **Transformation**: N documents → G groups (where G is the number of unique values of the grouped field)
</Note>

## When to Use

| Use Case                | Description                |
| ----------------------- | -------------------------- |
| **Category grouping**   | Group products by category |
| **Author aggregation**  | Group articles by author   |
| **Date grouping**       | Group by day/month/year    |
| **Source organization** | Group by data source       |

## When NOT to Use

| Scenario                      | Recommended Alternative |
| ----------------------------- | ----------------------- |
| Semantic similarity grouping  | `cluster`               |
| Statistical aggregations only | `aggregate`             |
| Removing duplicates           | `deduplicate`           |
| Top-N per group               | Use with `sample`       |

## Parameters

| Parameter        | Type    | Default            | Description                                                         |
| ---------------- | ------- | ------------------ | ------------------------------------------------------------------- |
| `group_by_field` | string  | `source_object_id` | Field to group by (dot notation supported)                          |
| `max_per_group`  | integer | `10`               | Maximum documents to keep per group                                 |
| `output_mode`    | string  | `all`              | `first` (top doc per group), `all` (grouped), `flatten` (flat list) |
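
The dot notation accepted by `group_by_field` can be pictured as a walk through nested dictionaries. The following is a minimal sketch of that idea, not Mixpeek's implementation: `resolve_field` is a hypothetical helper, and returning `None` when a path segment is missing is an assumption made for illustration.

```python
from typing import Any


def resolve_field(doc: dict, path: str) -> Any:
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # assumption: a missing segment resolves to None
        current = current[part]
    return current


doc = {"metadata": {"source": {"type": "rss"}, "category": "electronics"}}
print(resolve_field(doc, "metadata.source.type"))
print(resolve_field(doc, "metadata.category"))
print(resolve_field(doc, "metadata.missing"))
```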

## Configuration Examples

<CodeGroup>
  ```json Basic Group By theme={null}
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.category"
      }
    }
  }
  ```

  ```json Limited Docs Per Group theme={null}
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.author",
        "max_per_group": 5
      }
    }
  }
  ```

  ```json Deduplicate (top doc per group) theme={null}
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.brand",
        "max_per_group": 1,
        "output_mode": "first"
      }
    }
  }
  ```

  ```json Date Grouping theme={null}
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.publish_date"
      }
    }
  }
  ```

  ```json Nested Field Grouping theme={null}
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.source.type",
        "max_per_group": 3
      }
    }
  }
  ```
</CodeGroup>
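
When several retrievers share the same stage shape, it can help to generate the configuration rather than copy it by hand. The sketch below assumes nothing beyond the JSON structure shown in the examples; `group_by_stage` is a hypothetical helper, not part of any Mixpeek SDK, and its defaults mirror the Parameters table.

```python
import json


def group_by_stage(field: str, max_per_group: int = 10, output_mode: str = "all") -> dict:
    """Build a Group By stage config dict (hypothetical helper)."""
    if output_mode not in {"first", "all", "flatten"}:
        raise ValueError(f"unsupported output_mode: {output_mode!r}")
    return {
        "stage_name": "group_by",
        "stage_type": "group",
        "config": {
            "stage_id": "group_by",
            "parameters": {
                "group_by_field": field,
                "max_per_group": max_per_group,
                "output_mode": output_mode,
            },
        },
    }


stage = group_by_stage("metadata.author", max_per_group=5)
print(json.dumps(stage, indent=2))
```

Rejecting unknown `output_mode` values locally is a design choice of this sketch; it surfaces typos before the configuration is submitted.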

## Output Schema

```json theme={null}
{
  "groups": [
    {
      "key": "electronics",
      "count": 25,
      "documents": [
        {
          "document_id": "doc_123",
          "content": "Latest smartphone review...",
          "score": 0.95,
          "metadata": {"category": "electronics", "price": 999}
        },
        {
          "document_id": "doc_456",
          "content": "Laptop comparison guide...",
          "score": 0.89,
          "metadata": {"category": "electronics", "price": 1299}
        }
      ]
    },
    {
      "key": "clothing",
      "count": 18,
      "documents": [...]
    }
  ],
  "metadata": {
    "total_groups": 5,
    "total_documents": 100,
    "field": "metadata.category"
  }
}
```
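
A client consuming this shape might reduce each group to a one-line summary. The sketch below is illustrative only: `summarize_groups` is a hypothetical helper, and the `response` literal is a trimmed sample modeled on the schema above, not real output.

```python
from typing import Optional


def summarize_groups(response: dict) -> list[tuple[str, int, Optional[float]]]:
    """Return (key, count, best score among returned documents) per group."""
    summary = []
    for group in response.get("groups", []):
        docs = group.get("documents", [])
        top_score = max((d["score"] for d in docs), default=None)
        summary.append((group["key"], group["count"], top_score))
    return summary


response = {
    "groups": [
        {
            "key": "electronics",
            "count": 25,
            "documents": [
                {"document_id": "doc_123", "score": 0.95},
                {"document_id": "doc_456", "score": 0.89},
            ],
        },
        # Documents omitted in this trimmed sample.
        {"key": "clothing", "count": 18, "documents": []},
    ],
    "metadata": {"total_groups": 5, "total_documents": 100, "field": "metadata.category"},
}

summary = summarize_groups(response)
for key, count, top_score in summary:
    print(key, count, top_score)
```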

## Performance

| Metric          | Value     |
| --------------- | --------- |
| **Latency**     | 5-20ms    |
| **Memory**      | O(N)      |
| **Cost**        | Free      |
| **Scalability** | Efficient |

## Common Pipeline Patterns

### Search + Group by Category

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 100 }
        ],
        "final_top_k": 100
      }
    }
  },
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.category",
        "max_per_group": 5
      }
    }
  }
]
```

### Grouped Results with Aggregations

```json theme={null}
[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 200 }
        ],
        "final_top_k": 200
      }
    }
  },
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.brand",
        "max_per_group": 10
      }
    }
  },
  {
    "stage_name": "aggregate",
    "stage_type": "reduce",
    "config": {
      "stage_id": "aggregate",
      "parameters": {
        "aggregations": [
          {"type": "avg", "field": "metadata.price", "name": "avg_price"},
          {"type": "avg", "field": "metadata.rating", "name": "avg_rating"}
        ],
        "group_by": "metadata.brand"
      }
    }
  }
]
```

### Author-Grouped Search

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 100 }
        ],
        "final_top_k": 100
      }
    }
  },
  {
    "stage_name": "document_enrich",
    "stage_type": "enrich",
    "config": {
      "stage_id": "document_enrich",
      "parameters": {
        "target_collection_id": "authors",
        "target_field": "author_id",
        "source_field": "metadata.author_id",
        "output_field": "author"
      }
    }
  },
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "author.name",
        "max_per_group": 3
      }
    }
  }
]
```

### Time-Based Grouping

```json theme={null}
[
  {
    "stage_name": "structured_filter",
    "stage_type": "filter",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "conditions": {
          "field": "metadata.date",
          "operator": "gte",
          "value": "2024-01-01"
        }
      }
    }
  },
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    metadata = doc.setdefault('metadata', {})\n    date = str(metadata.get('date') or '')\n    metadata['month'] = date[:7]  # YYYY-MM\n    return doc"
      }
    }
  },
  {
    "stage_name": "group_by",
    "stage_type": "group",
    "config": {
      "stage_id": "group_by",
      "parameters": {
        "group_by_field": "metadata.month"
      }
    }
  }
]
```
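
Written out as plain Python, the `transform` step in this pipeline can be tried locally before it is embedded as an escaped string. The version below is a sketch that tolerates documents with no `metadata` or no `date`; how the `code_execution` stage actually calls `transform` is not specified here, so invoking it once per document is an assumption.

```python
def transform(doc: dict) -> dict:
    """Add metadata.month (YYYY-MM) derived from metadata.date."""
    metadata = doc.setdefault("metadata", {})
    date = str(metadata.get("date") or "")
    metadata["month"] = date[:7]  # "2024-03-15" -> "2024-03"
    return doc


docs = [
    {"document_id": "a", "metadata": {"date": "2024-03-15"}},
    {"document_id": "b", "metadata": {"date": "2024-04-01T09:30:00Z"}},
    {"document_id": "c"},  # no metadata at all
]
months = [transform(doc)["metadata"]["month"] for doc in docs]
print(months)
```

A document without a date ends up with an empty-string month in this sketch, so it would presumably form its own group rather than fall under the `null` key.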

## Document Sorting Within Groups

Documents within each group are automatically sorted by relevance `score` (highest first), then limited to `max_per_group`.
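
The sort-then-limit behavior can be sketched in a few lines. This is an illustration of the described ordering, not the stage's actual code; `top_per_group` is a hypothetical helper, and treating a missing `score` as `0.0` is an assumption.

```python
def top_per_group(documents: list[dict], max_per_group: int = 10) -> list[dict]:
    """Sort one group's documents by score (highest first), then truncate."""
    ranked = sorted(documents, key=lambda d: d.get("score", 0.0), reverse=True)
    return ranked[:max_per_group]


docs = [
    {"document_id": "a", "score": 0.4},
    {"document_id": "b", "score": 0.9},
    {"document_id": "c", "score": 0.7},
]
top = top_per_group(docs, max_per_group=2)
print([d["document_id"] for d in top])
```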

## Output Modes

| `output_mode`   | Description                                                    |
| --------------- | -------------------------------------------------------------- |
| `all` (default) | Return all documents (up to `max_per_group`) grouped by field  |
| `first`         | Return only the top-scoring document per group (deduplication) |
| `flatten`       | Return all documents as a flat list (drops group structure)    |
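
To make the difference between the modes concrete, the sketch below applies each one to the same pre-grouped data. It assumes each group's documents are already sorted by score, and the exact response shapes for `first` and `flatten` are assumptions; `apply_output_mode` is a hypothetical helper.

```python
def apply_output_mode(groups: dict[str, list[dict]], output_mode: str = "all") -> list:
    """Shape pre-grouped, score-sorted documents according to output_mode."""
    if output_mode == "all":
        # Keep the group structure.
        return [{"key": key, "documents": docs} for key, docs in groups.items()]
    if output_mode == "first":
        # One document per group: the top-scoring one.
        return [docs[0] for docs in groups.values() if docs]
    if output_mode == "flatten":
        # Drop the group structure, keep every document.
        return [doc for docs in groups.values() for doc in docs]
    raise ValueError(f"unsupported output_mode: {output_mode!r}")


groups = {
    "acme": [{"document_id": "a1", "score": 0.9}, {"document_id": "a2", "score": 0.5}],
    "globex": [{"document_id": "g1", "score": 0.8}],
}
for mode in ("all", "first", "flatten"):
    print(mode, apply_output_mode(groups, mode))
```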

## Handling Missing Values

Documents that have no value for the field named in `group_by_field` are grouped under a `null` key.
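
The effect can be reproduced locally with a small grouping routine in which Python's `None` stands in for the `null` key. This is a sketch of the described behavior rather than the stage's implementation; `group_documents` is a hypothetical helper and assumes field values are hashable.

```python
from collections import defaultdict


def group_documents(documents: list[dict], field: str) -> dict:
    """Group documents by a dot-notation field; missing values map to None."""
    groups = defaultdict(list)
    for doc in documents:
        value = doc
        for part in field.split("."):
            # Any missing or non-dict segment collapses the value to None.
            value = value.get(part) if isinstance(value, dict) else None
        groups[value].append(doc)
    return dict(groups)


docs = [
    {"document_id": "a", "metadata": {"category": "electronics"}},
    {"document_id": "b", "metadata": {}},
    {"document_id": "c"},
]
groups = group_documents(docs, "metadata.category")
for key, members in groups.items():
    print(key, [d["document_id"] for d in members])
```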

## Error Handling

| Error              | Behavior                           |
| ------------------ | ---------------------------------- |
| Missing field      | Documents grouped under `null` key |
| Empty results      | Returns an empty `groups` array    |
| Invalid field path | Stage fails                        |

## Group By vs Cluster

| Aspect         | Group By              | Cluster              |
| -------------- | --------------------- | -------------------- |
| Grouping basis | Field value           | Embedding similarity |
| Groups known   | Yes (field values)    | No (discovered)      |
| Speed          | Fast                  | Slower               |
| Use case       | Category organization | Theme discovery      |

## Related

* [Aggregate](/retrieval/stages/aggregate) - Statistical aggregations
* [Cluster](/retrieval/stages/cluster) - Semantic grouping
* [Sample](/retrieval/stages/sample) - Select from groups
* [Sort Attribute](/retrieval/stages/sort-attribute) - Simple sorting
