# Code Execution

> Execute custom code to transform, filter, or enrich documents

<Frame>
  <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/retrievers/code-execution.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=eb80f950593bfd772e987fcd72411f6a" alt="Code Execution stage showing custom code transformations" width="900" height="350" data-path="assets/retrievers/code-execution.svg" />
</Frame>

The Code Execution stage runs custom code (Python by default) to transform, filter, or enrich documents. Use it for logic that other stages cannot express.

<Note>
  **Stage Category**: APPLY (Transforms documents)

  **Transformation**: N documents → M documents (custom logic)
</Note>

## When to Use

| Use Case                   | Description                     |
| -------------------------- | ------------------------------- |
| **Custom transformations** | Complex field calculations      |
| **Business logic**         | Domain-specific rules           |
| **Data normalization**     | Custom parsing/formatting       |
| **Advanced filtering**     | Logic beyond `structured_filter` |

## When NOT to Use

| Scenario                | Recommended Alternative |
| ----------------------- | ----------------------- |
| Simple field transforms | `json_transform`        |
| LLM-based enrichment    | `llm_enrich`            |
| Standard filtering      | `attribute_filter`      |
| External API calls      | `api_call`              |

## Parameters

| Parameter         | Type    | Default    | Description                                                 |
| ----------------- | ------- | ---------- | ----------------------------------------------------------- |
| `code`            | string  | *Required* | Code to execute                                             |
| `language`        | string  | `python`   | Execution language (`python`, `typescript`, `javascript`)   |
| `output_field`    | string  | `computed` | Document field path where results are merged                |
| `result_variable` | string  | `result`   | Variable name containing the output list                    |
| `timeout_ms`      | integer | `5000`     | Execution timeout in milliseconds (100-30000)               |
| `max_output_size` | integer | `100000`   | Max output size in bytes (1024-1000000)                     |
| `env`             | object  | `{}`       | Environment variables (supports `INPUT`/`SECRET` templates) |
| `on_error`        | string  | `skip`     | Error handling strategy (`skip`, `raise`)                   |

## Configuration Examples

<CodeGroup>
  ```json Basic Transformation theme={null}
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    doc['word_count'] = len(doc.get('content', '').split())\n    return doc"
      }
    }
  }
  ```

  ```json Custom Scoring theme={null}
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    base_score = doc.get('score', 0)\n    recency_boost = 1.0 if doc.get('metadata', {}).get('is_recent') else 0.8\n    doc['adjusted_score'] = base_score * recency_boost\n    return doc"
      }
    }
  }
  ```

  ```json Filtering Logic theme={null}
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    content = doc.get('content', '')\n    # Filter out short or low-quality content\n    if len(content) < 100:\n        return None\n    if content.count('http') > 5:\n        return None  # Too many links\n    return doc"
      }
    }
  }
  ```

  ```json With External Packages theme={null}
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "import dateutil.parser\n\ndef transform(doc):\n    date_str = doc.get('metadata', {}).get('date')\n    if date_str:\n        parsed = dateutil.parser.parse(date_str)\n        doc['metadata']['year'] = parsed.year\n        doc['metadata']['month'] = parsed.month\n    return doc"
      }
    }
  }
  ```

  ```json Text Processing theme={null}
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "import re\n\ndef transform(doc):\n    content = doc.get('content', '')\n    # Extract emails\n    emails = re.findall(r'[\\w.-]+@[\\w.-]+', content)\n    doc['extracted_emails'] = emails\n    # Clean content\n    doc['clean_content'] = re.sub(r'\\s+', ' ', content).strip()\n    return doc"
      }
    }
  }
  ```
</CodeGroup>
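
The `code` values above are ordinary source files flattened into JSON strings. The sketch below shows one way to generate such a config from readable source rather than hand-escaping newlines and quotes. `build_code_stage` is a hypothetical helper written for this illustration, not part of any Mixpeek SDK.

```python
import json
import textwrap

# Readable source for the stage; same logic as the "Basic Transformation" example.
TRANSFORM_SOURCE = textwrap.dedent('''
    def transform(doc):
        doc['word_count'] = len(doc.get('content', '').split())
        return doc
''').strip()


def build_code_stage(code: str) -> dict:
    """Hypothetical helper: wrap source code in a code_execution stage config."""
    return {
        "stage_name": "code_execution",
        "stage_type": "apply",
        "config": {
            "stage_id": "code_execution",
            "parameters": {"code": code},
        },
    }


stage = build_code_stage(TRANSFORM_SOURCE)
# json.dumps escapes newlines and quotes, producing the one-line "code" string.
stage_json = json.dumps(stage, indent=2)
print(stage_json)
```

Keeping the source in a normal string also means you can `exec` it locally and call `transform` on sample documents before submitting the config.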

## Code Structure

Your code must define a `transform` function:

```python theme={null}
def transform(doc):
    """
    Transform a single document.

    Args:
        doc: Dictionary containing document fields

    Returns:
        - Modified doc dict to keep document
        - None to filter out document
    """
    # Your logic here
    return doc
```

### Available in Scope

| Variable  | Type | Description               |
| --------- | ---- | ------------------------- |
| `doc`     | dict | Current document          |
| `INPUT`   | dict | Pipeline input parameters |
| `CONTEXT` | dict | Pipeline context          |

## Input Document Structure

```python theme={null}
doc = {
    "document_id": "doc_123",
    "content": "Document text content...",
    "score": 0.85,
    "metadata": {
        "title": "Document Title",
        "author": "John Doe",
        "date": "2024-01-15"
    }
}
```

## Output Options

| Return Value       | Effect                     |
| ------------------ | -------------------------- |
| `doc` (modified)   | Keep document with changes |
| `doc` (unmodified) | Keep document as-is        |
| `None`             | Filter out document        |
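
The return-value rules above can be exercised locally before deploying. The harness below is a rough stand-in for the stage, assuming only what the table states (returned dicts are kept, `None` is dropped); `apply_transform` is a hypothetical name and does not reproduce the real sandbox.

```python
def apply_transform(docs, transform):
    """Local stand-in for the stage: keep returned dicts, drop None."""
    kept = []
    for doc in docs:
        result = transform(dict(doc))  # shallow copy so the inputs stay untouched
        if result is not None:
            kept.append(result)
    return kept


def transform(doc):
    if len(doc.get("content", "")) < 10:
        return None
    doc["length"] = len(doc["content"])
    return doc


docs = [
    {"document_id": "doc_1", "content": "short"},
    {"document_id": "doc_2", "content": "long enough to keep"},
]
results = apply_transform(docs, transform)
print([d["document_id"] for d in results])  # ['doc_2']
```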

## Performance

| Metric          | Value                        |
| --------------- | ---------------------------- |
| **Latency**     | 5-50ms per document          |
| **Timeout**     | Configurable (default 5s)    |
| **Memory**      | Configurable (default 128MB) |
| **Concurrency** | Parallel execution           |

<Warning>
  Code execution adds latency. Keep transformations simple and avoid heavy computation. For complex processing, consider pre-computing during ingestion.
</Warning>
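
One way to check whether a transformation is cheap enough is to time it locally on representative documents. The sketch below measures mean wall-clock time per call; `mean_latency_ms` is a hypothetical helper, and local numbers exclude any sandbox or serialization overhead, so treat them as a lower bound rather than a prediction of stage latency.

```python
import time


def transform(doc):
    doc["word_count"] = len(doc.get("content", "").split())
    return doc


def mean_latency_ms(transform, docs, repeats=100):
    """Average wall-clock time per transform call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for doc in docs:
            transform(dict(doc))  # copy so repeated runs see the same input
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(docs))


sample = [
    {"document_id": f"doc_{i}", "content": "lorem ipsum " * 200}
    for i in range(20)
]
print(f"{mean_latency_ms(transform, sample):.4f} ms per document")
```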

## Common Pipeline Patterns

### Custom Scoring Pipeline

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 50 }
        ],
        "final_top_k": 50
      }
    }
  },
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    score = doc.get('score', 0)\n    # Boost verified sources\n    if doc.get('metadata', {}).get('verified'):\n        score *= 1.2\n    # Penalize old content\n    if doc.get('metadata', {}).get('year', 2024) < 2020:\n        score *= 0.8\n    doc['custom_score'] = score\n    return doc"
      }
    }
  },
  {
    "stage_name": "sort_relevance",
    "stage_type": "sort",
    "config": {
      "stage_id": "sort_relevance",
      "parameters": {
        "score_field": "custom_score"
      }
    }
  }
]
```
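
To sanity-check the scoring rules before running the pipeline, the same `transform` can be applied to a few hand-written candidates. This is a local dry run only: the candidate documents are made up, and sorting by `custom_score` in descending order is an assumption about how the sort stage orders results.

```python
def transform(doc):
    score = doc.get('score', 0)
    # Boost verified sources
    if doc.get('metadata', {}).get('verified'):
        score *= 1.2
    # Penalize old content
    if doc.get('metadata', {}).get('year', 2024) < 2020:
        score *= 0.8
    doc['custom_score'] = score
    return doc


# Made-up candidates standing in for search results.
candidates = [
    {"document_id": "a", "score": 0.90, "metadata": {"year": 2018}},
    {"document_id": "b", "score": 0.80, "metadata": {"verified": True, "year": 2023}},
    {"document_id": "c", "score": 0.85, "metadata": {}},
]
rescored = [transform(dict(d)) for d in candidates]
ranked = sorted(rescored, key=lambda d: d["custom_score"], reverse=True)
print([d["document_id"] for d in ranked])  # ['b', 'c', 'a']
```

Document `a` starts with the highest raw score but drops to last after the age penalty, while the verified document `b` moves to the top.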

### Data Normalization Pipeline

```json theme={null}
[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 30 }
        ],
        "final_top_k": 30
      }
    }
  },
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    # setdefault attaches an empty metadata dict if the field is missing\n    meta = doc.setdefault('metadata', {})\n    # Normalize price to USD\n    price = meta.get('price', 0)\n    currency = meta.get('currency', 'USD')\n    rates = {'EUR': 1.1, 'GBP': 1.27, 'USD': 1.0}\n    meta['price_usd'] = price * rates.get(currency, 1.0)\n    return doc"
      }
    }
  }
]
```

### Advanced Filtering

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 100 }
        ],
        "final_top_k": 100
      }
    }
  },
  {
    "stage_name": "code_execution",
    "stage_type": "apply",
    "config": {
      "stage_id": "code_execution",
      "parameters": {
        "code": "def transform(doc):\n    content = doc.get('content', '')\n    # Complex filtering logic\n    word_count = len(content.split())\n    if word_count < 50:\n        return None\n    # Check for required sections\n    required = ['introduction', 'conclusion']\n    content_lower = content.lower()\n    if not all(section in content_lower for section in required):\n        return None\n    # setdefault avoids a KeyError when metadata is missing\n    doc.setdefault('metadata', {})['word_count'] = word_count\n    return doc"
      }
    }
  }
]
```

## Security

| Restriction | Description                    |
| ----------- | ------------------------------ |
| Network     | No outbound network access     |
| Filesystem  | No file system access          |
| Imports     | Limited to approved packages   |
| Resources   | Memory and CPU limits enforced |

## Allowed Packages

Built-in packages available:

* `json`, `re`, `math`, `datetime`, `collections`
* `itertools`, `functools`, `operator`

The sandbox provides a standard runtime environment for the selected `language`.
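
As an illustration of staying within the modules listed above, the sketch below extracts year and month using only `datetime`. It assumes dates follow the `YYYY-MM-DD` format shown in the input document example; other formats are left untouched rather than guessed at.

```python
from datetime import datetime


def transform(doc):
    meta = doc.get("metadata") or {}
    date_str = meta.get("date")
    if not date_str:
        return doc
    try:
        # Assumes ISO-style dates such as "2024-01-15".
        parsed = datetime.strptime(date_str, "%Y-%m-%d")
    except (ValueError, TypeError):
        return doc  # leave documents with unparseable dates unchanged
    meta["year"] = parsed.year
    meta["month"] = parsed.month
    doc["metadata"] = meta
    return doc
```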

## Error Handling

| Error             | Behavior         |
| ----------------- | ---------------- |
| Syntax error      | Stage fails      |
| Runtime exception | Document skipped |
| Timeout           | Document skipped |
| Memory exceeded   | Stage fails      |
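
The difference between the two `on_error` strategies can be approximated locally. The sketch below is an assumption about the semantics based on the parameter names (`skip` drops the failing document, `raise` aborts); `run_stage` is a hypothetical function and not the actual runtime.

```python
def run_stage(docs, transform, on_error="skip"):
    """Local approximation of the on_error strategies; not the real runtime."""
    if on_error not in ("skip", "raise"):
        raise ValueError(f"unknown on_error: {on_error!r}")
    kept = []
    for doc in docs:
        try:
            result = transform(doc)
        except Exception:
            if on_error == "raise":
                raise  # surface the error and stop processing
            continue  # "skip": drop the failing document and carry on
        if result is not None:
            kept.append(result)
    return kept


def fragile(doc):
    # Direct indexing raises KeyError when metadata or title is missing.
    doc["title_upper"] = doc["metadata"]["title"].upper()
    return doc


docs = [{"metadata": {"title": "ok"}}, {"document_id": "no_metadata"}]
skipped = run_stage(docs, fragile, on_error="skip")
print(len(skipped))  # 1
```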

<Tip>
  Always handle missing fields gracefully using `.get()` with defaults to avoid runtime errors.
</Tip>
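
A short sketch of that advice: `.get()` covers reads, and `setdefault` covers writes into a nested field that may not exist yet. The `price`, `tags`, and tax-rate values are illustrative and not fields the stage defines.

```python
def transform(doc):
    # setdefault returns the existing metadata dict or attaches a new one,
    # so the writes below cannot raise KeyError.
    meta = doc.setdefault("metadata", {})
    price = meta.get("price") or 0   # treats missing and None alike
    tags = meta.get("tags") or []
    meta["price_with_tax"] = round(price * 1.2, 2)  # 1.2 is an illustrative rate
    meta["tag_count"] = len(tags)
    return doc
```

Using `or` after `.get()` also handles fields that are present but set to `None`, which a plain default argument would not catch.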

## Debugging

To trace execution, add `print` statements to your code:

```json theme={null}
{
  "stage_name": "code_execution",
  "stage_type": "apply",
  "config": {
    "stage_id": "code_execution",
    "parameters": {
      "code": "def transform(doc):\n    print(f'Processing: {doc.get(\"document_id\")}')\n    return doc"
    }
  }
}
```

## Related

* [JSON Transform](/retrieval/stages/json-transform) - Template-based transforms
* [LLM Enrich](/retrieval/stages/llm-enrich) - AI-powered enrichment
* [Attribute Filter](/retrieval/stages/attribute-filter) - Standard filtering
