# External Web Search

> Augment results with real-time web search using Exa's neural search API

<Frame>
  <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/retrievers/external-web-search.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=992fd46917a19a8e6cb0aa09d7cda089" alt="External Web Search stage showing Exa API integration for web results" width="1000" height="400" data-path="assets/retrievers/external-web-search.svg" />
</Frame>

The External Web Search stage integrates Exa's neural search API to augment your results with real-time web content. This enables hybrid retrieval combining your indexed documents with fresh web results.

<Note>
  **Stage Category**: APPLY (Enriches pipeline with web results)

  **Transformation**: N documents → N + M documents (M web results added, at most `num_results`)
</Note>

## When to Use

| Use Case                     | Description                               |
| ---------------------------- | ----------------------------------------- |
| **Knowledge augmentation**   | Supplement internal docs with web content |
| **Real-time information**    | Access current events, news, updates      |
| **Research expansion**       | Broaden search beyond your corpus         |
| **Competitive intelligence** | Include competitor content in results     |

## When NOT to Use

| Scenario                       | Recommended Alternative                        |
| ------------------------------ | ---------------------------------------------- |
| Internal-only search           | Skip this stage                                |
| Sensitive/confidential queries | Use only indexed content                       |
| Low-latency requirements       | Skip this stage (web search adds 200-500 ms)   |

## Parameters

| Parameter              | Type    | Default           | Description                               |
| ---------------------- | ------- | ----------------- | ----------------------------------------- |
| `query`                | string  | `{{INPUT.query}}` | Search query (supports templates)         |
| `num_results`          | integer | `10`              | Number of web results to retrieve (1-100) |
| `start_published_date` | string  | `null`            | Filter by publish date (`YYYY-MM-DD`)     |
| `category`             | string  | `null`            | Content category filter                   |
| `use_autoprompt`       | boolean | `true`            | Let Exa optimize the query                |
| `include_text`         | boolean | `true`            | Include text snippets in results          |

## Available Categories

| Category         | Description                   |
| ---------------- | ----------------------------- |
| `company`        | Company websites and profiles |
| `research_paper` | Academic and research content |
| `news`           | News articles                 |
| `pdf`            | PDF documents                 |
| `github`         | GitHub repositories           |
| `tweet`          | Twitter/X content             |
| `personal_site`  | Personal websites and blogs   |
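
The constraints in the two tables above (the 1-100 range for `num_results`, the `YYYY-MM-DD` date format, and the category list) can be checked client-side before a stage config is submitted. The following is a minimal sketch, not part of any SDK: `build_web_search_stage` and `_is_iso_date` are hypothetical helper names, and skipping validation for template strings such as `{{INPUT.date_filter}}` is an assumption about how templates are resolved.

```python
from datetime import date

# Category values copied from the "Available Categories" table.
ALLOWED_CATEGORIES = {
    "company", "research_paper", "news", "pdf",
    "github", "tweet", "personal_site",
}


def _is_iso_date(value):
    """Return True only for strict YYYY-MM-DD strings."""
    try:
        return date.fromisoformat(value).isoformat() == value
    except ValueError:
        return False


def build_web_search_stage(query="{{INPUT.query}}", num_results=10,
                           start_published_date=None, category=None):
    """Hypothetical helper: build a stage config dict after basic checks."""
    if (isinstance(num_results, bool) or not isinstance(num_results, int)
            or not 1 <= num_results <= 100):
        raise ValueError("num_results must be an integer between 1 and 100")
    if category is not None and category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    parameters = {"query": query, "num_results": num_results}
    if category is not None:
        parameters["category"] = category
    if start_published_date is not None:
        # Assumption: template strings are resolved at query time,
        # so only literal dates are validated here.
        is_template = start_published_date.startswith("{{")
        if not is_template and not _is_iso_date(start_published_date):
            raise ValueError("start_published_date must be YYYY-MM-DD")
        parameters["start_published_date"] = start_published_date

    return {
        "stage_name": "external_web_search",
        "stage_type": "apply",
        "config": {
            "stage_id": "external_web_search",
            "parameters": parameters,
        },
    }
```

Catching these mistakes locally avoids a round trip to discover a malformed date or an unsupported category.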

## Configuration Examples

<CodeGroup>
  ```json Basic Web Search theme={null}
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 10
      }
    }
  }
  ```

  ```json Category-Filtered Search theme={null}
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 5,
        "category": "research_paper"
      }
    }
  }
  ```

  ```json Recent News Search theme={null}
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 10,
        "category": "news",
        "start_published_date": "2024-01-01"
      }
    }
  }
  ```

  ```json Research Papers theme={null}
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 20,
        "category": "research_paper",
        "include_text": true
      }
    }
  }
  ```

  ```json GitHub Code Search theme={null}
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}} implementation",
        "num_results": 10,
        "category": "github",
        "use_autoprompt": false
      }
    }
  }
  ```
</CodeGroup>

## Content Extraction

Set `include_text` to `true` (default) to include text snippets in each result. Disable it to reduce API costs and response size.

## Output Schema

Web results are added to the document set with a `source: "web"` marker:

```json theme={null}
{
  "document_id": "web_abc123",
  "source": "web",
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "Full extracted text content...",
  "published_date": "2024-03-15T10:30:00Z",
  "author": "John Doe",
  "score": 0.95,
  "metadata": {
    "domain": "example.com",
    "category": "news"
  }
}
```
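
Because web results carry the `source: "web"` marker, downstream code can separate them from indexed documents. The sketch below is illustrative only: `split_by_source` is a hypothetical helper, and it assumes indexed documents either omit `source` or set it to some other value.

```python
def split_by_source(documents):
    """Partition pipeline results into (web, internal) lists."""
    web, internal = [], []
    for doc in documents:
        # Anything not explicitly marked as web is treated as indexed content.
        (web if doc.get("source") == "web" else internal).append(doc)
    return web, internal


results = [
    {"document_id": "web_abc123", "source": "web",
     "url": "https://example.com/article", "score": 0.95},
    {"document_id": "doc_001", "score": 0.91},  # hypothetical indexed document
]

web, internal = split_by_source(results)
print([d["document_id"] for d in web])       # ['web_abc123']
print([d["document_id"] for d in internal])  # ['doc_001']
```

This is useful when web results need separate attribution or display, for example showing the `url` for web hits only.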

## Exa Neural Search

Exa uses neural search rather than keyword matching:

| Feature                    | Description                           |
| -------------------------- | ------------------------------------- |
| **Semantic understanding** | Understands query intent              |
| **Neural ranking**         | ML-based relevance scoring            |
| **Content extraction**     | Automatic text extraction             |
| **Autoprompt**             | Query optimization for better results |

<Tip>
  Enable `use_autoprompt` (default) for natural language queries. Disable it when you need exact phrase matching or have already optimized your query.
</Tip>

## Performance

| Metric                 | Value                     |
| ---------------------- | ------------------------- |
| **Latency**            | 200-500ms                 |
| **Rate limits**        | Based on Exa plan         |
| **Parallel execution** | Concurrent with pipeline  |
| **Caching**            | Results cached for 1 hour |

## Common Pipeline Patterns

### Internal + Web Hybrid Search

```json theme={null}
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 20 }
        ],
        "final_top_k": 20
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 10
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 10
      }
    }
  }
]
```

### Web-Augmented RAG

```json theme={null}
[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 30 }
        ],
        "final_top_k": 30
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 5,
        "category": "news",
        "start_published_date": "{{INPUT.date_filter}}"
      }
    }
  },
  {
    "stage_name": "rag_prepare",
    "stage_type": "apply",
    "config": {
      "stage_id": "rag_prepare",
      "parameters": {
        "max_tokens": 8000,
        "output_mode": "single_context"
      }
    }
  }
]
```
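
The `{{INPUT.date_filter}}` template in the pipeline above expects a `YYYY-MM-DD` string in the retriever inputs. A minimal sketch of computing a rolling window on the caller side follows; `recent_date_filter` is a hypothetical helper, the query text is a placeholder, and how the `inputs` dict is passed to the retriever is not shown here.

```python
from datetime import date, timedelta


def recent_date_filter(days, today=None):
    """Return the date `days` ago formatted as YYYY-MM-DD."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


# Inputs for the pipeline: restrict news results to the last 7 days.
inputs = {
    "query": "vector database releases",  # placeholder query
    "date_filter": recent_date_filter(7, today=date(2024, 3, 15)),
}
print(inputs["date_filter"])  # 2024-03-08
```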

## Error Handling

| Error            | Behavior                               |
| ---------------- | -------------------------------------- |
| API rate limit   | Retry with backoff                     |
| Network timeout  | Stage fails gracefully, no web results |
| Invalid domain   | Ignored, other domains searched        |
| No results found | Empty web result set                   |
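
The stage's actual retry policy is not specified on this page. For readers unfamiliar with the "retry with backoff" pattern named in the table, here is a generic sketch of exponential backoff with jitter. `RateLimitError` and `call_with_backoff` are hypothetical names, and the attempt count and delays are illustrative values, not the stage's settings.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit response (e.g. HTTP 429)."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted, surface the error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...); jitter spreads
            # out clients that were rate-limited at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the function can be exercised in tests without actually waiting.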

<Warning>
  Web search results may include content from untrusted sources. Consider filtering or validating web content before using it in sensitive applications.
</Warning>
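
One way to act on this warning is a domain allowlist applied to web results after the pipeline returns. The sketch below relies on the `source` and `metadata.domain` fields from the output schema. `TRUSTED_DOMAINS`, `is_trusted`, and `filter_untrusted` are hypothetical names, and falling back to the `url` host when `metadata.domain` is missing is an assumption.

```python
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"example.com", "arxiv.org"}  # hypothetical allowlist


def is_trusted(doc, trusted=TRUSTED_DOMAINS):
    """Return True for indexed documents and for web results on the allowlist."""
    if doc.get("source") != "web":
        return True  # indexed content is not filtered here
    domain = (doc.get("metadata") or {}).get("domain")
    if not domain:
        # Assumed fallback: derive the host from the result URL.
        domain = urlparse(doc.get("url") or "").hostname or ""
    domain = domain.lower()
    # Accept exact matches and subdomains, but not look-alike suffixes.
    return any(domain == d or domain.endswith("." + d) for d in trusted)


def filter_untrusted(documents, trusted=TRUSTED_DOMAINS):
    """Drop web results whose domain is not on the allowlist."""
    return [doc for doc in documents if is_trusted(doc, trusted)]
```

Matching on `"." + d` rather than a bare suffix keeps a domain such as `evil-example.com` from passing as `example.com`.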

## Related

* [Web Scrape](/retrieval/stages/web-scrape) - Extract content from specific URLs
* [Feature Search](/retrieval/stages/feature-search) - Search your indexed content
* [Rerank](/retrieval/stages/rerank) - Combine and rank mixed results
