External Web Search stage showing Exa API integration for web results
The External Web Search stage integrates Exa’s neural search API to augment your results with real-time web content. This enables hybrid retrieval combining your indexed documents with fresh web results.
Stage Category: APPLY (Enriches pipeline with web results)
Transformation: N documents → N + M documents (web results added)

When to Use

| Use Case | Description |
| --- | --- |
| Knowledge augmentation | Supplement internal docs with web content |
| Real-time information | Access current events, news, updates |
| Research expansion | Broaden search beyond your corpus |
| Competitive intelligence | Include competitor content in results |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Internal-only search | Skip this stage |
| Sensitive/confidential queries | Use only indexed content |
| Low-latency requirements | Web search adds 200-500 ms |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | {{INPUT.query}} | Search query (supports templates) |
| num_results | integer | 10 | Number of web results to retrieve (1-100) |
| start_published_date | string | null | Filter by publish date (YYYY-MM-DD) |
| category | string | null | Content category filter |
| use_autoprompt | boolean | true | Let Exa optimize the query |
| include_text | boolean | true | Include text snippets in results |

Available Categories

| Category | Description |
| --- | --- |
| company | Company websites and profiles |
| research_paper | Academic and research content |
| news | News articles |
| pdf | PDF documents |
| github | GitHub repositories |
| tweet | Twitter/X content |
| personal_site | Personal websites and blogs |
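
The constraints in the two tables above can be checked before a pipeline is submitted. The following is a minimal client-side sketch, not part of the stage itself: `validate_params` and `is_template` are hypothetical helper names, and the rules are only those stated in the tables (range of num_results, the YYYY-MM-DD date format, and the listed categories). Template placeholders are skipped because their values are not known until run time.

```python
import re
from datetime import date

# Category values copied from the Available Categories table.
CATEGORIES = {"company", "research_paper", "news", "pdf", "github", "tweet", "personal_site"}


def is_template(value) -> bool:
    """True for placeholders such as {{INPUT.query}}, which cannot be checked statically."""
    return isinstance(value, str) and value.startswith("{{") and value.endswith("}}")


def validate_params(params: dict) -> list:
    """Return a list of problems found in an external_web_search parameters dict."""
    problems = []

    # num_results: integer in 1-100 (bool is a subclass of int, so reject it explicitly).
    num_results = params.get("num_results", 10)
    if isinstance(num_results, bool) or not isinstance(num_results, int) or not 1 <= num_results <= 100:
        problems.append("num_results must be an integer from 1 to 100")

    # start_published_date: strict YYYY-MM-DD that is also a real calendar date.
    start = params.get("start_published_date")
    if start is not None and not is_template(start):
        valid = isinstance(start, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", start) is not None
        if valid:
            try:
                date.fromisoformat(start)
            except ValueError:
                valid = False
        if not valid:
            problems.append("start_published_date must be a real date in YYYY-MM-DD format")

    # category: must be one of the documented values when set.
    category = params.get("category")
    if category is not None and not is_template(category) and category not in CATEGORIES:
        problems.append(f"unknown category: {category!r}")

    return problems


print(validate_params({"query": "{{INPUT.query}}", "num_results": 250, "category": "blog"}))
```

An empty list means no problems were found against the documented constraints; the service may apply further checks that this sketch does not know about.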

Configuration Examples

{
  "stage_name": "external_web_search",
  "stage_type": "apply",
  "config": {
    "stage_id": "external_web_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "num_results": 10
    }
  }
}

Content Extraction

Set include_text to true (default) to include text snippets in each result. Disable it to reduce API costs and response size.
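
As a sketch of how a stage definition with text snippets disabled might be assembled programmatically, the hypothetical helper below builds the same structure as the configuration example above and sets include_text to false. The helper name `web_search_stage` is an assumption for illustration; only the field names come from this page.

```python
import json


def web_search_stage(num_results=10, include_text=True, **extra):
    """Build an external_web_search stage dict in the shape shown in the configuration example."""
    parameters = {
        "query": "{{INPUT.query}}",
        "num_results": num_results,
        "include_text": include_text,
    }
    # Optional parameters from the Parameters table, e.g. category="news".
    parameters.update(extra)
    return {
        "stage_name": "external_web_search",
        "stage_type": "apply",
        "config": {
            "stage_id": "external_web_search",
            "parameters": parameters,
        },
    }


# Smaller responses: fewer results and no text snippets.
stage = web_search_stage(num_results=5, include_text=False)
print(json.dumps(stage, indent=2))
```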

Output Schema

Web results are added to the document set with a source: "web" marker:
{
  "document_id": "web_abc123",
  "source": "web",
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "Full extracted text content...",
  "published_date": "2024-03-15T10:30:00Z",
  "author": "John Doe",
  "score": 0.95,
  "metadata": {
    "domain": "example.com",
    "category": "news"
  }
}
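
Downstream code can use the source marker to tell web results apart from indexed documents. The sketch below assumes the pipeline output is available as a list of dicts shaped like the example above, and that indexed documents either omit source or carry a value other than "web"; both are assumptions, and `split_by_source` is a hypothetical helper name.

```python
def split_by_source(documents):
    """Partition pipeline output into (indexed, web) lists using the source marker."""
    web = [doc for doc in documents if doc.get("source") == "web"]
    indexed = [doc for doc in documents if doc.get("source") != "web"]
    return indexed, web


# Illustrative data only; the first document stands in for an indexed result.
results = [
    {"document_id": "doc_001", "score": 0.91},
    {"document_id": "web_abc123", "source": "web", "url": "https://example.com/article", "score": 0.95},
]

indexed, web = split_by_source(results)
print(len(indexed), "indexed,", len(web), "web")
```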

Exa uses neural search rather than keyword matching:

| Feature | Description |
| --- | --- |
| Semantic understanding | Understands query intent |
| Neural ranking | ML-based relevance scoring |
| Content extraction | Automatic text extraction |
| Autoprompt | Query optimization for better results |

Enable use_autoprompt (default) for natural language queries. Disable it when you need exact phrase matching or have already optimized your query.

Performance

| Metric | Value |
| --- | --- |
| Latency | 200-500 ms |
| Rate limits | Based on Exa plan |
| Parallel execution | Concurrent with pipeline |
| Caching | Results cached for 1 hour |

Common Pipeline Patterns

[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 20 }
        ],
        "final_top_k": 20
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 10
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 10
      }
    }
  }
]

Web-Augmented RAG

[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          { "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1", "query": { "input_mode": "text", "value": "{{INPUT.query}}" }, "top_k": 30 }
        ],
        "final_top_k": 30
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 5,
        "category": "news",
        "start_published_date": "{{INPUT.date_filter}}"
      }
    }
  },
  {
    "stage_name": "rag_prepare",
    "stage_type": "apply",
    "config": {
      "stage_id": "rag_prepare",
      "parameters": {
        "max_tokens": 8000,
        "output_mode": "single_context"
      }
    }
  }
]
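
The pipeline above reads start_published_date from {{INPUT.date_filter}}, so the caller has to supply a date in YYYY-MM-DD format. A minimal sketch of computing a rolling window is shown below; `date_filter` is a hypothetical helper, the query text is illustrative, and how the inputs dict is passed to the pipeline is not shown here.

```python
from datetime import date, timedelta
from typing import Optional


def date_filter(days_back: int, today: Optional[date] = None) -> str:
    """Return the date days_back days before today, formatted as YYYY-MM-DD."""
    today = today or date.today()
    return (today - timedelta(days=days_back)).isoformat()


# Inputs for the pipeline: only news published in the last 7 days.
inputs = {
    "query": "latest vector database releases",
    "date_filter": date_filter(7),
}
print(inputs)
```

Passing `today` explicitly keeps the helper deterministic in tests, for example `date_filter(7, date(2024, 3, 15))` returns "2024-03-08".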

Error Handling

| Error | Behavior |
| --- | --- |
| API rate limit | Retry with backoff |
| Network timeout | Stage fails gracefully, no web results |
| Invalid domain | Ignored, other domains searched |
| No results found | Empty web result set |

Web search results may include content from untrusted sources. Consider filtering or validating web content before using in sensitive applications.
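
One simple way to filter web content is a domain allowlist applied to the stage output. The sketch below is an illustration under assumptions rather than a built-in feature: it relies on the source and metadata.domain fields shown in the output schema, `filter_trusted` is a hypothetical helper, and the allowlist contents are placeholders. Documents without source: "web" pass through unchanged.

```python
def filter_trusted(documents, allowed_domains):
    """Keep indexed documents, and web documents whose metadata.domain is allowlisted."""
    allowed = {domain.lower() for domain in allowed_domains}
    kept = []
    for doc in documents:
        if doc.get("source") != "web":
            # Not a web result: leave it untouched.
            kept.append(doc)
            continue
        domain = (doc.get("metadata") or {}).get("domain", "").lower()
        if domain in allowed:
            kept.append(doc)
    return kept


# Illustrative data only.
docs = [
    {"document_id": "doc_001"},
    {"document_id": "web_abc123", "source": "web", "metadata": {"domain": "example.com"}},
    {"document_id": "web_def456", "source": "web", "metadata": {"domain": "unknown.test"}},
    {"document_id": "web_ghi789", "source": "web"},
]

safe = filter_trusted(docs, allowed_domains=["example.com"])
print([doc["document_id"] for doc in safe])
```

Web results with a missing domain are dropped rather than kept, which is the conservative choice for sensitive applications; an allowlist does not replace validating the content itself.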