The External Web Search stage integrates Exa’s neural search API to augment your results with real-time web content. This enables hybrid retrieval combining your indexed documents with fresh web results.
Stage Category: APPLY (Enriches pipeline with web results)
Transformation: N documents → N + M documents (web results added)
When to Use
| Use Case | Description |
| --- | --- |
| Knowledge augmentation | Supplement internal docs with web content |
| Real-time information | Access current events, news, updates |
| Research expansion | Broaden search beyond your corpus |
| Competitive intelligence | Include competitor content in results |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Internal-only search | Skip this stage |
| Sensitive/confidential queries | Use only indexed content |
| Low-latency requirements | Web search adds 200-500ms |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | {{INPUT.query}} | Search query (supports templates) |
| num_results | integer | 10 | Number of web results to retrieve (1-100) |
| start_published_date | string | null | Filter by publish date (YYYY-MM-DD) |
| category | string | null | Content category filter |
| use_autoprompt | boolean | true | Let Exa optimize the query |
| include_text | boolean | true | Include text snippets in results |
Available Categories
| Category | Description |
| --- | --- |
| company | Company websites and profiles |
| research_paper | Academic and research content |
| news | News articles |
| pdf | PDF documents |
| github | GitHub repositories |
| tweet | Twitter/X content |
| personal_site | Personal websites and blogs |
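The parameter and category tables above can be turned into a small client-side check before a stage is submitted. The following is a minimal sketch, not part of any official SDK: `build_web_search_stage` is a hypothetical helper, and it assumes that omitting an optional filter is equivalent to leaving it at its null default.

```python
import re
from datetime import date

# Category values copied from the table above.
CATEGORIES = {
    "company", "research_paper", "news", "pdf",
    "github", "tweet", "personal_site",
}
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_web_search_stage(query="{{INPUT.query}}", num_results=10,
                           start_published_date=None, category=None,
                           use_autoprompt=True, include_text=True):
    """Hypothetical helper: validate parameters and return a stage config dict."""
    if not 1 <= num_results <= 100:
        raise ValueError("num_results must be between 1 and 100")
    if category is not None and category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    # Template placeholders are resolved at query time, so only literal
    # dates are checked here.
    if start_published_date is not None and not start_published_date.startswith("{{"):
        if not _DATE_RE.fullmatch(start_published_date):
            raise ValueError("start_published_date must be YYYY-MM-DD")
        date.fromisoformat(start_published_date)  # rejects dates like 2024-02-30

    parameters = {
        "query": query,
        "num_results": num_results,
        "use_autoprompt": use_autoprompt,
        "include_text": include_text,
    }
    # Assumption: optional filters are simply left out when unset.
    if start_published_date is not None:
        parameters["start_published_date"] = start_published_date
    if category is not None:
        parameters["category"] = category

    return {
        "stage_name": "external_web_search",
        "stage_type": "apply",
        "config": {"stage_id": "external_web_search", "parameters": parameters},
    }
```

Calling `build_web_search_stage(num_results=5, category="news")` returns a dict with the same envelope as the JSON configurations in this page, while an out-of-range `num_results` or an unlisted category raises `ValueError` before any request is made.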
Configuration Examples
Basic Web Search
{
  "stage_name": "external_web_search",
  "stage_type": "apply",
  "config": {
    "stage_id": "external_web_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "num_results": 10
    }
  }
}
Set include_text to true (default) to include text snippets in each result. Disable it to reduce API costs and response size.
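Beyond the basic configuration, the documented `category`, `start_published_date`, `use_autoprompt`, and `include_text` parameters can be combined for narrower searches. The sketch below generates a few such variants in Python. The wrapper `web_search_stage` is a hypothetical helper, and the specific values (the date, the result counts, which flags are toggled) are illustrative assumptions rather than recommended settings.

```python
import json


def web_search_stage(**parameters):
    """Hypothetical helper: wrap parameters in the stage envelope shown above."""
    return {
        "stage_name": "external_web_search",
        "stage_type": "apply",
        "config": {
            "stage_id": "external_web_search",
            "parameters": {"query": "{{INPUT.query}}", **parameters},
        },
    }


# Illustrative combinations of the documented parameters.
VARIANTS = {
    "category_filtered": web_search_stage(num_results=10, category="company"),
    "recent_news": web_search_stage(
        num_results=10, category="news", start_published_date="2024-01-01"
    ),
    "research_papers": web_search_stage(num_results=20, category="research_paper"),
    "github": web_search_stage(num_results=10, category="github"),
    # Snippets disabled to reduce response size, as described in the note above.
    "titles_only": web_search_stage(num_results=10, include_text=False),
}

# Print one variant as JSON, ready to paste into a pipeline definition.
print(json.dumps(VARIANTS["recent_news"], indent=2))
```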
Output Schema
Web results are added to the document set with a source: "web" marker:
{
  "document_id": "web_abc123",
  "source": "web",
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "Full extracted text content...",
  "published_date": "2024-03-15T10:30:00Z",
  "author": "John Doe",
  "score": 0.95,
  "metadata": {
    "domain": "example.com",
    "category": "news"
  }
}
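Because web results carry the `source: "web"` marker, downstream code can tell them apart from indexed documents. The following sketch shows one way to do that and to compare `published_date` against a cutoff. It assumes that internal documents either lack a `source` field or use a value other than `"web"`; both function names are hypothetical.

```python
from datetime import datetime, timezone


def split_by_source(documents):
    """Separate indexed documents from web results using the source marker."""
    web = [d for d in documents if d.get("source") == "web"]
    internal = [d for d in documents if d.get("source") != "web"]
    return internal, web


def published_after(document, cutoff):
    """Return True if the document has a published_date at or after cutoff."""
    raw = document.get("published_date")
    if raw is None:
        return False
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    published = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return published >= cutoff


documents = [
    {"document_id": "doc_1", "score": 0.80},
    {
        "document_id": "web_abc123",
        "source": "web",
        "published_date": "2024-03-15T10:30:00Z",
        "score": 0.95,
    },
]
internal, web = split_by_source(documents)
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent_web = [d for d in web if published_after(d, cutoff)]
```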
Exa Neural Search
Exa uses neural search rather than keyword matching:
| Feature | Description |
| --- | --- |
| Semantic understanding | Understands query intent |
| Neural ranking | ML-based relevance scoring |
| Content extraction | Automatic text extraction |
| Autoprompt | Query optimization for better results |
Enable use_autoprompt (default) for natural language queries. Disable it when you need exact phrase matching or have already optimized your query.
| Metric | Value |
| --- | --- |
| Latency | 200-500ms |
| Rate limits | Based on Exa plan |
| Parallel execution | Concurrent with pipeline |
| Caching | Results cached for 1 hour |
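The one-hour cache means that repeating an identical search within the hour may return the earlier results rather than a fresh fetch. How the stage keys and evicts its cache is not documented here, so the following is only an illustrative client-side analogue: `TTLCache` is a hypothetical class, and keying on the full parameter set is an assumption.

```python
import json
import time


class TTLCache:
    """Illustrative time-to-live cache keyed on the search parameters."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, results)

    @staticmethod
    def key_for(parameters):
        # Sorting keys makes the cache key independent of dict ordering.
        return json.dumps(parameters, sort_keys=True)

    def get_or_fetch(self, parameters, fetch):
        key = self.key_for(parameters)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the web request
        results = fetch(parameters)
        self._entries[key] = (now, results)
        return results
```

A query that needs genuinely current results can change a parameter (for example `start_published_date`) so that it no longer matches an earlier request, assuming the real cache is also parameter-sensitive.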
Common Pipeline Patterns
Internal + Web Hybrid Search
[
  {
    "stage_name": "semantic_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
            "top_k": 20
          }
        ],
        "final_top_k": 20
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 10
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 10
      }
    }
  }
]
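To make the document counts in this pattern concrete, the sketch below simulates the three stages with plain Python lists: 20 indexed documents, plus 10 web results, reranked down to 10. The scores are made up, and `rerank` here only sorts by an existing score, whereas a real reranker model would rescore each document against the query first.

```python
def apply_web_search(documents, web_results):
    """APPLY stage stand-in: keeps all N inputs and appends M web results."""
    return documents + [{**r, "source": "web"} for r in web_results]


def rerank(documents, top_k):
    """SORT stage stand-in: order by score and keep the best top_k."""
    return sorted(documents, key=lambda d: d["score"], reverse=True)[:top_k]


# Made-up scores, only to give the sort something to work with.
internal = [{"document_id": f"doc_{i}", "score": 0.50 + i * 0.01} for i in range(20)]
web = [{"document_id": f"web_{i}", "score": 0.55 + i * 0.02} for i in range(10)]

merged = apply_web_search(internal, web)  # N + M documents
final = rerank(merged, top_k=10)          # indexed and web results compete

print(len(internal), len(merged), len(final))  # 20 30 10
```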
Web-Augmented RAG
[
  {
    "stage_name": "hybrid_search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
            "top_k": 30
          }
        ],
        "final_top_k": 30
      }
    }
  },
  {
    "stage_name": "external_web_search",
    "stage_type": "apply",
    "config": {
      "stage_id": "external_web_search",
      "parameters": {
        "query": "{{INPUT.query}}",
        "num_results": 5,
        "category": "news",
        "start_published_date": "{{INPUT.date_filter}}"
      }
    }
  },
  {
    "stage_name": "rag_prepare",
    "stage_type": "apply",
    "config": {
      "stage_id": "rag_prepare",
      "parameters": {
        "max_tokens": 8000,
        "output_mode": "single_context"
      }
    }
  }
]
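The `{{INPUT.query}}` and `{{INPUT.date_filter}}` placeholders are filled from the inputs supplied at query time. The exact template semantics are not described here, so the following is a rough sketch under the assumption of simple string substitution; `render` is a hypothetical function, not the pipeline's actual template engine.

```python
import re

# Matches {{INPUT.name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*INPUT\.(\w+)\s*\}\}")


def render(value, inputs):
    """Recursively substitute {{INPUT.x}} placeholders in a config structure."""
    if isinstance(value, dict):
        return {k: render(v, inputs) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, inputs) for v in value]
    if isinstance(value, str):
        # A missing input raises KeyError instead of sending a raw placeholder.
        return _PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), value)
    return value


parameters = {
    "query": "{{INPUT.query}}",
    "num_results": 5,
    "category": "news",
    "start_published_date": "{{INPUT.date_filter}}",
}
resolved = render(
    parameters, {"query": "chip export rules", "date_filter": "2024-01-01"}
)
```

With these inputs, `resolved["start_published_date"]` becomes `"2024-01-01"`, which matches the YYYY-MM-DD format the parameter expects.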
Error Handling
| Error | Behavior |
| --- | --- |
| API rate limit | Retry with backoff |
| Network timeout | Stage fails gracefully, no web results |
| Invalid domain | Ignored, other domains searched |
| No results found | Empty web result set |
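The rate-limit and timeout behaviors in the table can be pictured as a retry loop with a graceful fallback. The sketch below is an illustration only: `RateLimitError`, the attempt count, and the delay schedule are assumptions, and returning an empty list once retries are exhausted is a guess at behavior the table does not specify.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by a search client when rate limited."""


def search_with_fallback(search, parameters, max_attempts=3,
                         base_delay=0.5, sleep=time.sleep):
    """Retry on rate limits with exponential backoff; never raise to the caller."""
    for attempt in range(max_attempts):
        try:
            return search(parameters)
        except RateLimitError:
            # Wait 0.5s, 1s, 2s, ... between attempts, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
        except TimeoutError:
            # Fail gracefully: the pipeline continues without web results.
            break
    return []
```

Returning an empty list keeps the rest of the pipeline working on the indexed documents alone, which mirrors the "no web results" outcome in the table.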
Web search results may include content from untrusted sources. Consider filtering or validating web content before using in sensitive applications.
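One simple form of validation is a domain allowlist applied to web results after the stage runs. The sketch below is one possible approach, not a built-in feature: `filter_trusted` is a hypothetical function, and it relies on the `source` and `url` fields from the output schema.

```python
from urllib.parse import urlparse


def filter_trusted(documents, allowed_domains):
    """Keep indexed documents and only those web results from allowed domains."""
    allowed = {d.lower() for d in allowed_domains}
    kept = []
    for doc in documents:
        if doc.get("source") != "web":
            kept.append(doc)  # indexed content is not filtered here
            continue
        host = (urlparse(doc.get("url", "")).hostname or "").lower()
        # Accept exact matches and subdomains such as news.example.com.
        if host in allowed or any(host.endswith("." + d) for d in allowed):
            kept.append(doc)
    return kept


documents = [
    {"document_id": "doc_1"},
    {"document_id": "web_1", "source": "web", "url": "https://example.com/article"},
    {"document_id": "web_2", "source": "web", "url": "https://news.example.com/a"},
    {"document_id": "web_3", "source": "web", "url": "https://notexample.com/b"},
]
trusted = filter_trusted(documents, ["example.com"])
```

Matching on the parsed hostname rather than a substring of the URL avoids accepting look-alike domains such as `notexample.com`.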