The Taxonomy Enrich stage classifies documents against predefined taxonomies, adding structured category labels and hierarchical classifications to your search results.
Stage Category: ENRICH (enriches documents with classifications)
Transformation: N documents → N documents (with taxonomy labels added)
When to Use
| Use Case | Description |
| --- | --- |
| Content categorization | Auto-classify documents into topics |
| Faceted search | Add filterable category facets |
| Compliance tagging | Apply regulatory classifications |
| Product taxonomy | Classify into product hierarchies |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Free-form tagging | llm_enrichment |
| Pre-classified content | Skip this stage |
| Custom classification logic | api_call to custom service |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| taxonomy_id | string | Required | ID of the taxonomy to use |
| content_field | string | content | Field to classify |
| result_field | string | taxonomy | Field for classification results |
| max_depth | integer | null | Maximum hierarchy depth |
| top_k | integer | 3 | Number of top classifications |
| min_confidence | float | 0.5 | Minimum confidence threshold |
| include_ancestors | boolean | true | Include parent categories |
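To make the interaction between top_k, min_confidence, and max_depth concrete, here is a minimal Python sketch. It is an illustration only: select_classifications is a hypothetical helper, and the order of operations (depth filter, then threshold, then top-k) and the inclusive threshold are assumptions, not documented stage behavior.

```python
def select_classifications(candidates, top_k=3, min_confidence=0.5, max_depth=None):
    """Filter and rank candidate labels using the parameter defaults listed above."""
    kept = []
    for cand in candidates:
        # Dot-notation IDs encode depth: "electronics" is level 1.
        depth = cand["id"].count(".") + 1
        if max_depth is not None and depth > max_depth:
            continue
        # Assumption: a score equal to the threshold is kept.
        if cand["confidence"] < min_confidence:
            continue
        kept.append(cand)
    kept.sort(key=lambda c: c["confidence"], reverse=True)
    return kept[:top_k]


candidates = [
    {"id": "electronics.mobile.smartphones", "confidence": 0.95},
    {"id": "electronics.mobile", "confidence": 0.82},
    {"id": "electronics", "confidence": 0.78},
    {"id": "electronics.audio", "confidence": 0.31},
]

for label in select_classifications(candidates):
    print(label["id"], label["confidence"])
```

With the defaults, the 0.31 candidate is dropped by min_confidence before top_k is applied, so raising top_k alone would not bring it back.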
Configuration Examples
Basic Classification
{
  "stage_type": "enrich",
  "stage_id": "taxonomy_enrich",
  "parameters": {
    "taxonomy_id": "product_categories",
    "content_field": "content",
    "result_field": "categories"
  }
}
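Other configurations differ from the basic one only in their parameters. The sketch below builds a few variants programmatically. The helper taxonomy_enrich_stage is hypothetical, and the specific values (0.8, 5, 0.3, a depth of 3) are illustrative assumptions rather than recommended settings; only the parameter names come from the table above.

```python
import json

# Optional parameters documented for this stage.
ALLOWED = {
    "content_field", "result_field", "max_depth",
    "top_k", "min_confidence", "include_ancestors",
}


def taxonomy_enrich_stage(taxonomy_id, **params):
    """Build a stage config; omitted parameters are left to the stage defaults."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {
        "stage_type": "enrich",
        "stage_id": "taxonomy_enrich",
        "parameters": {"taxonomy_id": taxonomy_id, **params},
    }


# Keep only a single, confident label.
high_confidence = taxonomy_enrich_stage(
    "product_categories", result_field="categories", top_k=1, min_confidence=0.8
)
# Return parent categories and cap the hierarchy depth.
hierarchical = taxonomy_enrich_stage(
    "product_categories", result_field="categories",
    include_ancestors=True, max_depth=3,
)
# Allow several labels per document with a looser threshold.
multiple = taxonomy_enrich_stage(
    "product_categories", result_field="categories", top_k=5, min_confidence=0.3
)

print(json.dumps(high_confidence, indent=2))
```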
Output Schema
Basic Classification
{
  "document_id": "doc_123",
  "content": "Latest smartphone with 5G connectivity...",
  "categories": {
    "primary": {
      "id": "electronics.mobile.smartphones",
      "name": "Smartphones",
      "confidence": 0.95
    },
    "all": [
      { "id": "electronics.mobile.smartphones", "name": "Smartphones", "confidence": 0.95 },
      { "id": "electronics.mobile", "name": "Mobile Devices", "confidence": 0.82 },
      { "id": "electronics", "name": "Electronics", "confidence": 0.78 }
    ]
  }
}
With Ancestors
{
  "document_id": "doc_456",
  "content": "Investment banking services...",
  "industry": {
    "primary": {
      "id": "finance.banking.investment",
      "name": "Investment Banking",
      "confidence": 0.91
    },
    "ancestors": [
      { "id": "finance.banking", "name": "Banking", "level": 2 },
      { "id": "finance", "name": "Finance", "level": 1 }
    ],
    "path": "Finance > Banking > Investment Banking"
  }
}
Low Confidence (No Match)
{
  "document_id": "doc_789",
  "content": "Random unrelated content...",
  "categories": {
    "primary": null,
    "all": [],
    "message": "No classifications above confidence threshold"
  }
}
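Because primary can be null when nothing clears the threshold, downstream code should not index into it blindly. A minimal sketch of a defensive reader, assuming documents shaped like the examples above (primary_category is a hypothetical helper name):

```python
def primary_category(doc, result_field="taxonomy"):
    """Return (id, confidence) for the primary label, or None when nothing matched."""
    result = doc.get(result_field) or {}
    primary = result.get("primary")
    if primary is None:
        return None
    return primary["id"], primary["confidence"]


matched = {
    "document_id": "doc_123",
    "categories": {
        "primary": {
            "id": "electronics.mobile.smartphones",
            "name": "Smartphones",
            "confidence": 0.95,
        },
        "all": [],
    },
}
unmatched = {
    "document_id": "doc_789",
    "categories": {"primary": None, "all": []},
}

print(primary_category(matched, "categories"))
print(primary_category(unmatched, "categories"))
```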
Taxonomy Structure
Taxonomies are hierarchical classification systems:
Electronics
├── Mobile Devices
│   ├── Smartphones
│   ├── Tablets
│   └── Wearables
├── Computers
│   ├── Laptops
│   ├── Desktops
│   └── Components
└── Audio
    ├── Headphones
    └── Speakers
Each node has:
ID: Dot-notation path (e.g., electronics.mobile.smartphones)
Name: Human-readable label
Level: Depth in hierarchy (1 = root)
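Since each ID is a dot-notation path, a node's level and its ancestor IDs can be derived from the ID alone. The helpers below are a hypothetical sketch of that derivation; they assume IDs never contain dots other than as level separators.

```python
def node_level(node_id):
    """Depth in the hierarchy, where a root node such as "finance" is level 1."""
    return node_id.count(".") + 1


def ancestor_ids(node_id):
    """Ancestor IDs ordered from the nearest parent up to the root."""
    parts = node_id.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


print(node_level("finance.banking.investment"))
print(ancestor_ids("finance.banking.investment"))
```

The ordering matches the ancestors array in the output example, which lists the parent (level 2) before the root (level 1).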
Performance
| Metric | Value |
| --- | --- |
| Latency | 10-50 ms per document |
| Batch processing | Automatic |
| Model type | Embedding-based classification |
| Parallel execution | Up to 20 concurrent |
Pre-compute taxonomy embeddings for faster classification. Use top_k: 1 and a higher min_confidence when you only need the best match.
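The idea behind embedding-based classification with pre-computed taxonomy embeddings can be sketched as follows. This is a toy illustration, not the stage's implementation: the three-dimensional vectors are made up, and using cosine similarity directly as the confidence score is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Computed once and reused for every document (toy values, not real embeddings).
TAXONOMY_EMBEDDINGS = {
    "electronics.mobile.smartphones": [0.9, 0.1, 0.0],
    "electronics.computers.laptops": [0.1, 0.9, 0.0],
    "electronics.audio.headphones": [0.0, 0.1, 0.9],
}


def classify(doc_embedding, top_k=1, min_confidence=0.7):
    """Score the document against every node, then apply the threshold and top-k."""
    scored = [
        (node_id, cosine(doc_embedding, emb))
        for node_id, emb in TAXONOMY_EMBEDDINGS.items()
    ]
    scored = [s for s in scored if s[1] >= min_confidence]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:top_k]


print(classify([0.8, 0.2, 0.0]))
```

Only the document embedding is computed per request; the taxonomy side is a lookup, which is why pre-computing it reduces per-document work.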
Common Pipeline Patterns
Search + Classify + Filter
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "enrich",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "product_categories",
      "result_field": "category",
      "top_k": 1,
      "min_confidence": 0.7
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "category.primary.id",
        "operator": "starts_with",
        "value": "{{INPUT.category_filter}}"
      }
    }
  }
]
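The final filter relies on dot-notation IDs: a prefix match on category.primary.id selects a whole subtree. The sketch below mimics that check locally. The helpers get_path and starts_with are hypothetical, and dropping documents whose primary is null is an assumption about how the filter treats missing values.

```python
def get_path(doc, dotted_path):
    """Walk a nested dict along a dotted path; return None if any step is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def starts_with(doc, field, prefix):
    """True when the field holds a string that begins with the given prefix."""
    value = get_path(doc, field)
    return isinstance(value, str) and value.startswith(prefix)


docs = [
    {"document_id": "doc_1", "category": {"primary": {"id": "electronics.mobile.smartphones"}}},
    {"document_id": "doc_2", "category": {"primary": {"id": "electronics.audio.headphones"}}},
    {"document_id": "doc_3", "category": {"primary": None}},
]

kept = [d["document_id"] for d in docs if starts_with(d, "category.primary.id", "electronics.mobile")]
print(kept)
```

A plain prefix match would also accept a sibling whose ID merely begins with the same characters, so a caller may prefer to pass a prefix with a trailing dot when only descendants are wanted.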
Multi-Taxonomy Classification
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "enrich",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "topics",
      "result_field": "topic"
    }
  },
  {
    "stage_type": "enrich",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "sentiment",
      "result_field": "sentiment"
    }
  }
]
Faceted Search Results
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "enrich",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "categories",
      "result_field": "facets",
      "top_k": 3,
      "include_ancestors": true
    }
  }
]
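Once every result carries a facets field, facet counts for a search UI can be aggregated on the client. This is a minimal sketch, assuming the result shape shown in the Output Schema examples (an all array and an optional ancestors array); facet_counts is a hypothetical helper, not part of the API.

```python
from collections import Counter


def facet_counts(docs, result_field="facets"):
    """Count how many documents fall under each category ID."""
    counts = Counter()
    for doc in docs:
        result = doc.get(result_field) or {}
        # Use a set so a document is counted once per category.
        ids = {c["id"] for c in result.get("all", [])}
        ids.update(a["id"] for a in result.get("ancestors", []))
        counts.update(ids)
    return counts


results = [
    {"facets": {"all": [{"id": "electronics.mobile.smartphones"}],
                "ancestors": [{"id": "electronics.mobile"}, {"id": "electronics"}]}},
    {"facets": {"all": [{"id": "electronics.audio.headphones"}],
                "ancestors": [{"id": "electronics.audio"}, {"id": "electronics"}]}},
    {"facets": {"primary": None, "all": []}},
]

counts = facet_counts(results)
print(counts.most_common())
```

Including ancestors lets the UI show counts at every level of the hierarchy, not only at the leaves.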
Error Handling
| Error | Behavior |
| --- | --- |
| Unknown taxonomy_id | Stage fails |
| No match found | Empty classification, continues |
| Invalid content_field | Stage fails |
| Low confidence | Filtered by min_confidence |
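The two failing cases can be caught before a pipeline runs. The following pre-flight check is a hypothetical client-side sketch: it assumes you already hold a set of valid taxonomy IDs and a sample document, and it treats a content_field that is absent from the sample as invalid, which may be narrower than what the stage itself rejects.

```python
def preflight(params, known_taxonomy_ids, sample_doc):
    """Return a list of problems that would make the stage fail; empty means OK."""
    problems = []
    taxonomy_id = params.get("taxonomy_id")
    if taxonomy_id not in known_taxonomy_ids:
        problems.append(f"unknown taxonomy_id: {taxonomy_id!r}")
    # content_field defaults to "content" when omitted.
    content_field = params.get("content_field", "content")
    if content_field not in sample_doc:
        problems.append(f"content_field not found in sample document: {content_field!r}")
    return problems


params = {"taxonomy_id": "product_categories", "content_field": "body"}
sample = {"document_id": "doc_123", "content": "Latest smartphone with 5G connectivity..."}

for problem in preflight(params, {"product_categories", "topics"}, sample):
    print(problem)
```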