The Classify stage labels documents using a custom classifier model deployed as a custom extractor. It sends document text to your extractor’s inference endpoint and attaches predicted labels with confidence scores to each document.
Stage Category: APPLY (Enriches documents)
Transformation: N documents → N documents (with classification labels added)
When to Use
| Use Case | Description |
| --- | --- |
| Custom classification | Apply your own trained classifier to search results |
| Content labeling | Tag documents with domain-specific categories |
| Compliance scoring | Score documents against compliance criteria |
| Intent detection | Classify user queries or document intent |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Predefined taxonomy classification | taxonomy_enrich (no custom model needed) |
| LLM-based classification | llm_enrich with output_schema |
| Simple keyword matching | attribute_filter |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| feature_uri | string | Required | Feature URI of your custom classifier plugin |
| document_field | string | "content" | Document field path containing text to classify |
| output_field | string | "classification" | Field path to store classification results |
| max_document_chars | integer | 5000 | Maximum characters sent for classification (100–50000) |
| top_k_labels | integer | null | Keep only the top-k labels by confidence |
| min_confidence | float | null | Minimum confidence threshold (0.0–1.0) |
| batch_size | integer | 10 | Documents per inference call (1–100) |
| max_concurrency | integer | 5 | Maximum concurrent inference requests (1–20) |
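As a quick way to catch out-of-range values before submitting a stage, the documented limits can be checked locally. This is a minimal sketch: the helper `check_classify_parameters` is hypothetical and not part of any Mixpeek SDK, and only the ranges themselves come from the table above.

```python
# Ranges copied from the parameter table; the helper itself is hypothetical.
RANGES = {
    "max_document_chars": (100, 50000),
    "min_confidence": (0.0, 1.0),
    "batch_size": (1, 100),
    "max_concurrency": (1, 20),
}


def check_classify_parameters(params: dict) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not params.get("feature_uri"):
        problems.append("feature_uri is required")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # Parameters left unset fall back to their defaults, so skip them.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


good = {"feature_uri": "mixpeek://my_classifier@1.0.0/classify", "batch_size": 25}
bad = {"batch_size": 500, "min_confidence": 1.5}

print(check_classify_parameters(good))
print(check_classify_parameters(bad))
```

The server remains the authority on validation; a local check like this only shortens the feedback loop.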
Plugin Contract
Your classifier plugin must accept {text: str} and return {labels: [{label: str, confidence: float}]}.
# In your plugin's realtime.py
class ClassifierService(BaseInferenceService):
    def _process_single(self, inputs: dict, parameters: dict) -> dict:
        text = inputs["text"]
        # Your classification logic here
        return {
            "labels": [
                {"label": "technology", "confidence": 0.92},
                {"label": "business", "confidence": 0.78},
                {"label": "science", "confidence": 0.45},
            ]
        }
Set inference_type: "classify" in your plugin’s manifest to declare compatibility with the classify stage.
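Before deploying a plugin, it can help to confirm that its return value matches the contract shape `{labels: [{label: str, confidence: float}]}`. The sketch below is one way to do that in a unit test; `validate_classifier_output` is a hypothetical helper, not something the platform provides.

```python
def validate_classifier_output(response) -> list:
    """Check a plugin response against the documented contract shape.

    Returns a list of problems; an empty list means the shape looks right.
    """
    labels = response.get("labels") if isinstance(response, dict) else None
    if not isinstance(labels, list):
        return ["response must be a dict with a 'labels' list"]

    problems = []
    for index, item in enumerate(labels):
        if not isinstance(item, dict):
            problems.append(f"labels[{index}] must be an object")
            continue
        if not isinstance(item.get("label"), str):
            problems.append(f"labels[{index}].label must be a string")
        confidence = item.get("confidence")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            problems.append(f"labels[{index}].confidence must be a number")
    return problems


ok = {"labels": [{"label": "technology", "confidence": 0.92}]}
broken = {"labels": [{"label": 3, "confidence": "high"}]}

print(validate_classifier_output(ok))
print(validate_classifier_output(broken))
```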
Configuration Examples
Basic Classification
{
  "stage_name": "my_classifier",
  "config": {
    "stage_id": "classify",
    "parameters": {
      "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
      "document_field": "content",
      "output_field": "classification"
    }
  }
}
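The same stage envelope can carry the optional parameters for confidence filtering or higher throughput. The sketch below builds two such variants in Python and prints them as JSON. The helper `classify_stage` is hypothetical, and the specific values (0.7, 3, 50, 10, 2000) are illustrative picks within the documented ranges, not recommended settings.

```python
import json


def classify_stage(stage_name: str, **parameters) -> dict:
    """Wrap parameters in the stage envelope used by the basic example."""
    params = {"feature_uri": "mixpeek://my_classifier@1.0.0/classify"}
    params.update(parameters)
    return {
        "stage_name": stage_name,
        "config": {"stage_id": "classify", "parameters": params},
    }


# Keep at most three labels, and only those at or above 0.7 confidence.
with_confidence_filtering = classify_stage(
    "my_classifier", min_confidence=0.7, top_k_labels=3
)

# Larger batches, more parallel requests, and shorter inputs per document.
high_throughput = classify_stage(
    "my_classifier", batch_size=50, max_concurrency=10, max_document_chars=2000
)

print(json.dumps(with_confidence_filtering, indent=2))
print(json.dumps(high_throughput, indent=2))
```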
Output
Each document gets the classification labels added at output_field:
{
  "document_id": "doc_123",
  "content": "Apple Inc. announced new AI features...",
  "classification": [
    {"label": "technology", "confidence": 0.95},
    {"label": "business", "confidence": 0.82}
  ]
}
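Which labels end up in the output depends on top_k_labels and min_confidence. The sketch below imitates that selection locally so the effect of each setting is easier to see. It rests on two assumptions that the documentation does not state: that the threshold is inclusive, and that labels are ordered by descending confidence. `select_labels` is a hypothetical helper, not the stage's actual implementation.

```python
def select_labels(labels, top_k_labels=None, min_confidence=None):
    """Imitate the documented filtering options on a list of label dicts."""
    # Assumption: results are ordered from most to least confident.
    kept = sorted(labels, key=lambda item: item["confidence"], reverse=True)
    if min_confidence is not None:
        # Assumption: a label exactly at the threshold is kept.
        kept = [item for item in kept if item["confidence"] >= min_confidence]
    if top_k_labels is not None:
        kept = kept[:top_k_labels]
    return kept


raw = [
    {"label": "business", "confidence": 0.78},
    {"label": "technology", "confidence": 0.92},
    {"label": "science", "confidence": 0.45},
]

print(select_labels(raw, min_confidence=0.7))
print(select_labels(raw, top_k_labels=1))
print(select_labels(raw, top_k_labels=1, min_confidence=0.95))
```

Note that a strict threshold can leave a document with an empty label list, which matters for any later stage that reads the first label.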
| Metric | Value |
| --- | --- |
| Latency | Depends on your plugin model |
| Batch size | 10 documents (default) |
| Concurrency | 5 parallel requests (default) |
| Max document chars | 5000 (default) |
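Since latency depends on the plugin model, a rough way to reason about total time is to count inference calls. The arithmetic below is a back-of-the-envelope sketch, assuming each call carries up to batch_size documents and that calls run in full waves of max_concurrency. Real scheduling is likely less tidy, and `estimate_inference_calls` is a hypothetical helper.

```python
import math


def estimate_inference_calls(num_documents, batch_size=10, max_concurrency=5):
    """Return (calls, rounds) for a given document count.

    calls  = number of inference requests needed
    rounds = sequential waves, assuming max_concurrency calls run at once
    """
    calls = math.ceil(num_documents / batch_size)
    rounds = math.ceil(calls / max_concurrency)
    return calls, rounds


# Defaults: 100 documents -> 10 calls, in 2 waves of 5.
print(estimate_inference_calls(100))

# Larger batches and more concurrency: 2 calls, in 1 wave.
print(estimate_inference_calls(100, batch_size=50, max_concurrency=10))
```

Multiplying the number of rounds by the plugin's per-call latency gives a crude estimate of how long the stage takes.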
Common Pipeline Patterns
Search + Classify + Filter by Label
[
  {
    "stage_name": "search",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "queries": [{
          "vector_index": "text_extractor.all_minilm_l6_v2_v1",
          "query": "{{INPUT.query}}"
        }]
      }
    }
  },
  {
    "stage_name": "classify",
    "config": {
      "stage_id": "classify",
      "parameters": {
        "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
        "min_confidence": 0.7,
        "top_k_labels": 1
      }
    }
  },
  {
    "stage_name": "filter_by_label",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "field": "classification.0.label",
        "operator": "eq",
        "value": "{{INPUT.target_category}}"
      }
    }
  }
]
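The final stage compares the first label of each document with the requested category. The sketch below mimics that comparison locally. It assumes the dotted path `classification.0.label` means "index 0 of the classification list, then the label key", and that a document with no labels simply fails the `eq` check. `resolve_path` and `filter_eq` are hypothetical helpers, not the attribute_filter implementation.

```python
def resolve_path(document, path):
    """Walk a dotted path; numeric segments index into lists.

    Returns None when any segment is missing.
    """
    current = document
    for segment in path.split("."):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return None
            current = current[int(segment)]
        elif isinstance(current, dict):
            if segment not in current:
                return None
            current = current[segment]
        else:
            return None
    return current


def filter_eq(documents, field, value):
    """Keep documents whose value at `field` equals `value`."""
    return [doc for doc in documents if resolve_path(doc, field) == value]


documents = [
    {"document_id": "doc_1", "classification": [{"label": "technology", "confidence": 0.95}]},
    {"document_id": "doc_2", "classification": [{"label": "business", "confidence": 0.82}]},
    # No label cleared min_confidence, so the list is empty.
    {"document_id": "doc_3", "classification": []},
]

kept = filter_eq(documents, "classification.0.label", "technology")
print([doc["document_id"] for doc in kept])
```

Under these assumptions, documents whose labels all fall below min_confidence drop out of the results, because there is no first label to compare.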