The Classify stage labels documents using a custom classifier model deployed as a custom extractor. It sends document text to your extractor’s inference endpoint and attaches predicted labels with confidence scores to each document.
Stage Category: APPLY (Enriches documents)
Transformation: N documents → N documents (with classification labels added)

When to Use

| Use Case | Description |
| --- | --- |
| Custom classification | Apply your own trained classifier to search results |
| Content labeling | Tag documents with domain-specific categories |
| Compliance scoring | Score documents against compliance criteria |
| Intent detection | Classify user queries or document intent |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Predefined taxonomy classification | taxonomy_enrich (no custom model needed) |
| LLM-based classification | llm_enrich with output_schema |
| Simple keyword matching | attribute_filter |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| feature_uri | string | Required | Feature URI of your custom classifier plugin |
| document_field | string | "content" | Document field path containing text to classify |
| output_field | string | "classification" | Field path to store classification results |
| max_document_chars | integer | 5000 | Maximum characters sent for classification (100–50000) |
| top_k_labels | integer | null | Keep only the top-k labels by confidence |
| min_confidence | float | null | Minimum confidence threshold (0.0–1.0) |
| batch_size | integer | 10 | Documents per inference call (1–100) |
| max_concurrency | integer | 5 | Maximum concurrent inference requests (1–20) |
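
The documented ranges can be checked on the client before a stage is submitted. The following is a minimal sketch, not part of any Mixpeek SDK: `build_classify_stage` and `PARAM_RANGES` are hypothetical names, the ranges are copied from the table above, and the returned dict follows the stage layout used in the configuration example on this page.

```python
from typing import Any

# Inclusive ranges copied from the parameter table above.
PARAM_RANGES = {
    "max_document_chars": (100, 50000),
    "min_confidence": (0.0, 1.0),
    "batch_size": (1, 100),
    "max_concurrency": (1, 20),
}


def build_classify_stage(stage_name: str, feature_uri: str, **params: Any) -> dict:
    """Hypothetical helper: assemble a classify stage config as a plain dict."""
    if not feature_uri:
        raise ValueError("feature_uri is required")
    for name, value in params.items():
        # Only parameters with a documented range are checked; null (None) is skipped.
        if name in PARAM_RANGES and value is not None:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
    return {
        "stage_name": stage_name,
        "config": {
            "stage_id": "classify",
            "parameters": {"feature_uri": feature_uri, **params},
        },
    }


stage = build_classify_stage(
    "my_classifier",
    "mixpeek://my_classifier@1.0.0/classify",
    min_confidence=0.7,
    batch_size=20,
)
```

Passing an out-of-range value such as `max_concurrency=50` raises `ValueError` in this sketch; how the service itself reports such a value is not covered here.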

Plugin Contract

Your classifier plugin must accept {text: str} and return {labels: [{label: str, confidence: float}]}.
# In your plugin's realtime.py
class ClassifierService(BaseInferenceService):
    def _process_single(self, inputs: dict, parameters: dict) -> dict:
        text = inputs["text"]
        # Your classification logic here
        return {
            "labels": [
                {"label": "technology", "confidence": 0.92},
                {"label": "business", "confidence": 0.78},
                {"label": "science", "confidence": 0.45},
            ]
        }
Set inference_type: "classify" in your plugin’s manifest to declare compatibility with the classify stage.
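
Before deploying, it can help to check a plugin's return value against the contract in a unit test. The function below is a hypothetical helper, not a Mixpeek API. It also assumes confidences fall between 0.0 and 1.0, which matches the min_confidence range but is not stated as part of the contract.

```python
def validate_classifier_output(output: dict) -> list:
    """Return the labels list if `output` matches the contract, else raise ValueError."""
    if not isinstance(output, dict):
        raise ValueError("output must be a dict")
    labels = output.get("labels")
    if not isinstance(labels, list):
        raise ValueError("'labels' must be a list")
    for item in labels:
        if not isinstance(item, dict):
            raise ValueError("each entry in 'labels' must be a dict")
        if not isinstance(item.get("label"), str):
            raise ValueError("'label' must be a string")
        confidence = item.get("confidence")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError("'confidence' must be a number")
        # Assumption: confidences are probabilities in [0.0, 1.0].
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("'confidence' must be between 0.0 and 1.0")
    return labels


labels = validate_classifier_output(
    {"labels": [{"label": "technology", "confidence": 0.92}]}
)
```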

Configuration Examples

{
  "stage_name": "my_classifier",
  "config": {
    "stage_id": "classify",
    "parameters": {
      "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
      "document_field": "content",
      "output_field": "classification"
    }
  }
}

Output

The stage writes the predicted labels to each document at output_field ("classification" by default):
{
  "document_id": "doc_123",
  "content": "Apple Inc. announced new AI features...",
  "classification": [
    {"label": "technology", "confidence": 0.95},
    {"label": "business", "confidence": 0.82}
  ]
}
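
The effect of min_confidence and top_k_labels on the stored labels can be pictured with a small sketch. It illustrates the documented behavior and is not the stage's actual implementation: `select_labels` is a hypothetical name, and treating the threshold as inclusive is an assumption.

```python
from typing import Optional


def select_labels(
    labels: list,
    min_confidence: Optional[float] = None,
    top_k_labels: Optional[int] = None,
) -> list:
    """Drop labels below the threshold, then keep the k most confident."""
    kept = [
        item for item in labels
        # Assumption: a label exactly at the threshold is kept.
        if min_confidence is None or item["confidence"] >= min_confidence
    ]
    kept.sort(key=lambda item: item["confidence"], reverse=True)
    return kept if top_k_labels is None else kept[:top_k_labels]


plugin_labels = [
    {"label": "technology", "confidence": 0.92},
    {"label": "business", "confidence": 0.78},
    {"label": "science", "confidence": 0.45},
]

print(select_labels(plugin_labels, min_confidence=0.7, top_k_labels=1))
# prints [{'label': 'technology', 'confidence': 0.92}]
```

If no label reaches the threshold, this sketch returns an empty list, so downstream stages should not assume that at least one label is present.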

Performance

| Metric | Value |
| --- | --- |
| Latency | Depends on your plugin model |
| Batch size | 10 documents per inference call (default) |
| Concurrency | 5 parallel requests (default) |
| Max document chars | 5000 (default) |
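
For a rough sense of how batch_size and max_concurrency interact, the arithmetic below estimates the number of inference calls and sequential "waves" for a given result set. It assumes batches are dispatched in full waves of max_concurrency requests, which is a simplification; the stage's real scheduling is not documented here, and `inference_plan` is a hypothetical helper.

```python
import math


def inference_plan(num_documents: int, batch_size: int = 10, max_concurrency: int = 5):
    """Return (inference_calls, sequential_waves) under the stated assumption."""
    calls = math.ceil(num_documents / batch_size)
    waves = math.ceil(calls / max_concurrency)
    return calls, waves


calls, waves = inference_plan(120)
print(calls, waves)  # prints 12 3
```

With the defaults, 120 documents need 12 calls in about 3 waves, so end-to-end latency is roughly three times the plugin's per-batch latency under this model.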

Common Pipeline Patterns

Search + Classify + Filter by Label

[
  {
    "stage_name": "search",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "queries": [{
          "vector_index": "text_extractor.all_minilm_l6_v2_v1",
          "query": "{{INPUT.query}}"
        }]
      }
    }
  },
  {
    "stage_name": "classify",
    "config": {
      "stage_id": "classify",
      "parameters": {
        "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
        "min_confidence": 0.7,
        "top_k_labels": 1
      }
    }
  },
  {
    "stage_name": "filter_by_label",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "field": "classification.0.label",
        "operator": "eq",
        "value": "{{INPUT.target_category}}"
      }
    }
  }
]
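
The final stage in this pattern compares the field path classification.0.label against the target category. The sketch below shows one way such a path could resolve, assuming dotted paths where a numeric segment indexes into a list; `get_path` and `filter_by_label` are hypothetical helpers, not the attribute_filter implementation.

```python
def get_path(document: dict, path: str):
    """Resolve a dotted path; numeric segments index into lists. Returns None if missing."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        else:
            return None
    return current


def filter_by_label(documents: list, target_category: str) -> list:
    """Keep documents whose first stored label equals the target category."""
    return [
        doc for doc in documents
        if get_path(doc, "classification.0.label") == target_category
    ]


documents = [
    {"document_id": "doc_1", "classification": [{"label": "technology", "confidence": 0.95}]},
    {"document_id": "doc_2", "classification": [{"label": "business", "confidence": 0.81}]},
    {"document_id": "doc_3", "classification": []},  # no label passed min_confidence
]

matches = filter_by_label(documents, "technology")
print([doc["document_id"] for doc in matches])  # prints ['doc_1']
```

Under these assumptions, a document whose classification list is empty (for example because no label reached min_confidence) never matches and is dropped by the filter.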