Classify

The Classify stage labels documents with your own classifier model, deployed as a custom extractor. For each document, it sends the text from `document_field` to your extractor's inference endpoint and attaches the predicted labels, with their confidence scores, to the document.

Stage Category: APPLY (Enriches documents)
Transformation: N documents → N documents (with classification labels added)

When to Use

Use Case	Description
Custom classification	Apply your own trained classifier to search results
Content labeling	Tag documents with domain-specific categories
Compliance scoring	Score documents against compliance criteria
Intent detection	Classify user queries or document intent

When NOT to Use

Scenario	Recommended Alternative
Predefined taxonomy classification	`taxonomy_enrich` (no custom model needed)
LLM-based classification	`llm_enrich` with `output_schema`
Simple keyword matching	`attribute_filter`

Parameters

Parameter	Type	Default	Description
`feature_uri`	string	Required	Feature URI of your custom classifier plugin
`document_field`	string	`"content"`	Document field path containing text to classify
`output_field`	string	`"classification"`	Field path to store classification results
`max_document_chars`	integer	`5000`	Maximum characters sent for classification (100–50000)
`top_k_labels`	integer	`null`	Keep only the top-k labels by confidence
`min_confidence`	float	`null`	Minimum confidence threshold (0.0–1.0)
`batch_size`	integer	`10`	Documents per inference call (1–100)
`max_concurrency`	integer	`5`	Maximum concurrent inference requests (1–20)
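The post-processing implied by `max_document_chars`, `min_confidence`, and `top_k_labels` can be sketched in Python as below. This is an illustration under stated assumptions, not the stage's actual implementation: the helper names are hypothetical, returned labels are assumed to be ranked by descending confidence, and a label is assumed to pass `min_confidence` when its confidence is greater than or equal to the threshold.

```python
from typing import Optional


def truncate_text(text: str, max_document_chars: int = 5000) -> str:
    """Keep only the first max_document_chars characters (hypothetical helper)."""
    return text[:max_document_chars]


def postprocess_labels(
    labels: list[dict],
    top_k_labels: Optional[int] = None,
    min_confidence: Optional[float] = None,
) -> list[dict]:
    """Rank labels by confidence, then apply the optional threshold and top-k cut."""
    # Assumption: results are ordered from highest to lowest confidence.
    ranked = sorted(labels, key=lambda item: item["confidence"], reverse=True)
    if min_confidence is not None:
        # Assumption: the threshold is inclusive (>=).
        ranked = [item for item in ranked if item["confidence"] >= min_confidence]
    if top_k_labels is not None:
        ranked = ranked[:top_k_labels]
    return ranked


labels = [
    {"label": "science", "confidence": 0.45},
    {"label": "technology", "confidence": 0.92},
    {"label": "business", "confidence": 0.78},
]
print(postprocess_labels(labels, top_k_labels=2, min_confidence=0.5))
# [{'label': 'technology', 'confidence': 0.92}, {'label': 'business', 'confidence': 0.78}]
```

Because both options act on the same confidence ordering, applying the threshold before or after the top-k cut yields the same labels.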

Plugin Contract

Your classifier plugin must accept {text: str} and return {labels: [{label: str, confidence: float}]}.

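# In your plugin's realtime.py
# BaseInferenceService is provided by the custom extractor plugin framework (import not shown).
class ClassifierService(BaseInferenceService):
    def _process_single(self, inputs: dict, parameters: dict) -> dict:
        # inputs follows the contract above: {"text": str}
        text = inputs["text"]
        # Replace with your model's inference; return each label with its confidence.
        return {
            "labels": [
                {"label": "technology", "confidence": 0.92},
                {"label": "business", "confidence": 0.78},
                {"label": "science", "confidence": 0.45},
            ]
        }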

Set `inference_type: "classify"` in your plugin's manifest to declare compatibility with the Classify stage.
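Before deploying, it can help to check a plugin's responses against the contract locally. The validator below is a hypothetical helper, not part of any Mixpeek SDK. It assumes confidences lie in [0.0, 1.0], inferred from the documented range of `min_confidence`, and it leniently accepts integer confidences such as `1`.

```python
from typing import Any


def validate_classify_response(response: Any) -> list[str]:
    """Return contract violations; an empty list means the response looks valid."""
    if not isinstance(response, dict) or "labels" not in response:
        return ["response must be a dict with a 'labels' key"]
    labels = response["labels"]
    if not isinstance(labels, list):
        return ["'labels' must be a list"]

    errors: list[str] = []
    for i, item in enumerate(labels):
        if not isinstance(item, dict):
            errors.append(f"labels[{i}] must be a dict")
            continue
        if not isinstance(item.get("label"), str):
            errors.append(f"labels[{i}].label must be a str")
        confidence = item.get("confidence")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            errors.append(f"labels[{i}].confidence must be a float")
        elif not 0.0 <= confidence <= 1.0:
            # Assumed range, mirroring min_confidence (0.0-1.0).
            errors.append(f"labels[{i}].confidence must be between 0.0 and 1.0")
    return errors


sample = {"labels": [{"label": "technology", "confidence": 0.92}]}
print(validate_classify_response(sample))  # []
```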

Configuration Examples

{
  "stage_name": "my_classifier",
  "config": {
    "stage_id": "classify",
    "parameters": {
      "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
      "document_field": "content",
      "output_field": "classification"
    }
  }
}
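If you assemble stage configurations in code, a small builder can enforce the ranges from the Parameters table before the pipeline is submitted. `build_classify_stage` is a hypothetical helper; it assumes the documented ranges are inclusive and omits optional parameters that equal their documented defaults, so the call in the usage line reproduces the JSON example above.

```python
import json
from typing import Any, Optional

# Documented defaults for the optional parameters (from the Parameters table).
DEFAULTS: dict[str, Any] = {
    "max_document_chars": 5000,
    "top_k_labels": None,
    "min_confidence": None,
    "batch_size": 10,
    "max_concurrency": 5,
}


def build_classify_stage(
    stage_name: str,
    feature_uri: str,
    document_field: str = "content",
    output_field: str = "classification",
    max_document_chars: int = 5000,
    top_k_labels: Optional[int] = None,
    min_confidence: Optional[float] = None,
    batch_size: int = 10,
    max_concurrency: int = 5,
) -> dict:
    """Hypothetical helper: build a classify stage config, checking documented ranges."""
    if not 100 <= max_document_chars <= 50000:
        raise ValueError("max_document_chars must be between 100 and 50000")
    if min_confidence is not None and not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    if not 1 <= max_concurrency <= 20:
        raise ValueError("max_concurrency must be between 1 and 20")

    parameters: dict[str, Any] = {
        "feature_uri": feature_uri,
        "document_field": document_field,
        "output_field": output_field,
    }
    optional = {
        "max_document_chars": max_document_chars,
        "top_k_labels": top_k_labels,
        "min_confidence": min_confidence,
        "batch_size": batch_size,
        "max_concurrency": max_concurrency,
    }
    # Emit optional parameters only when they differ from the documented default.
    for name, value in optional.items():
        if value != DEFAULTS[name]:
            parameters[name] = value

    return {"stage_name": stage_name, "config": {"stage_id": "classify", "parameters": parameters}}


print(json.dumps(build_classify_stage("my_classifier", "mixpeek://my_classifier@1.0.0/classify"), indent=2))
```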

Output

Each document receives its labels as a list of `{label, confidence}` objects at `output_field` (default `classification`):

{
  "document_id": "doc_123",
  "content": "Apple Inc. announced new AI features...",
  "classification": [
    {"label": "technology", "confidence": 0.95},
    {"label": "business", "confidence": 0.82}
  ]
}
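Downstream code often needs only the strongest label per document. The sketch below groups document IDs by their highest-confidence label; the helper names are hypothetical, it assumes `output_field` is a top-level key (not a nested path), and documents with an empty label list are grouped under `None`.

```python
from collections import defaultdict
from typing import Optional


def top_label(document: dict, output_field: str = "classification") -> Optional[str]:
    """Return the highest-confidence label, or None if the document has no labels."""
    labels = document.get(output_field) or []
    if not labels:
        return None
    return max(labels, key=lambda item: item["confidence"])["label"]


def group_by_top_label(documents: list[dict], output_field: str = "classification") -> dict:
    """Map each top label (or None) to the IDs of the documents that carry it."""
    groups: dict = defaultdict(list)
    for document in documents:
        groups[top_label(document, output_field)].append(document["document_id"])
    return dict(groups)


docs = [
    {
        "document_id": "doc_123",
        "classification": [
            {"label": "technology", "confidence": 0.95},
            {"label": "business", "confidence": 0.82},
        ],
    },
    {"document_id": "doc_456", "classification": []},
]
print(group_by_top_label(docs))  # {'technology': ['doc_123'], None: ['doc_456']}
```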

Performance

Metric	Value
Latency	Depends on your plugin's model
Batch size	10 documents per inference call (default)
Concurrency	5 parallel requests (default)
Max document chars	5,000 characters (default)
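The interaction between `batch_size` and `max_concurrency` can be pictured with `asyncio`: documents are split into batches, and a semaphore caps how many batch calls are in flight at once. This is a sketch of the general technique, not the stage's internals; `infer_batch` is a hypothetical async callable that takes a list of texts and returns one label list per text.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical plugin call: takes a batch of texts, returns one label list per text.
InferBatch = Callable[[list[str]], Awaitable[list[list[dict]]]]


def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def classify_all(
    texts: list[str],
    infer_batch: InferBatch,
    batch_size: int = 10,
    max_concurrency: int = 5,
) -> list[list[dict]]:
    """Classify texts in batches, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: list[str]) -> list[list[dict]]:
        async with semaphore:
            return await infer_batch(batch)

    # gather returns results in submission order, so output stays aligned with texts.
    batch_results = await asyncio.gather(*(run(b) for b in make_batches(texts, batch_size)))
    return [labels for result in batch_results for labels in result]
```

With the defaults, 23 documents become three calls (10, 10, and 3 documents), and all three may run concurrently because the limit is 5. Raising `batch_size` reduces the number of calls; raising `max_concurrency` lets more of them overlap.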

Common Pipeline Patterns

Search + Classify + Filter by Label

[
  {
    "stage_name": "search",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "queries": [{
          "vector_index": "text_extractor.all_minilm_l6_v2_v1",
          "query": "{{INPUT.query}}"
        }]
      }
    }
  },
  {
    "stage_name": "classify",
    "config": {
      "stage_id": "classify",
      "parameters": {
        "feature_uri": "mixpeek://my_classifier@1.0.0/classify",
        "min_confidence": 0.7,
        "top_k_labels": 1
      }
    }
  },
  {
    "stage_name": "filter_by_label",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "field": "classification.0.label",
        "operator": "eq",
        "value": "{{INPUT.target_category}}"
      }
    }
  }
]

Related

Taxonomy Enrich - Predefined taxonomy classification (no custom model)
LLM Enrich - LLM-based enrichment and classification
Custom Extractors - Build and deploy custom inference models
