Taxonomies
Auto-classify documents by matching them against reference collections. There are two types:

Flat: match each document against a single reference collection. When similarity exceeds the threshold, enrichment fields (SKU, category, label) are attached to the document.

Hierarchical: parent/child nodes with inheritance. Documents traverse successive levels of refinement (brand → category → subcategory), and each level can match on different features.

When to Run
| Mode | Runs | Use case |
|---|---|---|
| on_demand | At query time as a retriever stage | Dynamic classification, A/B testing |
| materialize | After extraction, persists to collection | Stable labels, fast queries |
| retroactive | Reapplies when taxonomy updates | Backfill when reference data improves |
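
To make the flat case concrete, here is a minimal conceptual sketch of threshold matching against a single reference collection. It is not the product's API: `flat_classify`, the `embedding` key, the `taxonomy_score` field, and the default threshold of 0.8 are all hypothetical names and values chosen for illustration, and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_classify(doc, references, threshold=0.8, fields=("sku", "category", "label")):
    """Return a copy of `doc`, enriched from the best-matching reference.

    Hypothetical helper: enrichment fields are copied only when the best
    similarity score exceeds `threshold`; otherwise the copy is unchanged.
    """
    best, best_score = None, -1.0
    for ref in references:
        score = cosine(doc["embedding"], ref["embedding"])
        if score > best_score:
            best, best_score = ref, score

    enriched = dict(doc)
    if best is None or best_score <= threshold:
        return enriched  # no reference is similar enough

    for name in fields:
        if name in best:
            enriched[name] = best[name]
    enriched["taxonomy_score"] = best_score  # assumed bookkeeping field
    return enriched


# Toy reference collection with two entries (made-up data).
references = [
    {"embedding": [1.0, 0.0], "sku": "A-1", "category": "shoes", "label": "runner"},
    {"embedding": [0.0, 1.0], "sku": "B-2", "category": "hats", "label": "cap"},
]

matched = flat_classify({"id": "d1", "embedding": [0.9, 0.1]}, references)
unmatched = flat_classify({"id": "d2", "embedding": [0.7, 0.7]}, references)
```

In this sketch the first document sits close to the first reference and picks up its fields, while the second is equally far from both references and stays unlabeled. Under the modes in the table, the same function could conceptually run at query time (on_demand), right after extraction (materialize), or again over existing documents when the reference collection changes (retroactive).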
Retriever Enrichments
Attach a retriever pipeline to a collection so it runs on every new document. The retriever executes, and selected result fields are written back to the document.

Annotations
Explicit human decisions with full provenance: the ground truth layer for compliance and review workflows.

Choosing an Approach
| Goal | Use |
|---|---|
| Auto-label with a reference catalog | Flat taxonomy (materialize mode) |
| Hierarchical classification (brand → category → SKU) | Hierarchical taxonomy |
| Auto-classify via LLM at ingest | Retriever enrichment with llm_enrich stage |
| Cross-collection joins (enrich from another dataset) | Retriever enrichment with document_enrich stage |
| Human review with audit trail | Annotations |
| Backfill when labels improve | Retroactive taxonomy application |
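
For the hierarchical row in the table above, the traversal can be pictured as a walk down a tree in which each node names the document feature it compares against and passes its labels on to its children. The sketch below is illustrative only: the node layout (`feature`, `vector`, `fields`, `children`), the function name `classify_hierarchical`, the feature names, and the plain dot product used as a stand-in similarity are all assumptions, not the product's schema.

```python
def dot(a, b):
    """Plain dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def classify_hierarchical(doc, node, threshold=0.5, inherited=None):
    """Walk the tree from `node`, collecting labels along the matched path.

    Each child declares which document feature it is compared against, so
    different levels can use different features. Labels from parent nodes
    are inherited by the levels below them.
    """
    labels = dict(inherited or {})
    labels.update(node.get("fields", {}))

    best, best_score = None, threshold
    for child in node.get("children", []):
        feature = child["feature"]
        if feature not in doc:
            continue  # the document lacks the feature this level needs
        score = dot(doc[feature], child["vector"])
        if score > best_score:
            best, best_score = child, score

    if best is None:
        return labels  # stop refining at the deepest confident level
    return classify_hierarchical(doc, best, threshold, labels)


# Toy two-level taxonomy: brand is matched on a logo feature,
# category on a text feature (all values are made up).
taxonomy = {
    "fields": {},
    "children": [
        {
            "feature": "logo_vec", "vector": [1.0, 0.0], "fields": {"brand": "Acme"},
            "children": [
                {"feature": "text_vec", "vector": [0.0, 1.0],
                 "fields": {"category": "shoes"}, "children": []},
                {"feature": "text_vec", "vector": [1.0, 0.0],
                 "fields": {"category": "hats"}, "children": []},
            ],
        },
        {"feature": "logo_vec", "vector": [0.0, 1.0],
         "fields": {"brand": "Other"}, "children": []},
    ],
}

full_path = classify_hierarchical(
    {"logo_vec": [0.9, 0.1], "text_vec": [0.2, 0.8]}, taxonomy
)
no_match = classify_hierarchical({"logo_vec": [0.1, 0.1]}, taxonomy)
```

Stopping at the deepest level that clears the threshold is one possible design choice: a document can keep a confident brand label even when no category is a good match.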

