Content classification: documents are enriched via taxonomies (flat or hierarchical similarity matching) or retriever enrichments (pipeline execution with field write-back).
For full configuration details, parameters, and advanced options, see the Taxonomies reference.

Taxonomies

Auto-classify documents by matching them against reference collections. Two types:
  1. Flat: match each document against a single reference collection. When similarity exceeds the threshold, enrichment fields (SKU, category, label) are attached.
  2. Hierarchical: parent/child nodes with inheritance. Documents traverse levels of refinement (brand → category → subcategory) using different features at each level.
curl -X POST "https://api.mixpeek.com/v1/taxonomies" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "taxonomy_name": "product-categories",
    "type": "flat",
    "reference_collection_id": "'"$REF_COLLECTION_ID"'",
    "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
    "similarity_threshold": 0.75,
    "enrichment_fields": ["category", "subcategory", "brand"]
  }'
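
To make the two matching styles concrete, here is a minimal Python sketch of threshold-based enrichment and level-by-level traversal. It is illustrative only: the cosine scoring, the function names, the in-memory reference list, and the node shape (`label`, `embedding`, `feature`, `children`) are assumptions for this example, not Mixpeek's implementation or data model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def enrich_flat(doc_embedding, references, threshold, enrichment_fields):
    """Copy enrichment fields from the closest reference if it beats the threshold."""
    best, best_score = None, threshold
    for ref in references:
        score = cosine(doc_embedding, ref["embedding"])
        if score > best_score:  # strict comparison is an assumption ("exceeds")
            best, best_score = ref, score
    if best is None:
        return {}  # nothing was similar enough, so the document stays unlabeled
    return {field: best[field] for field in enrichment_fields if field in best}


def classify_hierarchical(doc_features, root, threshold):
    """Descend level by level, comparing the feature each node names for its children."""
    path, node = [], root
    while node.get("children"):
        doc_vec = doc_features[node["feature"]]
        best, best_score = None, threshold
        for child in node["children"]:
            score = cosine(doc_vec, child["embedding"])
            if score > best_score:
                best, best_score = child, score
        if best is None:  # no child is close enough: stop at the current level
            break
        path.append(best["label"])
        node = best
    return path
```

In this sketch a flat match returns a single set of fields, while the hierarchical walk returns the labels collected along the path, so a document can end up partially classified (brand only, say) when a deeper level has no sufficiently similar child.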

When to Run

| Mode | Runs | Use case |
| --- | --- | --- |
| on_demand | At query time as a retriever stage | Dynamic classification, A/B testing |
| materialize | After extraction, persists to collection | Stable labels, fast queries |
| retroactive | Reapplies when taxonomy updates | Backfill when reference data improves |

Taxonomy API →

Retriever Enrichments

Attach a retriever pipeline to a collection so it runs on every new document. The retriever executes, and selected result fields are written back to the document.
curl -X PATCH "https://api.mixpeek.com/v1/collections/$COLLECTION_ID" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_enrichments": [{
      "retriever_id": "'"$RETRIEVER_ID"'",
      "input_mappings": { "query_text": { "source": "payload", "path": "description" } },
      "write_back_fields": { "category": { "mode": "first", "path": "results[0].metadata.category" } }
    }]
  }'
Use cases: auto-classify via LLM, cross-collection joins, label propagation from seed documents.

Collection update API →
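
The mapping and write-back configuration above can be read as two small lookups: one pulls retriever inputs out of the document payload, the other copies values from the retriever output back onto the document. The Python sketch below illustrates that reading. The helper names and the path grammar (dot-separated keys with optional `[index]`) are assumptions inferred from the example config, and the `mode` setting is ignored because its semantics are not described here.

```python
import re

# One token is either a key (no dots or brackets) or a bracketed list index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(data, path):
    """Follow a path such as 'results[0].metadata.category'; None if a step is missing."""
    current = data
    for key, index in _TOKEN.findall(path):
        try:
            current = current[key] if key else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def build_inputs(payload, input_mappings):
    """Resolve each retriever input from the document payload (source 'payload' only)."""
    return {
        name: resolve_path(payload, mapping["path"])
        for name, mapping in input_mappings.items()
        if mapping.get("source") == "payload"
    }


def write_back(payload, retriever_output, write_back_fields):
    """Return a copy of the payload with resolved result fields added."""
    updated = dict(payload)
    for field, spec in write_back_fields.items():
        value = resolve_path(retriever_output, spec["path"])
        if value is not None:  # leave the field untouched when nothing matched
            updated[field] = value
    return updated
```

Returning a copy rather than mutating the payload keeps the sketch easy to test; a real pipeline would persist the updated fields to the collection.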

Annotations

Annotations capture explicit human decisions with full provenance. They are the ground truth layer for compliance, review workflows, and improving retrieval quality over time.
curl -X POST "https://api.mixpeek.com/v1/annotations" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_abc",
    "collection_id": "col_xyz",
    "retriever_id": "ret_123",
    "execution_id": "exec_789",
    "stage_name": "feature_search",
    "label": "approved",
    "confidence": 0.95,
    "reasoning": "Matches reference product exactly",
    "payload": { "sku": "SKU-001", "action": "keep" },
    "actor_id": "user_456",
    "actor_type": "human"
  }'

What Each Annotation Captures

| Field | Purpose |
| --- | --- |
| document_id, collection_id | What was reviewed |
| retriever_id, execution_id, stage_name | How it was surfaced |
| label, confidence, reasoning | The decision |
| payload | Structured workflow-specific data (SKU, action, notes) |
| actor_id, actor_type | Who decided (human or model) |

Annotations are stored independently from documents — they never modify the source data. Use them to build review queues, audit trails, and curated ground truth datasets.

Bulk Operations

Process review queues at scale with the bulk API:
curl -X POST "https://api.mixpeek.com/v1/annotations/bulk" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "annotations": [
      { "document_id": "doc_1", "collection_id": "col_xyz", "label": "approved" },
      { "document_id": "doc_2", "collection_id": "col_xyz", "label": "rejected", "reasoning": "Low quality match" }
    ]
  }'
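
For a long review queue, one way to prepare bulk requests is to split the queue into fixed-size batches and serialize each batch in the shape shown above. The Python sketch below does only that and sends nothing. The function name and the default `batch_size` of 100 are hypothetical; check the Bulk API reference for any actual request-size limit.

```python
import json


def bulk_request_bodies(annotations, batch_size=100):
    """Yield JSON request bodies, each wrapping up to batch_size annotations."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(annotations), batch_size):
        batch = annotations[start:start + batch_size]
        yield json.dumps({"annotations": batch})
```

Each yielded string can be passed as the body of a POST to the bulk endpoint with the same headers as the curl example.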

The Feedback Loop

Annotations feed directly into the platform’s learning cycle:
  1. Annotations provide explicit ground truth for edge cases
  2. Learned fusion uses annotations to auto-tune retriever stage weights
  3. Approved annotations can be piped into reference collections, expanding your taxonomy’s coverage
  4. Retroactive taxonomy application reclassifies existing documents when annotations improve the reference set
Annotation API → · Bulk API →
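
Step 3 of the loop above, piping approved annotations into a reference collection, can be sketched as a filter followed by a reshape. This Python example is an assumption-heavy illustration: the `min_confidence` cutoff, the `source_document_id` key, and the idea of using the annotation `payload` as the reference record's fields are all hypothetical choices, not documented platform behavior.

```python
def approved_reference_records(annotations, min_confidence=0.9):
    """Turn confident 'approved' annotations into candidate reference records."""
    records = []
    for ann in annotations:
        if ann.get("label") != "approved":
            continue  # only approved decisions extend the reference set
        if ann.get("confidence", 0.0) < min_confidence:
            continue  # a missing confidence counts as 0.0 and is skipped
        # Hypothetical record shape: provenance pointer plus the structured payload.
        record = {"source_document_id": ann["document_id"], **ann.get("payload", {})}
        records.append(record)
    return records
```

The resulting records would still need to be ingested into the reference collection, after which a retroactive taxonomy run could reclassify existing documents against the expanded set.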

Choosing an Approach

| Goal | Use |
| --- | --- |
| Auto-label with a reference catalog | Flat taxonomy (materialize mode) |
| Hierarchical classification (brand → category → SKU) | Hierarchical taxonomy |
| Auto-classify via LLM at ingest | Retriever enrichment with llm_enrich stage |
| Cross-collection joins (enrich from another dataset) | Retriever enrichment with document_enrich stage |
| Human review with audit trail | Annotations |
| Backfill when labels improve | Retroactive taxonomy application |