Annotations capture explicit human judgments on documents — approve, reject, defer, or any domain-specific label. Unlike interaction signals which track implicit behavior (clicks, views, dwell time), annotations record deliberate decisions with optional confidence scores, reasoning, and structured payloads. They are the foundation for human-in-the-loop workflows where retriever results need expert review before action.

When to Use Annotations

Annotations solve the problem of turning retriever output into verified decisions. Any workflow where a person reviews documents and records a judgment benefits from annotations:
| Use Case | Labels | Payload Example |
| --- | --- | --- |
| Medical coding review | approved, rejected, deferred | {"codes_approved": ["E11.40"], "raf_impact": 0.302} |
| Brand infringement triage | infringement, safe, needs_review | {"confidence_model": 0.91, "match_type": "logo"} |
| Duplicate detection | confirmed_dupe, false_positive | {"canonical_id": "doc_abc", "similarity": 0.97} |
| Content moderation | approved, flagged, removed | {"policy_violation": "copyright", "severity": "high"} |
| Document classification | correct, incorrect, ambiguous | {"predicted_class": "invoice", "true_class": "receipt"} |
Annotations are domain-agnostic — labels are free-form strings, and the payload field accepts any structured JSON your workflow needs.

Annotation Lifecycle

Retriever executes → Results returned → Human reviews → Annotation recorded → Audit trail preserved
Each annotation links back to a document and optionally to the retriever execution that surfaced it:
  • document_id and collection_id — what was reviewed
  • retriever_id, execution_id, stage_name — how the document was found (provenance)
  • label, confidence, reasoning — the human decision
  • payload — structured data specific to the workflow
  • actor_id and actor_type — who made the decision (user, API key, or system)
All mutations emit webhooks (annotation.created, annotation.updated, annotation.deleted) and log to the audit trail.
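The field groups above can be sketched as a client-side record. This is an illustrative Python dataclass, not the authoritative schema — field names follow the list above, but the types and optionality are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Annotation:
    # What was reviewed
    document_id: str
    collection_id: str
    # The human decision
    label: str
    confidence: Optional[float] = None
    reasoning: Optional[str] = None
    # Workflow-specific structured data
    payload: dict[str, Any] = field(default_factory=dict)
    # Provenance: how the document was surfaced
    retriever_id: Optional[str] = None
    execution_id: Optional[str] = None
    stage_name: Optional[str] = None
    # Who made the decision
    actor_id: Optional[str] = None
    actor_type: Optional[str] = None  # "user", "api_key", or "system"
```

Only document_id, collection_id, and label are treated as required here; everything else defaults to unset, matching how the create example below omits actor fields.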

Create an Annotation

Record a decision after reviewing a document:
curl -sS -X POST "$MP_API_URL/v1/annotations" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_2e7650fa254b",
    "collection_id": "col_clinical_notes",
    "label": "approved",
    "confidence": 0.95,
    "reasoning": "Note clearly documents peripheral neuropathy with supporting lab values.",
    "payload": {
      "codes_approved": ["E11.40", "E11.65"],
      "raf_impact": 0.420,
      "annual_revenue": 2522
    },
    "retriever_id": "ret_hcc_review",
    "execution_id": "exec_abc123"
  }'
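The same request can be assembled programmatically. A minimal Python sketch follows; the helper name and tuple shape are illustrative, not part of any client library. Keys set to None are dropped so only populated fields reach the API:

```python
import json

def build_create_annotation_request(api_url, api_key, namespace, annotation):
    """Assemble the pieces of a POST /v1/annotations request.

    `annotation` is a dict matching the JSON body shown above; keys
    with None values are omitted so only set fields are sent.
    Returns (url, headers, body) for use with any HTTP client.
    """
    body = {k: v for k, v in annotation.items() if v is not None}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,
        "Content-Type": "application/json",
    }
    return f"{api_url}/v1/annotations", headers, json.dumps(body)
```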

Query Annotations

List annotations with filters to build review queues or dashboards:
# All rejected annotations for a collection
curl -sS -X POST "$MP_API_URL/v1/annotations/list" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "col_clinical_notes",
    "label": "rejected"
  }'

# All annotations on a specific document
curl -sS -X POST "$MP_API_URL/v1/annotations/list" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_2e7650fa254b"
  }'
Available filters: document_id, collection_id, label, actor_id, retriever_id. All filters are optional and can be combined.
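Because unset filters simply don't constrain the query, a small helper can build the list body from optional arguments. This is a client-side sketch (the function name is not part of the API):

```python
def build_list_filters(document_id=None, collection_id=None, label=None,
                       actor_id=None, retriever_id=None):
    """Build the JSON body for POST /v1/annotations/list.

    All filters are optional and combinable; unset ones are omitted
    from the body so they don't constrain the query.
    """
    filters = {
        "document_id": document_id,
        "collection_id": collection_id,
        "label": label,
        "actor_id": actor_id,
        "retriever_id": retriever_id,
    }
    return {k: v for k, v in filters.items() if v is not None}
```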

Aggregate Stats

Get label distribution counts for dashboards and progress tracking:
# Stats across all annotations
curl -sS "$MP_API_URL/v1/annotations/stats" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"

# Stats for a specific collection
curl -sS "$MP_API_URL/v1/annotations/stats?collection_id=col_clinical_notes" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"
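On the client side, per-label counts can be turned into fractions for a progress bar. The sketch below assumes the stats response reduces to a label-to-count mapping; the exact response shape is an assumption, so adapt the extraction step to the real payload:

```python
def label_progress(counts: dict[str, int]) -> dict[str, float]:
    """Turn per-label counts (e.g. from the stats endpoint) into
    fractions of the total, suitable for a progress display.

    `counts` maps label -> count; returns label -> fraction.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}
```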

Update a Decision

When a review is revisited — for example, a deferred case gets a clinical consult and can now be approved:
curl -sS -X PATCH "$MP_API_URL/v1/annotations/ann_3cefcdaf7536a19a" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "approved",
    "confidence": 0.88,
    "reasoning": "Clinical review completed — peripheral neuropathy confirmed."
  }'
The audit trail records both the original and updated values, preserving the full decision history.

Bulk Operations

Process up to 1000 creates, updates, and deletes in a single call. Each operation is independent — a failure in one does not roll back the others.
curl -sS -X POST "$MP_API_URL/v1/annotations/bulk" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "create": [
      {"document_id": "doc_001", "collection_id": "col_notes", "label": "approved", "confidence": 0.95},
      {"document_id": "doc_002", "collection_id": "col_notes", "label": "approved", "confidence": 0.91},
      {"document_id": "doc_003", "collection_id": "col_notes", "label": "rejected", "reasoning": "Insufficient documentation"}
    ],
    "update": [
      {"annotation_id": "ann_abc123", "label": "approved", "confidence": 0.88}
    ],
    "delete": ["ann_def456", "ann_ghi789"]
  }'
The response includes per-operation results so you can identify and retry individual failures.
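When a backlog exceeds the 1000-operation limit, the work has to be split across calls. A hedged sketch of client-side batching (the packing order is a choice, not an API requirement):

```python
def chunk_bulk_ops(creates, updates, deletes, limit=1000):
    """Split mixed bulk operations into request bodies that each stay
    within the per-call operation limit.

    Operations are packed greedily in create, update, delete order;
    each returned dict is a valid body for POST /v1/annotations/bulk,
    with empty sections omitted.
    """
    ops = ([("create", c) for c in creates]
           + [("update", u) for u in updates]
           + [("delete", d) for d in deletes])
    batches = []
    for i in range(0, len(ops), limit):
        batch = {"create": [], "update": [], "delete": []}
        for kind, op in ops[i:i + limit]:
            batch[kind].append(op)
        batches.append({k: v for k, v in batch.items() if v})
    return batches
```

Since each operation is independent and the response reports per-operation results, a failed item can be retried in a later batch without re-sending the rest.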

Example: Medical Coding Review Workflow

A healthcare organization uses Mixpeek to surface HCC suspect conditions from clinical notes. Coders review each result and record their decision:
  1. Retriever runs agent_search across clinical notes, returning suspect HCC conditions with supporting evidence.
  2. Review queue — the application calls POST /v1/annotations/list with {"label": "deferred"} in the body to show unresolved cases.
  3. Coder annotates — for each document, the coder selects a label and the app calls POST /v1/annotations with the decision, ICD-10 codes, and RAF impact.
  4. Dashboard — GET /v1/annotations/stats?collection_id=col_notes powers a progress bar showing approved/rejected/deferred counts.
  5. Audit — compliance officers query annotations by actor_id to review individual coder decisions. The reasoning field provides the justification trail required for CMS audits.
Annotations are stored independently from documents — they don’t modify the underlying document data. This separation ensures that the original clinical record remains untouched while the review layer captures all human decisions.

Best Practices

  • Use consistent labels within a workflow. Pick a label vocabulary (e.g., approved, rejected, deferred) and stick with it — the stats endpoint groups by exact string match.
  • Include reasoning for audit-sensitive workflows. The reasoning field is indexed and retrievable, making it valuable for compliance reviews and dispute resolution.
  • Link provenance when annotating retriever results. Setting retriever_id, execution_id, and stage_name lets you trace exactly how the document was surfaced, which is critical for evaluating retriever quality.
  • Use payload for structured data rather than encoding it in the label. Labels should be human-readable categories; domain-specific fields (codes, scores, amounts) belong in payload.
  • Listen to webhooks for real-time updates. Subscribe to annotation.created and annotation.updated events to trigger downstream workflows (e.g., auto-submit approved records, escalate rejected ones).
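A webhook consumer following the last practice above might route events like this. The event shape used here ({"type": ..., "data": {...}}) and the action names are assumptions for illustration; check the webhook payload documentation for the real structure:

```python
def handle_annotation_event(event: dict) -> str:
    """Route an annotation webhook event to a downstream action.

    Returns an action name; in a real handler these would trigger
    the corresponding workflow (submission, escalation, etc.).
    """
    etype = event.get("type")
    label = event.get("data", {}).get("label")
    if etype == "annotation.created" and label == "approved":
        return "auto_submit"  # e.g. submit the approved record downstream
    if etype in ("annotation.created", "annotation.updated") and label == "rejected":
        return "escalate"     # e.g. open a review ticket
    return "ignore"           # deletes and other labels need no action here
```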
