Semantic Drift Detection
Monitor distribution shifts between baseline and current data using cluster comparison. Detect when new content diverges from training data or when content mix changes unexpectedly.
"Detect distribution drift in training data between Q1 baseline and current dataset"
Why This Matters
Data drift degrades models silently: quality erodes without any error being raised. Comparing cluster distributions over time helps you catch drift before it affects production systems.
import requests

API_URL = "https://api.mixpeek.com"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "X-Namespace": "your-namespace",
}

# Create cluster config for drift monitoring
cluster = requests.post(
    f"{API_URL}/v1/clusters",
    headers=headers,
    json={
        "cluster_name": "training_baseline",
        "source_collection_ids": ["col_training_data"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 20},
    },
).json()

# Create baseline snapshot
baseline = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers,
).json()
print(f"Baseline run_id: {baseline['run_id']}")

# Later: Execute again to compare
current = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers,
).json()

# Compare executions by fetching both artifacts
baseline_artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{baseline['run_id']}/artifacts",
    headers=headers,
).json()
current_artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{current['run_id']}/artifacts",
    headers=headers,
).json()

# Compare cluster counts and distributions
baseline_count = len(baseline_artifacts.get("clusters", []))
current_count = len(current_artifacts.get("clusters", []))
print(f"Baseline: {baseline_count} clusters, Current: {current_count} clusters")

if abs(current_count - baseline_count) > 2:
    print("ALERT: Significant drift detected!")
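The check above only compares how many clusters each run produced. A drift signal closer to "comparing cluster distributions" can be sketched by comparing how items are spread across clusters. The helper names below (`size_shares`, `js_divergence`, `cluster_drift_score`) are hypothetical and not part of the Mixpeek API, and the sketch assumes you have already pulled a list of per-cluster item counts out of each artifacts response, whose exact schema is not shown here. Because cluster labels are not stable between HDBSCAN runs, clusters are matched by size rank rather than by identity, so this is a coarse signal only.

```python
import math
from typing import Sequence


def size_shares(sizes: Sequence[int], bins: int) -> list[float]:
    """Sort cluster sizes descending, pad with zeros to `bins`, normalize to shares."""
    ordered = sorted(sizes, reverse=True)
    padded = ordered + [0] * (bins - len(ordered))
    total = sum(padded)
    return [s / total for s in padded]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits; 0 means identical, 1 is the maximum."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def cluster_drift_score(baseline_sizes: Sequence[int],
                        current_sizes: Sequence[int]) -> float:
    """Score in [0, 1] comparing rank-ordered cluster size shares of two runs."""
    base_total, cur_total = sum(baseline_sizes), sum(current_sizes)
    if base_total == 0 and cur_total == 0:
        return 0.0  # nothing to compare
    if base_total == 0 or cur_total == 0:
        return 1.0  # one run has no clustered items at all
    bins = max(len(baseline_sizes), len(current_sizes))
    return js_divergence(size_shares(baseline_sizes, bins),
                         size_shares(current_sizes, bins))


if __name__ == "__main__":
    # Illustrative numbers only, not real results.
    score = cluster_drift_score([100, 50, 25], [60, 55, 40, 20])
    print(f"Drift score: {score:.3f}")
```

A threshold on this score (an assumption you would tune on your own data) could replace or complement the fixed cluster-count difference of 2 used above.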
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
Text Embedding
Extract semantic embeddings from documents, transcripts and text content
Video Embedding
Generate vector embeddings for video content
