Anomaly Detection
Identify outliers and anomalous content using embedding distance from cluster centroids. Flag quality issues, novel content, or items that don't match expected patterns.
"Find images that don't match the expected product catalog style with anomaly score above 0.85"
Why This Matters
Anomalies can be problems (data quality issues) or opportunities (novel content). Either way, you need to find them before they find you.
import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Create baseline clusters for anomaly detection
cluster = requests.post(
    f"{API_URL}/v1/clusters",
    headers=headers,
    json={
        "cluster_name": "baseline_distribution",
        "source_collection_ids": ["col_my_collection"],
        "feature_addresses": ["mixpeek://multimodal_extractor@v1/embedding"],
        "algorithm": "hdbscan",
        "algorithm_config": {"min_cluster_size": 20},
    },
).json()

# Execute to establish baseline
execution = requests.post(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/execute",
    headers=headers,
).json()

# Get artifacts including outliers
artifacts = requests.get(
    f"{API_URL}/v1/clusters/{cluster['cluster_id']}/executions/{execution['run_id']}/artifacts",
    headers=headers,
    params={"include_members": True},
).json()

# Find anomalies (items marked as noise by HDBSCAN)
outliers = [m for m in artifacts.get("members", []) if m["cluster_id"] == -1]
print(f"Found {len(outliers)} anomalous items")

# Analyze anomaly distribution
for item in outliers[:10]:
    print(f"Document: {item['document_id']}")
    print(f"Distance: {item.get('distance', 'N/A')}")
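To make the idea of "embedding distance from cluster centroids" concrete, here is a minimal local sketch in plain Python. It is not the platform's scoring method: the helper names (`cosine_distance`, `anomaly_scores`, `flag_anomalies`), the use of cosine distance, and the scaling of scores by the largest observed distance are all assumptions made for illustration. The 0.85 threshold is taken from the example query above.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def anomaly_scores(
    embeddings: Dict[str, Vector], centroids: List[Vector]
) -> Dict[str, float]:
    """Score each item by its distance to the nearest centroid.

    Assumption: scores are scaled to [0, 1] by dividing by the largest
    observed distance, so the farthest item always scores 1.0.
    """
    nearest = {
        doc_id: min(cosine_distance(vec, c) for c in centroids)
        for doc_id, vec in embeddings.items()
    }
    max_distance = max(nearest.values(), default=0.0)
    if max_distance == 0:
        return {doc_id: 0.0 for doc_id in nearest}
    return {doc_id: d / max_distance for doc_id, d in nearest.items()}


def flag_anomalies(scores: Dict[str, float], threshold: float = 0.85) -> List[str]:
    """Return the IDs whose score is strictly above the threshold."""
    return sorted(doc_id for doc_id, s in scores.items() if s > threshold)


# Hypothetical 2-D embeddings; real embeddings have hundreds of dimensions.
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
centroids = [[1.0, 0.0]]

scores = anomaly_scores(embeddings, centroids)
flagged = flag_anomalies(scores)
print(flagged)  # only "c" is far from the single centroid
```

Because the scores here are relative to the farthest item in the batch, they are only comparable within one run. A fixed scale (for example, cosine distance divided by 2) would be a reasonable alternative if scores need to be compared across runs.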
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
Video Embedding
Generate vector embeddings for video content
Text Embedding
Extract semantic embeddings from documents, transcripts and text content
Retriever Stages
Feature Search
Search collections using multimodal embeddings
Score Filter
Filter documents by relevance score threshold
