Documentation Index
Fetch the complete documentation index at: https://docs.mixpeek.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Production-ready pipeline configurations for common multimodal retrieval patterns. Each recipe is a complete retriever config you can copy, customize, and deploy. The examples assume an initialized Mixpeek Python client bound to the variable mp.
Brand Safety Scanner
Find talent appearing near competitor products in negative-sentiment content. The flagship multi-stage pipeline.
retriever = mp.retrievers.create(
    name="brand-safety-scanner",
    namespace="media-library",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "face.identity",
         "query": celebrity_embedding, "threshold": 0.72},
        {"type": "filter", "method": "feature_search",
         "feature_name": "logo.visual",
         "query": competitor_logo_embedding},
        {"type": "sort", "method": "score_linear",
         "weights": {"audio.sentiment": 0.6, "recency": 0.3, "engagement": 0.1}},
        {"type": "reduce", "method": "sampling", "limit": 10},
        {"type": "enrich", "method": "document_enrich",
         "collection": "brand-safety-scores"},
    ],
)
Stages: face search → logo filter → sentiment sort → top-10 → brand context enrichment
Result: 10 highest-risk scenes with brand safety scores attached
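The score_linear sort collapses several feature scores into one ranking value via a weighted sum. A minimal sketch of that weighting (the assumption that each feature score is already normalized to [0, 1], and the sample scene values, are ours, not Mixpeek's):

```python
def linear_score(features: dict, weights: dict) -> float:
    """Weighted sum of normalized feature scores; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

weights = {"audio.sentiment": 0.6, "recency": 0.3, "engagement": 0.1}
scenes = [
    {"id": "a", "audio.sentiment": 0.9, "recency": 0.2, "engagement": 0.5},
    {"id": "b", "audio.sentiment": 0.4, "recency": 0.9, "engagement": 0.9},
]
# Highest combined score first, mirroring the sort stage above.
ranked = sorted(scenes, key=lambda s: linear_score(s, weights), reverse=True)
```

Because sentiment carries 0.6 of the weight, scene "a" outranks scene "b" despite lower recency and engagement.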
Multimodal RAG
Retrieve relevant context for an LLM across video, audio, and documents.
retriever = mp.retrievers.create(
    name="multimodal-rag",
    namespace="knowledge-base",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "multimodal.semantic",
         "query": "{{INPUT.query}}"},
        {"type": "filter", "method": "metadata",
         "where": {"department": "{{INPUT.department}}"}},
        {"type": "sort", "method": "cross_encoder_rerank",
         "model": "bge-reranker-v2-m3"},
        {"type": "reduce", "method": "sampling", "limit": 10},
    ],
)
Stages: semantic search → department filter → cross-encoder rerank → top-10
Result: 10 most relevant chunks from video transcripts, slides, and docs
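Downstream, the returned chunks typically get packed into an LLM prompt with their sources, so the model can cite where each fact came from. One way to do that (the chunk fields source, timestamp, and text are illustrative, not a guaranteed response shape):

```python
def build_context(chunks: list[dict], question: str) -> str:
    """Number each retrieved chunk and cite its source for attribution."""
    blocks = [
        f"[{i}] ({c['source']} @ {c.get('timestamp', 'n/a')}) {c['text']}"
        for i, c in enumerate(chunks, 1)
    ]
    return "Context:\n" + "\n".join(blocks) + f"\n\nQuestion: {question}"

prompt = build_context(
    [{"source": "all-hands.mp4", "timestamp": "12:04", "text": "Q3 revenue grew 18%."}],
    "How did Q3 revenue perform?",
)
```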
IP Clearance Pipeline
Check new content for copyrighted material before publication.
retriever = mp.retrievers.create(
    name="ip-clearance",
    namespace="media-library",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "audio.fingerprint",
         "query": new_audio_fingerprint, "threshold": 0.8},
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.similarity",
         "query": new_content_frames},
        {"type": "sort", "method": "score_linear",
         "weights": {"match_confidence": 0.8, "rights_severity": 0.2}},
        {"type": "enrich", "method": "document_enrich",
         "collection": "rights-database"},
    ],
)
Stages: audio fingerprint match → visual similarity → confidence sort → rights enrichment
Result: Potential IP violations with licensing context attached
Reverse Image Search
Find visually similar content across a media library with deduplication.
retriever = mp.retrievers.create(
    name="reverse-image",
    namespace="catalog",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.embedding",
         "query": query_image_embedding, "threshold": 0.7},
        {"type": "filter", "method": "metadata",
         "where": {"source": {"$ne": original_source}}},
        {"type": "reduce", "method": "dedup",
         "field": "source_url"},
        {"type": "reduce", "method": "sampling", "limit": 20},
    ],
)
Stages: visual similarity → exclude self-matches → deduplicate by source → top-20
Result: 20 unique visually similar items from different sources
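The dedup reduce keeps one result per distinct value of the given field. Functionally it behaves like a first-wins pass over the already similarity-ranked results; a sketch of that behavior, not Mixpeek's implementation:

```python
def dedup_by(items: list[dict], field: str) -> list[dict]:
    """Keep the first (highest-ranked) item for each distinct field value."""
    seen, unique = set(), []
    for item in items:
        key = item.get(field)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

hits = [
    {"id": 1, "source_url": "https://a.example/x"},
    {"id": 2, "source_url": "https://a.example/x"},  # dropped: same source as id 1
    {"id": 3, "source_url": "https://b.example/y"},
]
```

Running dedup before the sampling reduce matters: deduplicating after the top-20 cut could leave fewer than 20 results.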
Content Moderation
Flag unsafe content across multiple safety dimensions.
retriever = mp.retrievers.create(
    name="content-moderation",
    namespace="user-uploads",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "safety.nsfw",
         "threshold": 0.6},
        {"type": "filter", "method": "feature_search",
         "feature_name": "safety.violence",
         "threshold": 0.5},
        {"type": "sort", "method": "score_linear",
         "weights": {"nsfw_score": 0.5, "violence_score": 0.5}},
        {"type": "apply", "method": "webhook",
         "url": "https://moderation.internal/review-queue"},
    ],
)
Stages: NSFW filter → violence filter → combined risk sort → send to review queue
Result: Flagged content sent to human review, sorted by severity
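The webhook stage delivers flagged items to the review endpoint; what the receiver does with them is up to you. A sketch of one routing policy (the payload fields nsfw_score and violence_score, and the 0.8 auto-block cutoff, are assumptions, not part of any Mixpeek payload contract):

```python
def route_flagged(item: dict, auto_block_threshold: float = 0.8) -> str:
    """Route a flagged item by combined risk: auto-block the worst, queue the rest.

    Mirrors the 50/50 weights of the sort stage above; payload shape is assumed.
    """
    risk = 0.5 * item.get("nsfw_score", 0.0) + 0.5 * item.get("violence_score", 0.0)
    return "auto_block" if risk >= auto_block_threshold else "human_review"
```

A severe item goes straight to auto_block, while a borderline one lands in human_review.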
Document Q&A
Answer questions across a document corpus with citation tracking.
retriever = mp.retrievers.create(
    name="document-qa",
    namespace="legal-docs",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "text.semantic",
         "query": "{{INPUT.question}}"},
        {"type": "filter", "method": "metadata",
         "where": {"doc_type": {"$in": ["contract", "policy"]}}},
        {"type": "sort", "method": "cross_encoder_rerank",
         "model": "bge-reranker-v2-m3"},
        {"type": "reduce", "method": "sampling", "limit": 5},
        {"type": "enrich", "method": "document_enrich",
         "collection": "document-metadata"},
    ],
)
Stages: semantic search → filter by doc type → rerank → top-5 → attach doc metadata
Result: 5 most relevant passages with source document, page number, and classification
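For citation tracking, the enriched metadata lets you map [n] markers in the model's answer back to source documents. A sketch (the passage fields doc and page are illustrative of the enriched metadata, and the [n] citation convention is an assumption about how you prompt the model):

```python
import re

def resolve_citations(answer: str, passages: list[dict]) -> list[dict]:
    """Map [n] markers in a model answer back to the passages they cite."""
    cited = sorted({int(m) for m in re.findall(r"\[(\d+)\]", answer)})
    return [passages[i - 1] | {"ref": i} for i in cited if 0 < i <= len(passages)]

passages = [{"doc": "msa.pdf", "page": 4}, {"doc": "policy.pdf", "page": 2}]
refs = resolve_citations("Termination requires 30 days notice [2].", passages)
```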
Duplicate Detection
Find near-duplicates across a massive media library.
retriever = mp.retrievers.create(
    name="dedup-scanner",
    namespace="media-archive",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.perceptual_hash",
         "query": target_hash, "threshold": 0.85},
        {"type": "filter", "method": "metadata",
         "where": {"ingested_after": "2026-01-01"}},
        {"type": "reduce", "method": "dedup",
         "field": "source_url"},
    ],
)
Stages: perceptual hash similarity → date filter → deduplicate
Result: Unique near-duplicate items ingested in the target time range
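Perceptual-hash matching scores two items by the fraction of hash bits that agree, so a 0.85 threshold admits hashes differing in at most 15% of their bits. A self-contained sketch of that scoring (equal-length hex-string hashes assumed; not Mixpeek's internal scorer):

```python
def phash_similarity(h1: str, h2: str) -> float:
    """Fraction of equal bits between two equal-length hex perceptual hashes."""
    bits = 4 * len(h1)  # each hex digit encodes 4 bits
    differing = bin(int(h1, 16) ^ int(h2, 16)).count("1")
    return 1.0 - differing / bits

# Two 64-bit hashes differing in a single bit score 63/64 = 0.984...
close = phash_similarity("ff00ff00ff00ff00", "ff00ff00ff00ff01")
```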
Contextual Ad Targeting
IAB category classification for contextual advertising without cookies.
retriever = mp.retrievers.create(
    name="contextual-targeting",
    namespace="publisher-content",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "multimodal.semantic",
         "query": ad_campaign_description},
        {"type": "filter", "method": "metadata",
         "where": {"iab_category": {"$in": target_categories}}},
        {"type": "sort", "method": "score_linear",
         "weights": {"relevance": 0.7, "recency": 0.2, "engagement": 0.1}},
        {"type": "reduce", "method": "sampling", "limit": 50},
    ],
)
Stages: semantic relevance → IAB category filter → weighted scoring → top-50
Result: 50 most relevant content placements for the ad campaign
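The metadata stages in these recipes use a Mongo-style operator syntax: plain values for equality, $in for set membership, $ne for exclusion. A minimal evaluator covering just the operators that appear on this page (illustrative, not Mixpeek's filter engine):

```python
def matches(doc: dict, where: dict) -> bool:
    """Evaluate a flat metadata filter supporting equality, $in, and $ne."""
    for field, cond in where.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$in" in cond and value not in cond["$in"]:
                return False
            if "$ne" in cond and value == cond["$ne"]:
                return False
        elif value != cond:
            return False
    return True
```

For example, {"iab_category": {"$in": ["IAB19"]}} matches a document tagged IAB19 and rejects everything else.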
Brand Monitoring
Track brand mentions across video and audio content in real time.
retriever = mp.retrievers.create(
    name="brand-monitor",
    namespace="social-feeds",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "logo.visual",
         "query": brand_logo_embedding},
        {"type": "filter", "method": "feature_search",
         "feature_name": "audio.speech",
         "query": "brand name mention"},
        {"type": "sort", "method": "score_linear",
         "weights": {"engagement": 0.5, "sentiment": 0.3, "reach": 0.2}},
        {"type": "reduce", "method": "sampling", "limit": 25},
        {"type": "apply", "method": "webhook",
         "url": "https://social.internal/brand-alerts"},
    ],
)
Stages: logo detection → speech mention → engagement sort → top-25 → alert
Result: 25 highest-impact brand mentions with alerts to the social team
Counterfeit Detection with Supabase Writeback
Ingest suspect marketplace listings from a Supabase database, match against a brand catalog, classify with an LLM, and write the verdict back to the source row.
retriever = mp.retrievers.create(
    name="counterfeit-detector",
    namespace="brand-protection",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "image_extractor_v1_embedding",
         "query_input": "image",
         "collection": "brand-catalog-embeddings",
         "top_k": 10},
        {"type": "sort", "method": "rerank",
         "limit": 5},
        {"type": "enrich", "method": "llm_enrich",
         "model": "gpt-4o-mini",
         "multimodal_inputs": {"suspect_image": "image"},
         "output_schema": {
             "classification": "COUNTERFEIT|DUPE|LEGIT_RESALE|UNRELATED",
             "confidence": "float",
             "reasoning": "string",
         }},
    ],
)
Stages: visual similarity search → precision rerank → multimodal LLM verdict
Result: Each suspect listing classified with confidence score and reasoning
Pair with a Supabase source adapter and writeback to automatically:
- Ingest new listings when rows are inserted into Supabase
- Write mp_verdict, mp_confidence, mp_reasoning back to each row
- Query enriched results directly from Supabase — no Mixpeek API polling needed
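Before a verdict reaches the source row, it is worth validating the LLM output against the declared output_schema. A sketch (the writeback column names come from the bullets above; the validation logic itself is ours):

```python
VALID_CLASSIFICATIONS = {"COUNTERFEIT", "DUPE", "LEGIT_RESALE", "UNRELATED"}

def to_writeback_row(verdict: dict) -> dict:
    """Validate an LLM verdict and map it to the Supabase writeback columns."""
    if verdict["classification"] not in VALID_CLASSIFICATIONS:
        raise ValueError(f"unexpected classification: {verdict['classification']}")
    confidence = float(verdict["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return {
        "mp_verdict": verdict["classification"],
        "mp_confidence": confidence,
        "mp_reasoning": str(verdict["reasoning"]),
    }

row = to_writeback_row(
    {"classification": "DUPE", "confidence": 0.91, "reasoning": "Logo kerning differs."}
)
```

Rejecting malformed verdicts here keeps garbage out of the source table rather than discovering it later in downstream queries.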
To scan a batch of suspect images at once, use batch execution:
results = mp.retrievers.execute_batch(
    retriever_id="counterfeit-detector",
    queries=[{"inputs": {"image": url}} for url in suspect_image_urls],
    concurrency=10,
    stream=True,
)
Pattern: Combining Recipes
These recipes compose. A common pattern is to build a base pipeline and extend it:
# Start with brand safety scanner
base_stages = [face_filter, logo_filter, sentiment_sort, reduce_10]

# Add IP clearance
full_stages = base_stages + [audio_fingerprint_filter, rights_enrich]

# Add automated alerting
monitored_stages = full_stages + [slack_webhook_apply]
Each stage is independent. Add, remove, or reorder them to match your use case.
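Each placeholder name above stands for a plain stage dict, so composing recipes is ordinary list concatenation. A concrete sketch using stage dicts taken from the recipes on this page:

```python
face_filter = {"type": "filter", "method": "feature_search",
               "feature_name": "face.identity", "threshold": 0.72}
logo_filter = {"type": "filter", "method": "feature_search",
               "feature_name": "logo.visual"}
sentiment_sort = {"type": "sort", "method": "score_linear",
                  "weights": {"audio.sentiment": 0.6, "recency": 0.3, "engagement": 0.1}}
reduce_10 = {"type": "reduce", "method": "sampling", "limit": 10}
rights_enrich = {"type": "enrich", "method": "document_enrich",
                 "collection": "rights-database"}

base_stages = [face_filter, logo_filter, sentiment_sort, reduce_10]
full_stages = base_stages + [rights_enrich]  # concatenation leaves base_stages intact
```

Using + rather than .append keeps the base pipeline reusable: each composed variant is a new list, so deploying one variant never mutates another.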