Retrieval Cookbook

Production-ready pipeline configurations for common multimodal retrieval patterns. Each recipe is a complete retriever config you can copy, customize, and deploy.

Brand Safety Scanner

Find talent appearing near competitor products in negative-sentiment content. The flagship multi-stage pipeline.

retriever = mp.retrievers.create(
    name="brand-safety-scanner",
    namespace="media-library",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "face.identity",
         "query": celebrity_embedding, "threshold": 0.72},
        {"type": "filter", "method": "feature_search",
         "feature_name": "logo.visual",
         "query": competitor_logo_embedding},
        {"type": "sort", "method": "score_linear",
         "weights": {"audio.sentiment": 0.6, "recency": 0.3, "engagement": 0.1}},
        {"type": "reduce", "method": "sampling", "limit": 10},
        {"type": "enrich", "method": "document_enrich",
         "collection": "brand-safety-scores"},
    ]
)

Stages: face search → logo filter → sentiment sort → top-10 → brand context enrichment
Result: 10 highest-risk scenes with brand safety scores attached
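
To make the sort stage concrete, here is a minimal sketch of what a linear scorer could compute. It assumes score_linear is a plain weighted sum and that every signal is already normalized to [0, 1], with higher meaning higher risk; neither assumption is confirmed by this page. The score_linear function and the two example scenes below are hypothetical stand-ins, not the platform's implementation.

```python
from typing import Dict

def score_linear(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named signals; a signal missing from a document counts as 0.0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

# Weights copied from the Brand Safety Scanner recipe.
weights = {"audio.sentiment": 0.6, "recency": 0.3, "engagement": 0.1}

# Hypothetical scenes with made-up, pre-normalized signal values.
scenes = [
    {"id": "scene-a", "signals": {"audio.sentiment": 0.9, "recency": 0.2, "engagement": 0.5}},
    {"id": "scene-b", "signals": {"audio.sentiment": 0.4, "recency": 0.9, "engagement": 0.9}},
]

# Highest combined score first, as a sort stage would order its output.
ranked = sorted(scenes, key=lambda s: score_linear(s["signals"], weights), reverse=True)
```

Because sentiment carries weight 0.6, a scene with a strong sentiment signal outranks a more recent, more engaging scene unless the other two signals are much higher.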

Multimodal RAG

Retrieve relevant context for an LLM across video, audio, and documents.

retriever = mp.retrievers.create(
    name="multimodal-rag",
    namespace="knowledge-base",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "multimodal.semantic",
         "query": "{{INPUT.query}}"},
        {"type": "filter", "method": "metadata",
         "where": {"department": "{{INPUT.department}}"}},
        {"type": "sort", "method": "cross_encoder_rerank",
         "model": "bge-reranker-v2-m3"},
        {"type": "reduce", "method": "sampling", "limit": 10},
    ]
)

Stages: semantic search → department filter → cross-encoder rerank → top-10
Result: 10 most relevant chunks from video transcripts, slides, and docs
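
The {{INPUT.query}} and {{INPUT.department}} placeholders are filled in at execution time. The sketch below shows one way such substitution could work on a stage dict; it is an illustration under the assumption that placeholders are simple string templates, and the resolve helper is hypothetical rather than part of the SDK.

```python
import re
from typing import Any, Dict

# Matches placeholders of the form {{INPUT.name}}.
_PLACEHOLDER = re.compile(r"\{\{INPUT\.(\w+)\}\}")

def resolve(value: Any, inputs: Dict[str, Any]) -> Any:
    """Recursively replace {{INPUT.x}} placeholders in strings, dicts, and lists.

    Raises KeyError if a placeholder names an input that was not supplied.
    """
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), value)
    if isinstance(value, dict):
        return {key: resolve(item, inputs) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, inputs) for item in value]
    return value  # numbers, None, and other types pass through unchanged

stage = {"type": "filter", "method": "metadata",
         "where": {"department": "{{INPUT.department}}"}}

resolved = resolve(stage, {"department": "engineering"})
```

Failing loudly on a missing input is a deliberate choice here: a silently empty filter value would widen the search instead of narrowing it.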

IP Clearance Pipeline

Check new content for copyrighted material before publication.

retriever = mp.retrievers.create(
    name="ip-clearance",
    namespace="media-library",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "audio.fingerprint",
         "query": new_audio_fingerprint, "threshold": 0.8},
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.similarity",
         "query": new_content_frames},
        {"type": "sort", "method": "score_linear",
         "weights": {"match_confidence": 0.8, "rights_severity": 0.2}},
        {"type": "enrich", "method": "document_enrich",
         "collection": "rights-database"},
    ]
)

Stages: audio fingerprint match → visual similarity → confidence sort → rights enrichment
Result: Potential IP violations with licensing context attached

Reverse Image Search

Find visually similar content across a media library with deduplication.

retriever = mp.retrievers.create(
    name="reverse-image",
    namespace="catalog",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.embedding",
         "query": query_image_embedding, "threshold": 0.7},
        {"type": "filter", "method": "metadata",
         "where": {"source": {"$ne": original_source}}},
        {"type": "reduce", "method": "dedup",
         "field": "source_url"},
        {"type": "reduce", "method": "sampling", "limit": 20},
    ]
)

Stages: visual similarity → exclude self-matches → deduplicate by source → top-20
Result: 20 unique visually similar items from different sources
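
The order of the two reduce stages matters: deduplicating before the limit yields up to 20 distinct sources, while limiting first could return fewer. A minimal sketch of that behavior follows, assuming results arrive best-first and that dedup keeps the first item seen per field value. The dedup_by_field helper and the sample hits are hypothetical.

```python
from typing import Any, Dict, List

def dedup_by_field(results: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Keep the first (best-ranked) item for each distinct value of `field`.

    Items that lack the field share the key None, so only the first of them survives.
    """
    seen = set()
    unique = []
    for item in results:
        key = item.get(field)
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique

# Hypothetical search hits, already sorted by descending similarity.
hits = [
    {"id": 1, "source_url": "https://a.example/img.png", "score": 0.95},
    {"id": 2, "source_url": "https://a.example/img.png", "score": 0.91},
    {"id": 3, "source_url": "https://b.example/img.png", "score": 0.88},
    {"id": 4, "source_url": "https://c.example/img.png", "score": 0.80},
]

limit = 2
dedup_then_limit = dedup_by_field(hits, "source_url")[:limit]   # ids 1 and 3
limit_then_dedup = dedup_by_field(hits[:limit], "source_url")   # id 1 only
```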

Content Moderation

Flag unsafe content across multiple safety dimensions.

retriever = mp.retrievers.create(
    name="content-moderation",
    namespace="user-uploads",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "safety.nsfw",
         "threshold": 0.6},
        {"type": "filter", "method": "feature_search",
         "feature_name": "safety.violence",
         "threshold": 0.5},
        {"type": "sort", "method": "score_linear",
         "weights": {"nsfw_score": 0.5, "violence_score": 0.5}},
        {"type": "apply", "method": "webhook",
         "url": "https://moderation.internal/review-queue"},
    ]
)

Stages: NSFW filter → violence filter → combined risk sort → send to review queue
Result: Flagged content sent to human review, sorted by severity

Document Q&A

Answer questions across a document corpus with citation tracking.

retriever = mp.retrievers.create(
    name="document-qa",
    namespace="legal-docs",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "text.semantic",
         "query": "{{INPUT.question}}"},
        {"type": "filter", "method": "metadata",
         "where": {"doc_type": {"$in": ["contract", "policy"]}}},
        {"type": "sort", "method": "cross_encoder_rerank",
         "model": "bge-reranker-v2-m3"},
        {"type": "reduce", "method": "sampling", "limit": 5},
        {"type": "enrich", "method": "document_enrich",
         "collection": "document-metadata"},
    ]
)

Stages: semantic search → filter by doc type → rerank → top-5 → attach doc metadata
Result: 5 most relevant passages with source document, page number, and classification
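
The metadata filters in these recipes use operator objects such as $in and $ne. The sketch below evaluates a where clause against a document, assuming MongoDB-style semantics for those two operators and plain equality for bare values. That reading is an assumption, and the matches helper is hypothetical; consult the platform reference for the operators it actually supports.

```python
from typing import Any, Dict

def matches(doc: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `doc` satisfies every condition in `where`.

    Supports bare equality plus the $in and $ne operators used in this cookbook.
    """
    for field, condition in where.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$in":
                    if value not in operand:
                        return False
                elif op == "$ne":
                    if value == operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            return False
    return True

where = {"doc_type": {"$in": ["contract", "policy"]}}

# Hypothetical documents for illustration.
docs = [
    {"id": "d1", "doc_type": "contract"},
    {"id": "d2", "doc_type": "memo"},
    {"id": "d3", "doc_type": "policy"},
]

kept = [d["id"] for d in docs if matches(d, where)]
```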

Duplicate Detection

Find near-duplicates across a massive media library.

retriever = mp.retrievers.create(
    name="dedup-scanner",
    namespace="media-archive",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "visual.perceptual_hash",
         "query": target_hash, "threshold": 0.85},
        {"type": "filter", "method": "metadata",
         "where": {"ingested_after": "2026-01-01"}},
        {"type": "reduce", "method": "dedup",
         "field": "source_url"},
    ]
)

Stages: perceptual hash similarity → date filter → deduplicate
Result: Unique near-duplicate items ingested in the target time range

Contextual Ad Targeting

IAB category classification for contextual advertising without cookies.

retriever = mp.retrievers.create(
    name="contextual-targeting",
    namespace="publisher-content",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "multimodal.semantic",
         "query": ad_campaign_description},
        {"type": "filter", "method": "metadata",
         "where": {"iab_category": {"$in": target_categories}}},
        {"type": "sort", "method": "score_linear",
         "weights": {"relevance": 0.7, "recency": 0.2, "engagement": 0.1}},
        {"type": "reduce", "method": "sampling", "limit": 50},
    ]
)

Stages: semantic relevance → IAB category filter → weighted scoring → top-50
Result: 50 most relevant content placements for the ad campaign

Social Media Monitoring

Track brand mentions across video and audio content in real time.

retriever = mp.retrievers.create(
    name="brand-monitor",
    namespace="social-feeds",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "logo.visual",
         "query": brand_logo_embedding},
        {"type": "filter", "method": "feature_search",
         "feature_name": "audio.speech",
         "query": "brand name mention"},
        {"type": "sort", "method": "score_linear",
         "weights": {"engagement": 0.5, "sentiment": 0.3, "reach": 0.2}},
        {"type": "reduce", "method": "sampling", "limit": 25},
        {"type": "apply", "method": "webhook",
         "url": "https://social.internal/brand-alerts"},
    ]
)

Stages: logo detection → speech mention → engagement sort → top-25 → alert
Result: 25 highest-impact brand mentions with alerts to the social team

Counterfeit Detection with Supabase Writeback

Ingest suspect marketplace listings from a Supabase database, match against a brand catalog, classify with an LLM, and write the verdict back to the source row.

retriever = mp.retrievers.create(
    name="counterfeit-detector",
    namespace="brand-protection",
    stages=[
        {"type": "filter", "method": "feature_search",
         "feature_name": "image_extractor_v1_embedding",
         "query_input": "image",
         "collection": "brand-catalog-embeddings",
         "top_k": 10},
        {"type": "sort", "method": "rerank",
         "limit": 5},
        {"type": "enrich", "method": "llm_enrich",
         "model": "gpt-4o-mini",
         "multimodal_inputs": {"suspect_image": "image"},
         "output_schema": {
             "classification": "COUNTERFEIT|DUPE|LEGIT_RESALE|UNRELATED",
             "confidence": "float",
             "reasoning": "string"
         }},
    ]
)

Stages: visual similarity search → precision rerank → multimodal LLM verdict
Result: Each suspect listing classified with confidence score and reasoning

To scan a batch of suspect images at once, use batch execution:

results = mp.retrievers.execute_batch(
    retriever_id="counterfeit-detector",
    queries=[{"inputs": {"image": url}} for url in suspect_image_urls],
    concurrency=10,
    stream=True,
)

Pattern: Combining Recipes

These recipes compose. A common pattern is to build a base pipeline and extend it:

# Start with brand safety scanner
base_stages = [face_filter, logo_filter, sentiment_sort, reduce_10]

# Add IP clearance
full_stages = base_stages + [audio_fingerprint_filter, rights_enrich]

# Add automated alerting
monitored_stages = full_stages + [slack_webhook_apply]

Each stage is independent. Add, remove, or reorder them to match your use case.
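
The composition pattern can be sketched with plain stage dicts. The stage bodies are copied from the Brand Safety Scanner and IP Clearance recipes; the placeholder query values, the example webhook URL, and the describe helper are hypothetical and exist only so the snippet runs on its own.

```python
from typing import Any, Dict, List

Stage = Dict[str, Any]

# Placeholder query values; real pipelines would pass embeddings or fingerprints.
celebrity_embedding = [0.1, 0.2, 0.3]
competitor_logo_embedding = [0.4, 0.5, 0.6]
new_audio_fingerprint = [0.7, 0.8, 0.9]

face_filter: Stage = {"type": "filter", "method": "feature_search",
                      "feature_name": "face.identity",
                      "query": celebrity_embedding, "threshold": 0.72}
logo_filter: Stage = {"type": "filter", "method": "feature_search",
                      "feature_name": "logo.visual",
                      "query": competitor_logo_embedding}
sentiment_sort: Stage = {"type": "sort", "method": "score_linear",
                         "weights": {"audio.sentiment": 0.6, "recency": 0.3,
                                     "engagement": 0.1}}
reduce_10: Stage = {"type": "reduce", "method": "sampling", "limit": 10}
audio_fingerprint_filter: Stage = {"type": "filter", "method": "feature_search",
                                   "feature_name": "audio.fingerprint",
                                   "query": new_audio_fingerprint, "threshold": 0.8}
rights_enrich: Stage = {"type": "enrich", "method": "document_enrich",
                        "collection": "rights-database"}
# Hypothetical alert endpoint, not a real URL.
slack_webhook_apply: Stage = {"type": "apply", "method": "webhook",
                              "url": "https://hooks.example.com/brand-alerts"}

# List concatenation builds new lists, so each base list stays reusable.
base_stages: List[Stage] = [face_filter, logo_filter, sentiment_sort, reduce_10]
full_stages = base_stages + [audio_fingerprint_filter, rights_enrich]
monitored_stages = full_stages + [slack_webhook_apply]

def describe(stages: List[Stage]) -> str:
    """Render a pipeline as 'type:method' steps joined by arrows, for quick review."""
    return " → ".join(f"{s['type']}:{s['method']}" for s in stages)

summary = describe(monitored_stages)
```

Note that appending audio_fingerprint_filter after reduce_10 means the fingerprint match only sees the ten items that survived the reduce. If it should run against the full candidate set, place it before the reduce stage instead.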
