Multimodal Recipes
Ready-to-use solutions for building multimodal applications
Documents where a chart contradicts a text claim
Identify documents where the values shown in a chart conflict with statements made in the surrounding text. Useful for fact-checking reports in finance, healthcare, and journalism.
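A minimal sketch of how this check could work on top of the extractor outputs. The `Finding` shape, field names, and tolerance below are hypothetical, not the Mixpeek API: a chart-parsing extractor supplies the value read off the chart, a claim extractor supplies the number stated in the nearby text, and documents where the two disagree beyond a tolerance get flagged.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    doc_id: str
    metric: str         # e.g. "Q3 revenue ($M)"
    chart_value: float  # value read off the chart by a chart-parsing extractor
    text_claim: float   # value stated in nearby text by a claim extractor

def contradictions(findings, rel_tolerance=0.05):
    """Return findings where the chart and the text disagree by more than rel_tolerance."""
    flagged = []
    for f in findings:
        scale = max(abs(f.chart_value), abs(f.text_claim), 1e-9)
        if abs(f.chart_value - f.text_claim) / scale > rel_tolerance:
            flagged.append(f)
    return flagged

sample = [
    Finding("report-001", "Q3 revenue ($M)", chart_value=42.0, text_claim=58.0),
    Finding("report-002", "patients enrolled", chart_value=310, text_claim=312),
]
for f in contradictions(sample):
    print(f"{f.doc_id}: chart shows {f.chart_value}, text claims {f.text_claim} ({f.metric})")
```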
Clips with a specific object and a related spoken keyword
Search for clips where a specific object is visible while a related keyword is spoken. This combines object detection with speech-to-text and keyword analysis.
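One way to express this combination once the extractors have run, as a sketch. The span and transcript shapes below are made up for illustration, not Mixpeek's output format: intersect the time ranges where the object detector saw the target object with the timestamps at which the keyword appears in the transcript.

```python
# (start_s, end_s) spans where an object detector saw the target object
object_spans = [(12.0, 30.5), (41.0, 55.0)]

# (timestamp_s, word) pairs from a speech-to-text transcript
transcript = [(14.0, "espresso"), (33.0, "grinder"), (48.0, "espresso")]

def clips_with_object_and_keyword(spans, words, keyword, pad_s=2.0):
    """Return (start, end) clips where `keyword` is spoken while the object is on screen."""
    hits = []
    for start, end in spans:
        for ts, word in words:
            if word == keyword and start <= ts <= end:
                hits.append((max(0.0, ts - pad_s), ts + pad_s))
    return hits

print(clips_with_object_and_keyword(object_spans, transcript, "espresso"))
# -> [(12.0, 16.0), (46.0, 50.0)]
```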
Clips where someone is talking but no person is visible
Use speech-to-text to detect narration or dialogue and cross-reference with object detection to ensure no person is visually present in the clips.
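The same interval logic works for absence. A rough sketch with hypothetical inputs, not a Mixpeek call: keep only the parts of the speech segments where the person detector found nobody on screen.

```python
speech_spans = [(0.0, 20.0), (35.0, 60.0)]   # narration/dialogue from speech-to-text
person_spans = [(5.0, 12.0), (40.0, 65.0)]   # spans where a person was detected

def subtract_spans(keep, remove):
    """Subtract `remove` intervals from `keep` intervals (all (start, end) tuples)."""
    result = []
    for k_start, k_end in keep:
        pieces = [(k_start, k_end)]
        for r_start, r_end in remove:
            next_pieces = []
            for p_start, p_end in pieces:
                if r_end <= p_start or r_start >= p_end:   # no overlap, keep as-is
                    next_pieces.append((p_start, p_end))
                else:                                       # cut out the overlapping part
                    if p_start < r_start:
                        next_pieces.append((p_start, r_start))
                    if r_end < p_end:
                        next_pieces.append((r_end, p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result

# Speech with no visible person: narration over b-roll, off-screen dialogue, etc.
print(subtract_spans(speech_spans, person_spans))
# -> [(0.0, 5.0), (12.0, 20.0), (35.0, 40.0)]
```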
Scenes with fast movement, loud sounds, and dim lighting
Identify high-action scenes by combining action recognition with sound event detection for loud noises and scene classification for lighting conditions.
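This recipe is a conjunction of three per-segment signals. A toy sketch, assuming each extractor has already produced a score per segment; the field names and thresholds are illustrative, not Mixpeek's:

```python
segments = [
    # motion (0-1 from action recognition), loudness (dBFS-style),
    # brightness (0-255 mean luma from frame/scene analysis)
    {"id": "seg-01", "motion": 0.91, "loudness": -8.0,  "brightness": 34},
    {"id": "seg-02", "motion": 0.22, "loudness": -30.0, "brightness": 120},
    {"id": "seg-03", "motion": 0.88, "loudness": -11.0, "brightness": 90},
]

def high_action_dim_scenes(segs, min_motion=0.8, min_loudness=-15.0, max_brightness=60):
    """Segments with fast movement AND loud audio AND dim lighting."""
    return [
        s["id"] for s in segs
        if s["motion"] >= min_motion
        and s["loudness"] >= min_loudness
        and s["brightness"] <= max_brightness
    ]

print(high_action_dim_scenes(segments))  # -> ['seg-01']
```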
Moments where a person gestures while speaking a command word
Pinpoint interactive moments by detecting specific physical gestures alongside key spoken command words using action recognition and speech-to-text.
On-screen text with narration and background music
Detect scenes where on-screen text, human narration, and background music occur simultaneously using OCR, speech-to-text, and audio classification.
Segments with angry expressions and negative phrases
Find moments of conflict or frustration by analyzing facial expressions for anger and cross-referencing them with negative keywords from the transcript.
Frames with multiple people arguing and high visual activity
Isolate heated moments by using speaker diarization and audio event detection to identify multiple people arguing, combined with high visual activity from action recognition.
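A sketch of the diarization half of this recipe, with a hypothetical turn format and speaker labels: find ranges where turns from two different speakers overlap (crosstalk), a common signal for arguments. The resulting ranges could then be filtered by an action-recognition activity score, as in the high-action sketch above.

```python
# Speaker turns as (speaker, start_s, end_s) from speaker diarization
turns = [
    ("spk_A", 10.0, 18.0),
    ("spk_B", 16.5, 24.0),   # cuts in before spk_A finishes
    ("spk_A", 23.0, 27.0),   # talks over spk_B again
    ("spk_B", 40.0, 45.0),
]

def crosstalk_spans(turns):
    """Return (start, end) ranges where two different speakers talk at once."""
    spans = []
    for i, (spk1, s1, e1) in enumerate(turns):
        for spk2, s2, e2 in turns[i + 1:]:
            if spk1 != spk2 and s2 < e1 and s1 < e2:   # intervals overlap
                spans.append((max(s1, s2), min(e1, e2)))
    return sorted(spans)

print(crosstalk_spans(turns))  # -> [(16.5, 18.0), (23.0, 24.0)]
```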
What are Mixpeek Recipes?
Mixpeek recipes are practical blueprints for multimodal search. They demonstrate how to combine multiple feature extractors to answer complex, high-value questions that traditional, single-modality search cannot answer; a generic sketch of the pattern appears at the end of this page.
Composable Blueprints
Each recipe provides a practical blueprint, showing how to combine multiple feature extractors to answer complex, real-world questions across your data.
Unlock Multimodal Search
Go beyond simple keyword search. Recipes demonstrate how to query across modalities—like matching spoken words with visual elements—to find precise moments.
Practical & Actionable
Get inspired by real-world use cases. From fact-checking financial reports to analyzing video content, recipes provide actionable patterns you can adapt and use today.
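To make the composable idea concrete, here is a generic sketch in plain Python, not the Mixpeek SDK: treat a recipe as a list of per-modality predicates that must all hold on the same time-aligned segment, and keep the segments that pass every test. The on-screen text + narration + music recipe above is written out as an example; the segment fields and predicate names are hypothetical.

```python
from typing import Callable, Dict, List

Segment = Dict[str, object]            # one time-aligned segment with extractor outputs
Predicate = Callable[[Segment], bool]  # a test over one modality's output

def run_recipe(segments: List[Segment], predicates: List[Predicate]) -> List[Segment]:
    """Keep segments for which every modality-level predicate holds."""
    return [seg for seg in segments if all(p(seg) for p in predicates)]

# Example: "on-screen text + narration + background music" as three predicates
segments = [
    {"id": "s1", "ocr_text": "SALE 50% OFF", "speech": "welcome back", "music": True},
    {"id": "s2", "ocr_text": "",             "speech": "welcome back", "music": True},
]
recipe = [
    lambda s: bool(s["ocr_text"]),   # OCR found on-screen text
    lambda s: bool(s["speech"]),     # speech-to-text found narration
    lambda s: s["music"] is True,    # audio classifier flagged background music
]
print([s["id"] for s in run_recipe(segments, recipe)])  # -> ['s1']
```

Every recipe on this page reduces to some combination of predicates like these, evaluated over the outputs of different feature extractors on shared documents or time ranges.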