
    Multimodal Recipes

    Ready-to-use solutions for building multimodal applications

    Documents where a chart contradicts a text claim

    Identify documents where the data shown in a chart conflicts with statements made in the nearby text. Useful for fact-checking reports in finance, healthcare, and journalism.

    Modalities:
    document
    Feature Extractors:
    pdf-extraction
    chart-graph-extraction
    ...
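
As a minimal sketch of the comparison step, assume the extractors have already produced a numeric series for each chart and a directional claim ("up" or "down") for the nearby text. The `ChartSeries` and `TextClaim` types and the first-versus-last trend rule are illustrative assumptions, not Mixpeek's actual output schema or method.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for extractor output; not Mixpeek's actual schema.
@dataclass
class ChartSeries:
    label: str
    values: list  # data points read off the chart, in x-axis order

@dataclass
class TextClaim:
    subject: str
    direction: str  # "up" or "down", as stated in the nearby text

def chart_direction(series):
    """Classify the overall trend by comparing the last point to the first."""
    delta = series.values[-1] - series.values[0]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"

def contradicts(series, claim):
    """True when chart and claim share a subject but disagree on direction."""
    if series.label.lower() != claim.subject.lower():
        return False
    return chart_direction(series) != claim.direction

# The text says revenue went up, but the charted values fall.
revenue_chart = ChartSeries("Revenue", [120.0, 115.0, 98.0])
revenue_claim = TextClaim("revenue", "up")
flagged = contradicts(revenue_chart, revenue_claim)
```

A real pipeline would need a more robust trend test and a way to link each claim to the right chart; exact label matching is used here only to keep the sketch short.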

    Clips with a specific object and a related spoken keyword

    Search for clips where a specific object is visible while a related keyword is spoken. This combines object detection with speech-to-text and keyword analysis.

    Modalities:
    video
    audio
    Feature Extractors:
    object-detection
    video-transcription
    ...
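
The combining step here amounts to a time-interval overlap. The sketch below assumes object detection and transcription each yield spans in seconds; the `Span` type and function names are hypothetical, chosen for illustration rather than taken from Mixpeek's API.

```python
from typing import NamedTuple, Optional

class Span(NamedTuple):
    start: float  # seconds
    end: float    # seconds

def overlap(a, b) -> Optional[Span]:
    """Return the shared portion of two spans, or None if they do not overlap."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Span(start, end) if start < end else None

def co_occurrences(object_spans, keyword_spans):
    """Collect every stretch where the object is visible while the keyword is spoken."""
    hits = []
    for obj in object_spans:
        for word in keyword_spans:
            shared = overlap(obj, word)
            if shared is not None:
                hits.append(shared)
    return hits

# Assumed sample data: when a dog is on screen, and when "leash" is spoken.
dog_visible = [Span(2.0, 6.5), Span(20.0, 24.0)]
leash_spoken = [Span(5.8, 6.2), Span(12.0, 12.4)]
matches = co_occurrences(dog_visible, leash_spoken)
```

The nested loop is quadratic, which is fine for a single clip; for long recordings a sorted two-pointer sweep would scale better.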

    Clips where someone is talking but no person is visible

    Use speech-to-text to detect narration or dialogue, then cross-reference with object detection to confirm that no person is visible in the clip.

    Modalities:
    video
    audio
    Feature Extractors:
    video-transcription
    object-detection
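
One way to express "speech without a visible person" is interval subtraction: take each speech interval and remove the parts covered by person detections. The sketch assumes both extractors return `(start, end)` tuples in seconds; the function names are illustrative, not part of any Mixpeek API.

```python
def subtract(speech, person_spans):
    """Return the parts of one speech interval not covered by any person interval."""
    remaining = []
    cursor, end = speech
    for p_start, p_end in sorted(person_spans):
        if p_end <= cursor:
            continue            # person interval ends before the uncovered part
        if p_start >= end:
            break               # person interval starts after the speech ends
        if p_start > cursor:
            remaining.append((cursor, p_start))
        cursor = max(cursor, p_end)
        if cursor >= end:
            break
    if cursor < end:
        remaining.append((cursor, end))
    return remaining

def offscreen_speech(speech_spans, person_spans):
    """Apply the subtraction to every speech interval and flatten the results."""
    segments = []
    for speech in speech_spans:
        segments.extend(subtract(speech, person_spans))
    return segments

# Assumed sample data, in seconds.
speech_spans = [(10.0, 30.0), (40.0, 45.0)]
person_spans = [(0.0, 12.0), (18.0, 22.0)]
voiceover = offscreen_speech(speech_spans, person_spans)
```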

    Scenes with fast movement, loud sounds, and dim lighting

    Identify high-action scenes by combining action recognition with sound event detection for loud noises and scene classification for lighting conditions.

    Modalities:
    video
    audio
    Feature Extractors:
    action-recognition
    audio-event-detection
    ...
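
If each extractor reports a score per segment, the combination can be a simple conjunction of thresholds. The field names, score ranges, and threshold values below are assumptions made for illustration; actual extractor outputs may be shaped differently.

```python
from dataclasses import dataclass

@dataclass
class SegmentScores:
    start: float        # seconds
    end: float          # seconds
    motion: float       # 0..1 intensity (hypothetical action-recognition score)
    loudness_db: float  # peak level in dBFS (hypothetical audio-event field)
    brightness: float   # 0..1 mean luminance (hypothetical lighting score)

def is_high_action(seg, min_motion=0.7, min_loudness_db=-10.0, max_brightness=0.3):
    """A segment qualifies only if it is fast, loud, and dim at the same time."""
    return (
        seg.motion >= min_motion
        and seg.loudness_db >= min_loudness_db
        and seg.brightness <= max_brightness
    )

segments = [
    SegmentScores(0.0, 4.0, motion=0.9, loudness_db=-6.0, brightness=0.2),
    SegmentScores(4.0, 8.0, motion=0.9, loudness_db=-6.0, brightness=0.8),   # too bright
    SegmentScores(8.0, 12.0, motion=0.2, loudness_db=-30.0, brightness=0.1), # slow, quiet
]
high_action = [s for s in segments if is_high_action(s)]
```

The thresholds are exposed as keyword arguments because reasonable cutoffs depend heavily on the footage.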

    Moments where a person gestures while speaking a command word

    Pinpoint interactive moments by detecting specific physical gestures alongside key spoken command words using action recognition and speech-to-text.

    Modalities:
    video
    audio
    Feature Extractors:
    action-recognition
    video-transcription
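
Gestures and spoken words rarely line up to the exact frame, so a tolerance window is a reasonable way to pair them. This sketch assumes gestures and words arrive as timestamped point events; the event shape, the command set, and the 0.5-second tolerance are all illustrative assumptions.

```python
def pair_gestures_with_commands(gestures, words, commands, tolerance=0.5):
    """Pair each gesture with any command word spoken within `tolerance` seconds.

    gestures: list of (time_seconds, gesture_label)
    words:    list of (time_seconds, word)
    commands: set of lowercase command words to look for
    """
    pairs = []
    for g_time, g_label in gestures:
        for w_time, word in words:
            if word.lower() in commands and abs(g_time - w_time) <= tolerance:
                pairs.append((g_label, word, g_time, w_time))
    return pairs

# Assumed sample events.
gestures = [(3.2, "point"), (9.0, "wave")]
words = [(3.0, "stop"), (5.0, "hello"), (9.4, "go"), (15.0, "stop")]
moments = pair_gestures_with_commands(gestures, words, commands={"stop", "go"})
```

Here "stop" at 15.0 seconds is ignored because no gesture occurs near it, and "hello" is ignored because it is not a command word.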

    On-screen text with narration and background music

    Detect scenes where on-screen text, human narration, and background music occur simultaneously using OCR, speech-to-text, and audio classification.

    Modalities:
    video
    audio
    text
    Feature Extractors:
    image-text-extraction
    video-transcription
    ...
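
Requiring three signals at once generalizes pairwise overlap to an n-way interval intersection. The sketch assumes each extractor yields a sorted list of non-overlapping `(start, end)` intervals in seconds; the track names and sample values are hypothetical.

```python
from functools import reduce

def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

def intersect_all(*tracks):
    """Fold the pairwise intersection across any number of tracks."""
    return reduce(intersect_two, tracks)

# Assumed sample tracks, in seconds.
on_screen_text = [(0.0, 10.0), (20.0, 30.0)]
narration = [(5.0, 25.0)]
music = [(8.0, 22.0)]
simultaneous = intersect_all(on_screen_text, narration, music)
```

Because the fold takes any number of tracks, the same helper would work if a fourth signal were added later.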

    Segments with angry expressions and negative phrases

    Find moments of conflict or frustration by analyzing facial expressions for anger and cross-referencing them with negative keywords from the transcript.

    Modalities:
    video
    audio
    Feature Extractors:
    face-grouping
    keyword-extraction
    ...

    Frames with multiple people arguing and high visual activity

    Isolate heated moments by using speaker diarization and audio event detection to identify multiple speakers arguing, then combining that with high visual activity from action recognition.

    Modalities:
    video
    audio
    Feature Extractors:
    speaker-diarization
    audio-event-detection
    ...

    What are Mixpeek Recipes?

    Mixpeek recipes are practical blueprints for multimodal search. They demonstrate how to combine multiple feature extractors to answer complex, high-value questions that keyword-only search cannot answer.

    Composable Blueprints

    Each recipe lists the modalities and feature extractors it combines, so you can see how the pieces compose to answer a real-world question across your data.

    Unlock Multimodal Search

    Go beyond simple keyword search. Recipes demonstrate how to query across modalities, such as matching spoken words with visual elements, to find precise moments.

    Practical & Actionable

    Get inspired by real-world use cases. From fact-checking financial reports to analyzing video content, recipes provide actionable patterns you can adapt and use today.