NEW · Agents can now see video via MCP. Try it now →

    Perception for agents
    across video, images,
    audio & documents.

    Mixpeek decomposes unstructured media into typed features, reassembles them through multi-stage retrievers, and enriches results with your domain taxonomies. Your agents can now see, hear, and act on what was previously dark data.

Built by experts from
MongoDB · Berkeley · NVIDIA · Etsy · Amazon Web Services · Equinix · IAB Tech Lab
Live retriever · Talent search across 10k video ads

Sources (4):
• Super Bowl ads: s3://mxp-ads/2026/*.mp4
• Creator headshots: 42k reference faces
• Casting database: conflicts + rates
• Outtake reels: agency archive

mixpeek://core · 47 ms

Feature Extractors:
• Face (arcface-v2): face_box, face_embedding, identity
• Scene (clip-vit-l): scene_embedding, scene_id
• Transcript (whisper-v3): transcript, language

Stages: detect → embed → match → filter → rank
10,482 ads indexed · 14 features/file

Retrievers (4):
• Face search: find talent across ads
• Conflict detection: brand competitors
• Utilization report: by creator / quarter
• Scene lookup: find the exact moment
Files indexed: 2.4M+
Video, images, audio, and documents processed across production deployments.

Features extracted: 34M+
Embeddings, faces, transcripts, logos, scenes, and fingerprints pulled from raw media.

Retriever stages: 6 per query
Chain search, rerank, filter, aggregate, and classify in a single retriever call.

Extractors: 12 built-in
Face, scene, transcript, OCR, logo, audio, object, and more. No third-party contracts.
    Capabilities

    One API, every modality.

    Three primitives

    Decompose. Reassemble. Enrich.

    Read the docs →
decompose.py

# Break raw media into typed, versioned features.
from mixpeek import Mixpeek
from mixpeek.extractors import Face, Scene, Transcript

mp = Mixpeek("YOUR_API_KEY")

collection = mp.collections.create(
    source="s3://mxp-ads/*.mp4",
    features=[
        Face(model="arcface-v2"),
        Scene(model="clip-vit-l"),
        Transcript(model="whisper-v3"),
    ],
)
# 14 features per file · embeddings ready in seconds
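After decomposition, retrievers reassemble those features through staged queries (detect → embed → match → filter → rank). Here is a minimal pure-Python sketch of that stage-chaining idea, using toy in-memory records; the record shape, stage helpers, and `retrieve` function are illustrative assumptions, not the real Mixpeek retriever API:

```python
from functools import reduce

# Toy indexed records; the real system stores typed features per file.
ADS = [
    {"id": "ad-001", "identity": "creator_a", "score": 0.91, "language": "en"},
    {"id": "ad-002", "identity": "creator_b", "score": 0.88, "language": "en"},
    {"id": "ad-003", "identity": "creator_a", "score": 0.52, "language": "es"},
]

def match(identity):
    """Stage: keep records whose face identity matches the query."""
    return lambda rows: [r for r in rows if r["identity"] == identity]

def filter_by(key, value):
    """Stage: filter on a metadata field (e.g. transcript language)."""
    return lambda rows: [r for r in rows if r[key] == value]

def rank(rows):
    """Stage: order by match confidence, best first."""
    return sorted(rows, key=lambda r: r["score"], reverse=True)

def retrieve(stages, rows):
    """Run the stages left to right, like a single retriever call."""
    return reduce(lambda acc, stage: stage(acc), stages, rows)

results = retrieve([match("creator_a"), filter_by("language", "en"), rank], ADS)
print([r["id"] for r in results])  # → ['ad-001']
```

Because every stage is just a function over rows, swapping rerankers or adding an aggregation step is a list edit, not a pipeline rebuild.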
    Timeline

    Millions of files, no pipeline
    to maintain

    A seeing agent shouldn't take a quarter to ship.

    Try it today →
    Day 1

    One agent, one modality.

    Point Mixpeek at a single S3 bucket. Your agent can query faces in 10,000 video ads within an hour. No pipeline, no infra team.

    Week 1

    Every modality, every team.

Roll out to brand, comms, and legal. Agents now cross-reference faces, logos, transcripts, and scenes in one query. No separate vendor per feature.

    Month 1

    Autonomous retrieval.

    Agents operate on millions of files via MCP. Compliance runs every 15 min. Brand protection files its own takedowns. You review, not scan.

    Real workflows

    In production right now.

query-face.jpg · 4 matches · 12 frames · 47 ms
    Super Bowl corpus · live demo

    Talent search across ads

    Upload a photo, find every ad that creator appeared in. The same pipeline a performance marketing agency has run in production for 12 months.
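Under the hood, this kind of talent search reduces to nearest-neighbor lookup over face embeddings. A hedged sketch using cosine similarity, with made-up three-dimensional vectors (real arcface-v2 embeddings are much higher-dimensional) and an illustrative threshold:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical face embeddings already extracted from indexed ads.
INDEX = {
    "ad-superbowl-01": [0.9, 0.1, 0.2],
    "ad-superbowl-02": [0.1, 0.9, 0.3],
    "ad-superbowl-03": [0.88, 0.15, 0.25],
}

def face_search(query_embedding, threshold=0.95):
    """Return ads whose stored face embedding is close to the query photo's."""
    hits = [(ad, cosine(query_embedding, emb)) for ad, emb in INDEX.items()]
    return sorted(((a, s) for a, s in hits if s >= threshold),
                  key=lambda t: t[1], reverse=True)

print([ad for ad, _ in face_search([0.9, 0.12, 0.21])])
# → ['ad-superbowl-01', 'ad-superbowl-03']
```

The upload step simply runs the same extractor on the query photo to get its embedding before the lookup.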

    Try face search →
3 violations detected:
• Unlicensed logo at 00:04
• Copyrighted audio at 00:12
• Uncleared face at 00:18
    IP detection · live demo

    Copyright & logo matching

    Scan video for logo, face, and audio fingerprint matches before publish. One API call replaces three vendor contracts.
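Fingerprint matching of this sort can be sketched as hashing overlapping windows of a feature stream and intersecting them with a reference index. The window size, hash scheme, and overlap threshold below are illustrative assumptions, not Mixpeek's actual implementation:

```python
import hashlib

def fingerprints(samples, window=4):
    """Hash overlapping windows of a feature stream into compact fingerprints."""
    out = set()
    for i in range(len(samples) - window + 1):
        chunk = ",".join(f"{s:.2f}" for s in samples[i:i + window])
        out.add(hashlib.sha1(chunk.encode()).hexdigest()[:12])
    return out

# Hypothetical reference index of copyrighted audio tracks.
REFERENCE = {"licensed-track-7": fingerprints([0.1, 0.4, 0.4, 0.2, 0.9, 0.3])}

def scan(upload_samples, min_overlap=2):
    """Flag any reference track sharing enough fingerprints with the upload."""
    up = fingerprints(upload_samples)
    return [track for track, fps in REFERENCE.items()
            if len(fps & up) >= min_overlap]

# The upload embeds a span of the licensed track, so it is flagged.
print(scan([0.7, 0.1, 0.4, 0.4, 0.2, 0.9, 0.3, 0.5]))  # → ['licensed-track-7']
```

Windowed hashing is what makes the match robust to the clip appearing at a different offset inside the upload.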

    Try copyright detection →
    MOOD · warm · dreamy · analog
    Visual taste · live demo

    Scene similarity recs

Rate a few films. Get recs based on how scenes look, not how someone tagged them. Thompson Sampling learns your taste in real time.
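Thompson Sampling here means keeping a Beta posterior per option, sampling a draw from each, and recommending the best draw, so exploration fades as ratings accumulate. A toy Beta-Bernoulli version over hypothetical "mood" arms (not Mixpeek's production model):

```python
import random

class TasteModel:
    """Beta-Bernoulli Thompson Sampling over visual 'moods' (a toy sketch)."""

    def __init__(self, moods):
        # Beta(1, 1) prior per mood: index 0 counts likes, index 1 dislikes.
        self.posteriors = {m: [1, 1] for m in moods}

    def recommend(self):
        # Sample each mood's posterior; recommend the best draw.
        draws = {m: random.betavariate(a, b)
                 for m, (a, b) in self.posteriors.items()}
        return max(draws, key=draws.get)

    def rate(self, mood, liked):
        # Fold a thumbs-up / thumbs-down rating back into the posterior.
        self.posteriors[mood][0 if liked else 1] += 1

random.seed(7)  # reproducible demo
model = TasteModel(["warm", "dreamy", "analog", "neon"])
for _ in range(200):  # simulate a viewer who only likes 'analog' scenes
    mood = model.recommend()
    model.rate(mood, liked=(mood == "analog"))
print(model.posteriors)  # 'analog' accumulates likes; other moods accumulate dislikes
```

Because each rating updates the posterior immediately, the next recommendation already reflects it, which is the "real time" part of the claim.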

    Try the taste engine →