
Turn every Mux video into a searchable, intelligent asset
Sync videos from Mux into Mixpeek for automatic multimodal extraction — scene understanding, object detection, face identity, OCR, and transcription. Build visual search retrievers that let users find the exact frame, scene, or spoken word across your entire video library.

Video platforms store thousands of hours of content, but the footage itself is a black box. Finding a specific scene, verifying talent rights across a library, or searching for on-screen text means scrubbing through videos manually. Metadata is limited to what was entered at upload time — titles and tags that go stale fast. Teams waste hours on manual review that should take seconds.
Mixpeek connects directly to Mux via selective sync. When a video lands in Mux, Mixpeek automatically decomposes it into frames and audio segments, then runs multimodal extractors — visual embeddings, object detection, face recognition, OCR, and speech transcription. Every extracted feature is indexed into a retriever so your team can search across scenes, objects, spoken words, and on-screen text from a single query.
What teams see after connecting Mux to Mixpeek
95% reduction in manual review time
teams find the exact frame in seconds instead of scrubbing through hours of footage
Zero manual indexing
every Mux upload is decomposed and searchable within minutes, no human intervention
40–60% lower processing costs
selective sync filters ensure only relevant assets are indexed, eliminating waste
Sub-second search across 10,000+ hours of video
visual, face, transcript, and OCR queries return in <200ms
Complete compliance lineage
every extraction step is logged from Mux ingest to search index, audit-ready out of the box
Same-day integration
connect Mux, configure filters, and run your first search query in under 4 hours
Mux Selective Sync
Webhook + Filters
Videos uploaded to Mux trigger a webhook. Selective sync filters decide which assets flow into Mixpeek based on metadata, passthrough flags, or asset tags.
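The filtering step above can be sketched as a small decision function over the webhook payload. The `video.asset.ready` event type and the `passthrough` field follow Mux's webhook shape; the `meta.tags` field and the filter rules themselves are illustrative assumptions, not Mixpeek's actual sync configuration.

```python
# Hypothetical selective-sync filter over a Mux webhook payload.
# "video.asset.ready" and "passthrough" follow Mux's event shape;
# the tag field and the rules are assumptions for illustration.

SYNC_TAGS = {"marketing", "training"}  # assumed tags worth indexing

def should_sync(event: dict) -> bool:
    """Return True if this Mux asset event should flow into the index."""
    if event.get("type") != "video.asset.ready":
        return False  # only sync assets that finished processing
    data = event.get("data", {})
    # Example rule 1: explicit opt-in via a passthrough flag.
    if "mixpeek:index" in data.get("passthrough", ""):
        return True
    # Example rule 2: asset carries an allowed tag.
    tags = set(data.get("meta", {}).get("tags", []))
    return bool(tags & SYNC_TAGS)
```

Assets that fail every rule are simply ignored, which is where the processing-cost savings come from: nothing unfiltered ever reaches the extractors.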
Multimodal Decomposition
Extractors
Each video is decomposed into frames and audio segments. Extractors run in parallel: visual embeddings, object detection, face identity, OCR, and speech transcription.
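The fan-out described above can be sketched with a thread pool: every frame is handed to every extractor concurrently. The extractor functions here are stand-ins; real ones would invoke ML models for embeddings, detection, OCR, and so on.

```python
# Minimal sketch of running extractors in parallel over decomposed
# frames. The extractor functions are stand-ins for real ML models.
from concurrent.futures import ThreadPoolExecutor

def detect_objects(frame):  return {"feature": "objects", "frame": frame}
def recognize_faces(frame): return {"feature": "faces", "frame": frame}
def extract_text(frame):    return {"feature": "ocr", "frame": frame}

EXTRACTORS = [detect_objects, recognize_faces, extract_text]

def extract_all(frames):
    """Fan each frame out to every extractor concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, f) for f in frames for fn in EXTRACTORS]
        return [fut.result() for fut in futures]
```

Because the extractors are independent, total wall-clock time is bounded by the slowest extractor rather than the sum of all of them.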
Feature Indexing
Collections
Extracted features are stored in Mixpeek collections with full lineage back to the source Mux asset, timestamp, and frame number.
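A record stored this way might look like the sketch below. The field names are assumptions for illustration, not Mixpeek's collection schema; the point is that every feature carries its lineage (asset ID, timestamp, frame number) alongside the extractor output.

```python
# Illustrative shape of an indexed feature record with lineage back to
# the source Mux asset. Field names are assumptions, not a real schema.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FeatureRecord:
    mux_asset_id: str   # lineage: which Mux asset this came from
    timestamp_s: float  # position in the video, in seconds
    frame_number: int
    feature_type: str   # e.g. "embedding", "object", "face", "ocr", "transcript"
    payload: dict       # raw extractor output

record = FeatureRecord("abc123", 12.5, 300, "ocr", {"text": "SALE"})
```

Keeping lineage on every record is what lets a search hit resolve back to an exact frame in an exact Mux asset.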
Visual Search Retriever
Feature Search + Filters
A retriever combines vector similarity, face identity matching, metadata filters, and full-text search across transcripts and OCR output.
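The staged query above can be sketched as a tiny in-memory retriever: filter by metadata, narrow by full-text match over transcript/OCR text, then rank the survivors by vector similarity. The scoring and stage order are illustrative, not the production retriever.

```python
# Toy multi-stage retriever: metadata filter -> full-text match ->
# vector-similarity ranking. An in-memory stand-in, not a real index.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, metadata=None, text=None, k=5):
    hits = index
    if metadata:  # stage 1: exact metadata filters
        hits = [d for d in hits
                if all(d["meta"].get(key) == v for key, v in metadata.items())]
    if text:      # stage 2: full-text match over transcript/OCR text
        hits = [d for d in hits if text.lower() in d["text"].lower()]
    # stage 3: rank remaining hits by vector similarity
    return sorted(hits, key=lambda d: cosine(d["vec"], query_vec),
                  reverse=True)[:k]
```

Running the filters before the similarity ranking keeps the expensive vector comparison confined to candidates that already satisfy the cheap exact-match constraints.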
Audit Trail
Batch Processing
Every pipeline step is logged — from Mux webhook receipt through extraction completion — providing full observability and compliance lineage.
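An audit trail like this can be sketched as an append-only log keyed by asset. The step names mirror the stages above, and the in-memory list stands in for durable storage; none of this is Mixpeek's actual logging API.

```python
# Sketch of an append-only audit trail for the pipeline. The list
# stands in for a durable log store; step names are illustrative.
import time

class AuditTrail:
    def __init__(self):
        self._entries = []

    def log(self, asset_id: str, step: str, **details):
        """Append one immutable pipeline event."""
        self._entries.append({
            "asset_id": asset_id,
            "step": step,  # e.g. "webhook_received", "extraction_done"
            "ts": time.time(),
            "details": details,
        })

    def lineage(self, asset_id: str):
        """Every logged step for one asset, in order: audit-ready."""
        return [e for e in self._entries if e["asset_id"] == asset_id]
```

Querying `lineage()` for a single asset reconstructs its full journey from webhook receipt to searchable index, which is the compliance story the stats above refer to.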
Selective sync lets you control exactly which Mux assets flow into Mixpeek using metadata filters and passthrough flags. When a video is uploaded to Mux with the right metadata, a webhook fires and Mixpeek pulls the asset automatically. RAW formats (RED R3D, ARRI RAW) are converted via custom plugins before extraction. The pipeline decomposes each video into scene compositions, detected objects, recognized faces, on-screen text, and transcribed speech — then indexes everything into a visual search retriever with feature search, face identity, and full-text stages. An audit trail tracks every step from ingest to searchable index.
Get started with Mixpeek + Mux in minutes. Read the docs, create a free account, or schedule a walkthrough with our team.