Video understanding demonstrates the full warehouse flow: decompose videos into scenes, faces, and speech; store the extracted features across tiers; and reassemble answers through retrieval pipelines.
## How It Works
When you ingest a video, Mixpeek runs a multi-stage pipeline:
- **Chunking** — Videos are split into segments using scene detection, silence detection, or fixed intervals
- **Parallel Extraction** — Multiple extractors run concurrently:
  - **Transcription**: Whisper converts speech to text with timestamps
  - **Visual Embeddings**: A multimodal model generates embeddings from keyframes
  - **Thumbnails**: Representative frames are extracted for each segment
- **Description & OCR** — Gemini generates segment descriptions and extracts on-screen text
- **Multi-Vector Indexing** — Separate embeddings for transcript and visual content enable hybrid search
At query time, the retriever searches across both visual and transcript embeddings, fusing results to find moments by what’s shown or what’s said.
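The fusion step can be pictured as reciprocal rank fusion (RRF) over the two ranked result lists, matching the `"fusion_method": "rrf"` setting used later in this guide. This is a minimal illustrative sketch, not Mixpeek's internal implementation; the document IDs and the `k` constant are made up.

```python
def rrf_fuse(ranked_lists, weights=None, k=60):
    """Reciprocal rank fusion: score(d) = sum_i w_i / (k + rank_i(d))."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = {}
    for w, docs in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

visual_hits = ["scene_7", "scene_2", "scene_9"]      # ranked by visual similarity
transcript_hits = ["scene_2", "scene_4", "scene_7"]  # ranked by transcript match
fused = rrf_fuse([visual_hits, transcript_hits], weights=[0.6, 0.4])
```

A document that ranks well in both lists (here `scene_2`) rises to the top even when neither list put it first, which is why RRF works well for combining visual and transcript signals without score normalization.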
| Extractor | Outputs |
|---|---|
| `video_extractor@v1` | Scene embeddings, keyframes, timestamps |
| `audio_extractor@v1` | Transcription, speaker diarization |
| `text_extractor@v1` | Text embeddings, OCR from frames |
| `face_extractor@v1` | Face embeddings, bounding boxes |
## 1. Create a Bucket
```
POST /v1/buckets
{
  "bucket_name": "video-catalog",
  "schema": {
    "properties": {
      "video_url": { "type": "url", "required": true },
      "title": { "type": "text" },
      "category": { "type": "text" }
    }
  }
}
```
## 2. Create Collections
For scenes:
```
POST /v1/collections
{
  "collection_name": "video-scenes",
  "source": { "type": "bucket", "bucket_id": "bkt_videos" },
  "feature_extractor": {
    "feature_extractor_name": "video_extractor",
    "version": "v1",
    "input_mappings": { "video_url": "video_url" },
    "parameters": {
      "scene_detection_threshold": 0.3,
      "keyframe_interval": 30,
      "max_scenes": 100
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" }
    ]
  }
}
```
For transcripts:
```
POST /v1/collections
{
  "collection_name": "video-transcripts",
  "source": { "type": "bucket", "bucket_id": "bkt_videos" },
  "feature_extractor": {
    "feature_extractor_name": "audio_extractor",
    "version": "v1",
    "input_mappings": { "audio_url": "video_url" },
    "parameters": {
      "transcription_model": "whisper-large-v3",
      "language": "en",
      "enable_diarization": true
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" }
    ]
  }
}
```
## 3. Ingest Videos
```
POST /v1/buckets/{bucket_id}/objects
{
  "key_prefix": "/marketing/demos",
  "metadata": {
    "title": "Product Launch Q4 2025",
    "category": "marketing"
  },
  "blobs": [
    {
      "property": "video_url",
      "type": "video",
      "url": "s3://my-bucket/demos/product-launch.mp4"
    }
  ]
}
```
## 4. Process
```
POST /v1/buckets/{bucket_id}/batches
{ "object_ids": ["obj_video_001"] }

POST /v1/buckets/{bucket_id}/batches/{batch_id}/submit
```
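When ingesting a large library, you would group object IDs into multiple batch requests. A minimal sketch of building those payloads, assuming a hypothetical per-batch limit (the API's actual limit is not stated here):

```python
def build_batch_payloads(object_ids, max_per_batch=100):
    """Split object IDs into payload bodies for POST /v1/buckets/{bucket_id}/batches.

    max_per_batch is an assumed illustrative limit, not a documented API constraint.
    """
    return [
        {"object_ids": object_ids[i:i + max_per_batch]}
        for i in range(0, len(object_ids), max_per_batch)
    ]

# 250 hypothetical object IDs -> three batch payloads of 100, 100, and 50.
payloads = build_batch_payloads([f"obj_video_{i:03d}" for i in range(250)])
```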
## 5. Create a Hybrid Retriever
Combine visual and transcript search:
```
POST /v1/retrievers
{
  "retriever_name": "video-search",
  "collection_ids": ["col_video_scenes", "col_video_transcripts"],
  "input_schema": {
    "properties": {
      "query_text": { "type": "text", "required": true },
      "query_image": { "type": "url" },
      "category": { "type": "text" }
    }
  },
  "stages": [
    {
      "stage_name": "hybrid_search",
      "version": "v1",
      "parameters": {
        "queries": [
          {
            "feature_address": "mixpeek://video_extractor@v1/scene_embedding",
            "input_mapping": { "image": "query_image" },
            "weight": 0.6
          },
          {
            "feature_address": "mixpeek://audio_extractor@v1/transcript_embedding",
            "input_mapping": { "text": "query_text" },
            "weight": 0.4
          }
        ],
        "fusion_method": "rrf",
        "limit": 20
      }
    },
    {
      "stage_name": "filter",
      "version": "v1",
      "parameters": {
        "filters": {
          "field": "metadata.category",
          "operator": "eq",
          "value": "{{inputs.category}}"
        }
      }
    }
  ]
}
```
## 6. Search
Text query:
```
POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": {
    "query_text": "someone explaining product features",
    "category": "marketing"
  },
  "limit": 10
}
```
Image query (find similar scenes):
```
POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": {
    "query_image": "s3://my-bucket/reference-scene.jpg",
    "query_text": "product demonstration"
  },
  "limit": 10
}
```
### Moment-Level Search
Filter by timestamp:
```
POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": { "query_text": "pricing discussion" },
  "filters": {
    "field": "segment_metadata.start_time",
    "operator": "gte",
    "value": 60.0
  }
}
```
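The filter objects used throughout these requests pair a field path with an operator (`eq`, `gte`, `contains`). A sketch of how such predicates evaluate against documents, assuming the operators behave as their names suggest; this is an illustration, not Mixpeek's filter engine:

```python
import operator

# Assumed operator semantics for illustration.
OPS = {
    "eq": operator.eq,
    "gte": operator.ge,
    "contains": lambda field, value: value in field,
}

def apply_filter(docs, field, op, value):
    """Keep documents whose dotted-path field satisfies the operator."""
    def lookup(doc, path):
        for part in path.split("."):
            doc = doc[part]
        return doc
    return [d for d in docs if OPS[op](lookup(d, field), value)]

scenes = [
    {"segment_metadata": {"start_time": 45.2}},
    {"segment_metadata": {"start_time": 72.0}},
]
late = apply_filter(scenes, "segment_metadata.start_time", "gte", 60.0)
```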
### Speaker-Specific Search
With diarization enabled:
```
{
  "filters": {
    "field": "metadata.speaker_id",
    "operator": "eq",
    "value": "SPEAKER_001"
  }
}
```
## Output Example
Scene document from `video_extractor@v1`:
```
{
  "document_id": "doc_scene_123",
  "source_object_id": "obj_video_001",
  "metadata": {
    "title": "Product Launch Q4 2025",
    "scene_index": 3,
    "start_time": 45.2,
    "end_time": 58.7,
    "keyframe_url": "s3://my-bucket/keyframes/scene_003.jpg"
  }
}
```
## Parameters
| Parameter | Effect |
|---|---|
| `scene_detection_threshold` | Lower values yield more scenes (typical range 0.2–0.5) |
| `keyframe_interval` | Seconds between extracted keyframes |
| `max_scenes` | Caps the number of scenes per video |
| `transcription_model` | `whisper-base` (faster) or `whisper-large-v3` (more accurate) |
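To build intuition for `scene_detection_threshold`: scene detectors typically compare consecutive frames and cut wherever the difference exceeds the threshold, so a lower threshold produces more scenes. A toy sketch with tiny numeric "frames"; real detectors compare colour histograms or embeddings, and this is not Mixpeek's actual algorithm:

```python
def detect_cuts(frames, threshold=0.3):
    """Return frame indices where the mean frame-to-frame difference exceeds threshold."""
    cuts = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i])) / len(frames[i])
        if diff > threshold:
            cuts.append(i)
    return cuts

# Two near-identical shots separated by an abrupt change at index 3.
frames = [[0.1, 0.1], [0.12, 0.1], [0.11, 0.09], [0.9, 0.95], [0.91, 0.93]]
cuts = detect_cuts(frames)  # default threshold 0.3 finds only the hard cut
```

Dropping the threshold far enough makes even sensor noise register as a cut, which is the failure mode behind the "lower = more scenes" guidance in the table above.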
## Classify with Taxonomies
Auto-tag video segments by content type (e.g., “product demo”, “interview”, “presentation”):
```
POST /v1/taxonomies
{
  "taxonomy_name": "video-content-types",
  "taxonomy_type": "hierarchical",
  "retriever_id": "ret_video_search",
  "collection_id": "col_video_scenes",
  "input_mappings": [{ "source": "blob.keyframe_url", "target": "query_image" }],
  "enrichment_fields": [
    { "source": "payload.category", "target": "content_type" },
    { "source": "payload.title", "target": "series_name" }
  ],
  "threshold": 0.65,
  "execution_mode": "materialize"
}
```
New videos automatically receive `content_type` labels based on visual similarity to reference segments. See Taxonomies for execution modes.
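Conceptually, this classification assigns the best-matching reference label only when similarity clears the `threshold`. A sketch using cosine similarity; the embeddings, labels, and tie-breaking here are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def classify(embedding, references, threshold=0.65):
    """Return the closest reference label if it clears the threshold, else None."""
    label, score = max(
        ((lbl, cosine(embedding, ref)) for lbl, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else None

# Hypothetical reference embeddings for two content types.
references = {
    "product demo": [1.0, 0.0, 0.2],
    "interview": [0.0, 1.0, 0.1],
}
```

Segments that resemble no reference stay unlabeled, which is the point of the threshold: silence is better than a confidently wrong tag.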
## Discover Clusters
Find recurring visual themes across your video library:
```
POST /v1/clusters
{
  "cluster_name": "scene-themes",
  "collection_id": "col_video_scenes",
  "feature_uri": "mixpeek://video_extractor@v1/scene_embedding",
  "algorithm": { "name": "hdbscan", "params": { "min_cluster_size": 8 } },
  "llm_labeling": {
    "enabled": true,
    "input_mappings": [{ "source": "blob", "fields": ["keyframe_url"] }]
  },
  "dimension_reduction": { "method": "umap", "n_components": 2 }
}
```
Clusters automatically surface patterns such as “whiteboard sessions”, “outdoor shots”, and “screen recordings”. Promote stable clusters to taxonomy nodes. See Clusters.
## Set Up Alerts
Get notified when new video content matches specific criteria:
```
POST /v1/alerts
{
  "alert_name": "competitor-mention",
  "collection_id": "col_video_transcripts",
  "condition": { "field": "metadata.transcript_text", "operator": "contains", "value": "competitor" },
  "notification": { "type": "webhook", "url": "https://example.com/webhook" }
}
```
## Set Up Webhooks
Track video processing progress for large batch uploads:
```
POST /v1/webhooks
{
  "webhook_name": "video-pipeline",
  "url": "https://example.com/webhook",
  "events": ["batch.completed", "batch.failed", "document.created"]
}
```