Video Understanding

Video understanding demonstrates the full warehouse flow: decompose video into scenes, faces, and speech; store features across tiers; and reassemble answers through retrieval pipelines.

How It Works

When you ingest a video, Mixpeek runs a multi-stage pipeline:

1. Chunking: Videos are split into segments using scene detection, silence detection, or fixed intervals.
2. Parallel Extraction: Multiple extractors run concurrently:
   - Transcription: Whisper extracts speech-to-text with timestamps
   - Visual Embeddings: Multimodal model generates embeddings from keyframes
   - Thumbnails: Representative frames extracted for each segment
3. Description & OCR: Gemini generates segment descriptions and extracts on-screen text.
4. Multi-Vector Indexing: Separate embeddings for transcription and visual content enable hybrid search.
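
Of the three chunking strategies, fixed intervals is the simplest to reason about. The following is a minimal sketch of that idea only, not Mixpeek's implementation; the function name `fixed_interval_segments` and its parameters are hypothetical and chosen for illustration.

```python
def fixed_interval_segments(duration_s: float, interval_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of interval_s seconds."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration_s must be >= 0 and interval_s must be > 0")
    segments = []
    index = 0
    # Derive each start from the index to avoid accumulating floating-point drift.
    while index * interval_s < duration_s:
        start = index * interval_s
        end = min(start + interval_s, duration_s)  # the last window may be shorter
        segments.append((float(start), float(end)))
        index += 1
    return segments


# A 25-second video cut into 10-second windows.
segments = fixed_interval_segments(25.0, 10.0)
```

Scene and silence detection produce the same kind of `(start, end)` windows, but place the boundaries based on the content rather than the clock.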

At query time, the retriever searches across both visual and transcript embeddings, fusing results to find moments by what’s shown or what’s said.
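
To make the fusion step concrete, here is a minimal sketch of weighted reciprocal rank fusion (RRF), the general technique behind a `"fusion_method": "rrf"` setting. It is an illustration under assumptions: the constant `k = 60` is the conventional default from the RRF literature, and applying the per-query weight as a multiplier on each reciprocal rank is one common variant. How Mixpeek combines weights internally is not stated in this guide.

```python
from collections import defaultdict


def weighted_rrf(rankings: dict[str, list[str]],
                 weights: dict[str, float],
                 k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one list sorted by fused score."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for every document it returns.
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-modality results (best match first).
rankings = {
    "visual": ["scene_3", "scene_1", "scene_7"],
    "transcript": ["scene_1", "scene_9", "scene_3"],
}
fused = weighted_rrf(rankings, weights={"visual": 0.6, "transcript": 0.4})
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw similarity scores, the visual and transcript embeddings do not need comparable score scales. A segment that ranks well in both lists (`scene_1` here) rises above one that tops only a single list.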

Feature Extractors

Extractor	Outputs
`video_extractor@v1`	Scene embeddings, keyframes, timestamps
`audio_extractor@v1`	Transcription, speaker diarization
`text_extractor@v1`	Text embeddings, OCR from frames
`face_extractor@v1`	Face embeddings, bounding boxes

1. Create a Bucket

POST /v1/buckets
{
  "bucket_name": "video-catalog",
  "schema": {
    "properties": {
      "video_url": { "type": "url", "required": true },
      "title": { "type": "text" },
      "category": { "type": "text" }
    }
  }
}
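
The requests in this guide are shown as raw HTTP. As a sketch of how one might issue them from Python using only the standard library, the helper below builds (but does not send) the bucket request. `BASE_URL` is a placeholder host and the bearer-token header is an assumed authentication scheme; neither is specified in this guide.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; substitute your real API host


def build_request(method: str, path: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON request object; pass it to urllib.request.urlopen() to send it."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


bucket_body = {
    "bucket_name": "video-catalog",
    "schema": {
        "properties": {
            "video_url": {"type": "url", "required": True},
            "title": {"type": "text"},
            "category": {"type": "text"},
        }
    },
}
request = build_request("POST", "/v1/buckets", bucket_body, api_key="YOUR_API_KEY")
```

The same helper applies to every `POST` shown in the later steps; only the path and body change.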

2. Create Collections

For scenes:

POST /v1/collections
{
  "collection_name": "video-scenes",
  "source": { "type": "bucket", "bucket_id": "bkt_videos" },
  "feature_extractor": {
    "feature_extractor_name": "video_extractor",
    "version": "v1",
    "input_mappings": { "video_url": "video_url" },
    "parameters": {
      "scene_detection_threshold": 0.3,
      "keyframe_interval": 30,
      "max_scenes": 100
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" }
    ]
  }
}

For transcripts:

POST /v1/collections
{
  "collection_name": "video-transcripts",
  "source": { "type": "bucket", "bucket_id": "bkt_videos" },
  "feature_extractor": {
    "feature_extractor_name": "audio_extractor",
    "version": "v1",
    "input_mappings": { "audio_url": "video_url" },
    "parameters": {
      "transcription_model": "whisper-large-v3",
      "language": "en",
      "enable_diarization": true
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" }
    ]
  }
}
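
The two collection bodies above share the same source bucket and passthrough fields and differ only in extractor, input mapping, and parameters. A small builder makes that explicit. This is a sketch: `collection_payload` is a hypothetical helper, not part of any SDK, and it only reproduces the JSON shapes shown above.

```python
def collection_payload(collection_name: str, bucket_id: str, extractor_name: str,
                       input_mappings: dict, parameters: dict,
                       passthrough_fields=("title", "category"),
                       version: str = "v1") -> dict:
    """Assemble a POST /v1/collections body in the shape used in this guide."""
    return {
        "collection_name": collection_name,
        "source": {"type": "bucket", "bucket_id": bucket_id},
        "feature_extractor": {
            "feature_extractor_name": extractor_name,
            "version": version,
            "input_mappings": dict(input_mappings),
            "parameters": dict(parameters),
            "field_passthrough": [{"source_path": field} for field in passthrough_fields],
        },
    }


scenes = collection_payload(
    "video-scenes", "bkt_videos", "video_extractor",
    input_mappings={"video_url": "video_url"},
    parameters={"scene_detection_threshold": 0.3, "keyframe_interval": 30, "max_scenes": 100},
)
transcripts = collection_payload(
    "video-transcripts", "bkt_videos", "audio_extractor",
    input_mappings={"audio_url": "video_url"},  # the audio extractor reads the same video blob
    parameters={"transcription_model": "whisper-large-v3", "language": "en",
                "enable_diarization": True},
)
```

Note that the transcript collection maps the extractor's `audio_url` input to the bucket's `video_url` property, so both collections are fed from the same uploaded file.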

3. Ingest Videos

POST /v1/buckets/{bucket_id}/objects
{
  "key_prefix": "/marketing/demos",
  "metadata": {
    "title": "Product Launch Q4 2025",
    "category": "marketing"
  },
  "blobs": [
    {
      "property": "video_url",
      "type": "video",
      "url": "s3://my-bucket/demos/product-launch.mp4"
    }
  ]
}

4. Process

POST /v1/buckets/{bucket_id}/batches
{ "object_ids": ["obj_video_001"] }

POST /v1/buckets/{bucket_id}/batches/{batch_id}/submit
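
Processing is a two-call sequence: create the batch, then submit it. The sketch below wires those calls together around an injected `post` function so the flow can be exercised without network access. It assumes the create-batch response carries a `batch_id` field, which this guide does not show; `recording_post` is a stand-in transport, not a real client.

```python
from typing import Callable


def process_objects(post: Callable[[str, dict], dict],
                    bucket_id: str, object_ids: list[str]) -> str:
    """Create a batch for the given objects, submit it, and return its ID."""
    created = post(f"/v1/buckets/{bucket_id}/batches", {"object_ids": object_ids})
    batch_id = created["batch_id"]  # assumed response field
    post(f"/v1/buckets/{bucket_id}/batches/{batch_id}/submit", {})
    return batch_id


# Stand-in transport that records calls and returns a canned response.
calls: list[tuple[str, dict]] = []


def recording_post(path: str, body: dict) -> dict:
    calls.append((path, body))
    return {"batch_id": "batch_demo"}


submitted_batch_id = process_objects(recording_post, "bkt_videos", ["obj_video_001"])
```

In real use, `post` would wrap an HTTP client call and raise on non-2xx responses so that a failed batch creation never reaches the submit step.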

5. Create a Hybrid Retriever

Combine visual and transcript search:

POST /v1/retrievers
{
  "retriever_name": "video-search",
  "collection_ids": ["col_video_scenes", "col_video_transcripts"],
  "input_schema": {
    "properties": {
      "query_text": { "type": "text", "required": true },
      "query_image": { "type": "url" },
      "category": { "type": "text" }
    }
  },
  "stages": [
    {
      "stage_name": "hybrid_search",
      "version": "v1",
      "parameters": {
        "queries": [
          {
            "feature_address": "mixpeek://video_extractor@v1/scene_embedding",
            "input_mapping": { "image": "query_image" },
            "weight": 0.6
          },
          {
            "feature_address": "mixpeek://audio_extractor@v1/transcript_embedding",
            "input_mapping": { "text": "query_text" },
            "weight": 0.4
          }
        ],
        "fusion_method": "rrf",
        "limit": 20
      }
    },
    {
      "stage_name": "filter",
      "version": "v1",
      "parameters": {
        "filters": {
          "field": "metadata.category",
          "operator": "eq",
          "value": "{{inputs.category}}"
        }
      }
    }
  ]
}
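
The filter stage refers to a caller-supplied value through the `{{inputs.category}}` placeholder. The sketch below shows one plausible way such placeholders could be resolved against the `inputs` object at execution time. It illustrates the templating idea only; the actual resolution rules, including what happens when an optional input is omitted, are assumptions here (this sketch raises `KeyError`).

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*inputs\.(\w+)\s*\}\}")


def render(value, inputs: dict):
    """Recursively replace {{inputs.NAME}} placeholders inside a parameters object."""
    if isinstance(value, dict):
        return {key: render(item, inputs) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, inputs) for item in value]
    if isinstance(value, str):
        # A missing input raises KeyError in this sketch.
        return _PLACEHOLDER.sub(lambda match: str(inputs[match.group(1)]), value)
    return value


filter_parameters = {
    "filters": {
        "field": "metadata.category",
        "operator": "eq",
        "value": "{{inputs.category}}",
    }
}
resolved = render(filter_parameters, {"query_text": "demo", "category": "marketing"})
```

After rendering, the filter compares `metadata.category` against the literal string `marketing`, which is the passthrough field populated from the bucket object's metadata.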

6. Search

Text query:

POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": {
    "query_text": "someone explaining product features",
    "category": "marketing"
  },
  "limit": 10
}

Image query (find similar scenes):

POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": {
    "query_image": "s3://my-bucket/reference-scene.jpg",
    "query_text": "product demonstration"
  },
  "limit": 10
}
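
Both examples include `query_text` because the retriever's `input_schema` marks it as required, while `query_image` and `category` are optional. A client-side check along these lines can catch a missing input before the request is sent. This is a sketch: `missing_required` is a hypothetical helper, and server-side validation behavior is not described in this guide.

```python
INPUT_SCHEMA = {
    "properties": {
        "query_text": {"type": "text", "required": True},
        "query_image": {"type": "url"},
        "category": {"type": "text"},
    }
}


def missing_required(inputs: dict, schema: dict = INPUT_SCHEMA) -> list[str]:
    """Return the names of required inputs that the caller did not supply."""
    return [
        name
        for name, spec in schema["properties"].items()
        if spec.get("required") and name not in inputs
    ]


text_query = {"query_text": "someone explaining product features", "category": "marketing"}
image_only = {"query_image": "s3://my-bucket/reference-scene.jpg"}

text_query_missing = missing_required(text_query)
image_only_missing = missing_required(image_only)
```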

Moment-Level Search

Filter by timestamp:

POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": { "query_text": "pricing discussion" },
  "filters": {
    "field": "segment_metadata.start_time",
    "operator": "gte",
    "value": 60.0
  }
}

Speaker-Specific Search

With diarization enabled, restrict results to a single speaker by adding a filter to the execute request:

{
  "filters": {
    "field": "metadata.speaker_id",
    "operator": "eq",
    "value": "SPEAKER_001"
  }
}
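
The timestamp and speaker filters share one shape: a dotted `field` path, an `operator`, and a `value`. The sketch below evaluates that shape against in-memory documents to show the intended semantics of `gte` and `eq`. The sample documents are hypothetical and shaped only to match the field paths used in the two filters above; the server's actual matching rules are not documented here.

```python
import operator

OPERATORS = {"eq": operator.eq, "gte": operator.ge}


def get_path(document: dict, dotted_path: str):
    """Follow a dotted path such as 'metadata.speaker_id'; return None if absent."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(document: dict, flt: dict) -> bool:
    """Return True when the document satisfies a single field/operator/value filter."""
    actual = get_path(document, flt["field"])
    if actual is None:
        return False
    return OPERATORS[flt["operator"]](actual, flt["value"])


# Hypothetical documents for illustration.
documents = [
    {"document_id": "a", "segment_metadata": {"start_time": 45.2},
     "metadata": {"speaker_id": "SPEAKER_001"}},
    {"document_id": "b", "segment_metadata": {"start_time": 75.0},
     "metadata": {"speaker_id": "SPEAKER_002"}},
    {"document_id": "c", "segment_metadata": {"start_time": 120.5},
     "metadata": {"speaker_id": "SPEAKER_001"}},
]

time_filter = {"field": "segment_metadata.start_time", "operator": "gte", "value": 60.0}
speaker_filter = {"field": "metadata.speaker_id", "operator": "eq", "value": "SPEAKER_001"}

after_first_minute = [d["document_id"] for d in documents if matches(d, time_filter)]
by_speaker = [d["document_id"] for d in documents if matches(d, speaker_filter)]
```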

Output Example

Scene document from video_extractor@v1:

{
  "document_id": "doc_scene_123",
  "source_object_id": "obj_video_001",
  "metadata": {
    "title": "Product Launch Q4 2025",
    "scene_index": 3,
    "start_time": 45.2,
    "end_time": 58.7,
    "keyframe_url": "s3://my-bucket/keyframes/scene_003.jpg"
  }
}
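
The `start_time` and `end_time` values are seconds from the start of the video. When presenting results, it is often useful to render them as a readable time range. The helper below is a small sketch for that purpose; `format_timestamp` and `scene_label` are hypothetical names, and the document is the example shown above.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm, rounding to the nearest millisecond."""
    total_ms = round(seconds * 1000)  # work in integer milliseconds to avoid float artifacts
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def scene_label(document: dict) -> str:
    """Build a one-line label from a scene document's metadata."""
    meta = document["metadata"]
    start = format_timestamp(meta["start_time"])
    end = format_timestamp(meta["end_time"])
    return f'{meta["title"]} [scene {meta["scene_index"]}] {start} to {end}'


scene = {
    "document_id": "doc_scene_123",
    "source_object_id": "obj_video_001",
    "metadata": {
        "title": "Product Launch Q4 2025",
        "scene_index": 3,
        "start_time": 45.2,
        "end_time": 58.7,
        "keyframe_url": "s3://my-bucket/keyframes/scene_003.jpg",
    },
}
label = scene_label(scene)
```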

Parameters

Parameter	Effect
`scene_detection_threshold`	Lower = more scenes (0.2-0.5)
`keyframe_interval`	Seconds between keyframes
`max_scenes`	Cap scenes per video
`transcription_model`	`whisper-base` (fast) or `whisper-large-v3` (accurate)

Classify with Taxonomies

Auto-tag video segments by content type (e.g., “product demo”, “interview”, “presentation”):

POST /v1/taxonomies
{
  "taxonomy_name": "video-content-types",
  "taxonomy_type": "hierarchical",
  "retriever_id": "ret_video_search",
  "collection_id": "col_video_scenes",
  "input_mappings": [{ "source": "blob.keyframe_url", "target": "query_image" }],
  "enrichment_fields": [
    { "source": "payload.category", "target": "content_type" },
    { "source": "payload.title", "target": "series_name" }
  ],
  "threshold": 0.65,
  "execution_mode": "materialize"
}

New videos automatically get content_type labels based on visual similarity to reference segments. See Taxonomies for execution modes.
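
As a rough mental model of the `threshold` and `enrichment_fields` settings: the best-matching reference segment contributes its fields only if its similarity clears the threshold. The sketch below encodes that reading. It is an assumption about the semantics, not a description of the service; in particular, the `score` key, the inclusive `>=` comparison, and the use of only the top match are illustrative choices.

```python
def enrich(matches: list[dict], threshold: float = 0.65) -> dict:
    """Copy fields from the best reference match when its score clears the threshold."""
    best = max(matches, key=lambda match: match["score"], default=None)
    if best is None or best["score"] < threshold:
        return {}  # no sufficiently similar reference: leave the document unlabeled
    payload = best["payload"]
    # Mirrors the enrichment_fields mapping: category -> content_type, title -> series_name.
    return {"content_type": payload["category"], "series_name": payload["title"]}


# Hypothetical retriever matches for one new segment.
strong = [
    {"score": 0.72, "payload": {"category": "marketing", "title": "Product Launch Q4 2025"}},
    {"score": 0.40, "payload": {"category": "training", "title": "Onboarding"}},
]
weak = [{"score": 0.50, "payload": {"category": "training", "title": "Onboarding"}}]

strong_labels = enrich(strong)
weak_labels = enrich(weak)
```

Raising the threshold trades coverage for precision: fewer segments receive a label, but the labels that are assigned come from closer visual matches.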

Discover Clusters

Find recurring visual themes across your video library:

POST /v1/clusters
{
  "cluster_name": "scene-themes",
  "collection_id": "col_video_scenes",
  "feature_uri": "mixpeek://video_extractor@v1/scene_embedding",
  "algorithm": { "name": "hdbscan", "params": { "min_cluster_size": 8 } },
  "llm_labeling": {
    "enabled": true,
    "input_mappings": [{ "source": "blob", "fields": ["keyframe_url"] }]
  },
  "dimension_reduction": { "method": "umap", "n_components": 2 }
}

Clusters automatically surface patterns such as “whiteboard sessions”, “outdoor shots”, and “screen recordings”. Promote stable clusters to taxonomy nodes. See Clusters.

Set Up Alerts

Get notified when new video content matches specific criteria:

POST /v1/alerts
{
  "alert_name": "competitor-mention",
  "collection_id": "col_video_transcripts",
  "condition": { "field": "metadata.transcript_text", "operator": "contains", "value": "competitor" },
  "notification": { "type": "webhook", "url": "https://example.com/webhook" }
}

Set Up Webhooks

Track video processing progress for large batch uploads:

POST /v1/webhooks
{
  "webhook_name": "video-pipeline",
  "url": "https://example.com/webhook",
  "events": ["batch.completed", "batch.failed", "document.created"]
}
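
On the receiving side, a handler typically dispatches on the event type to track progress. The sketch below keeps a running tally for the three subscribed events. The payload shape (an `event` string plus a `batch_id` for batch events) is an assumption made for illustration; this guide does not show the delivered payload.

```python
def apply_event(state: dict, payload: dict) -> dict:
    """Update progress counters from one webhook delivery (assumed payload shape)."""
    event = payload.get("event")  # assumed field name
    if event == "document.created":
        state["documents_created"] += 1
    elif event == "batch.completed":
        state["completed_batches"].append(payload.get("batch_id"))
    elif event == "batch.failed":
        state["failed_batches"].append(payload.get("batch_id"))
    # Unknown event types are ignored so new events do not break the handler.
    return state


state = {"documents_created": 0, "completed_batches": [], "failed_batches": []}
deliveries = [
    {"event": "document.created"},
    {"event": "document.created"},
    {"event": "batch.completed", "batch_id": "batch_demo"},
    {"event": "batch.failed", "batch_id": "batch_other"},
    {"event": "something.else"},
]
for delivery in deliveries:
    apply_event(state, delivery)
```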
