Mixpeek decomposes video into searchable segments and extracts faces, visual embeddings, transcripts, scene descriptions, and structured metadata. Each segment becomes a document with its own vector indexes, so you can search within video at the sub-clip level.

What Gets Extracted

| Feature | Model | Dimensions | Extractor |
| --- | --- | --- | --- |
| Visual embeddings | Vertex AI multimodal | 1408D | multimodal_extractor |
| Audio transcript | Whisper | | multimodal_extractor |
| Transcript embeddings | E5-Large | 1024D | multimodal_extractor |
| Scene descriptions | Gemini | | multimodal_extractor |
| OCR (on-screen text) | Gemini | | multimodal_extractor |
| Face embeddings | ArcFace (SCRFD detect) | 512D | face_identity_extractor |
| Learning units (lectures) | E5-Large + Jina Code + SigLIP | 1024D / 768D | course_content_extractor |
| Temporal segments | FFmpeg (time / scene / silence) | | multimodal_extractor |

Choosing an Extractor

| Goal | Extractor | Why |
| --- | --- | --- |
| General video search (visual + spoken content) | multimodal_extractor | Unified embedding space across video, image, and text |
| Face recognition / identity matching | face_identity_extractor | 512D ArcFace embeddings with 99.8% verification accuracy |
| Educational content (lectures, slides, code) | course_content_extractor | Atomic learning units with text, code, and visual embeddings |

Create a Collection for Video

This collection splits video into scene-based segments (via FFmpeg scene detection), transcribes audio, and generates visual + transcript embeddings.
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "video-library",
    "source": { "type": "bucket", "bucket_id": "bkt_videos" },
    "feature_extractor": {
      "feature_extractor_name": "multimodal_extractor",
      "version": "v1",
      "input_mappings": {
        "video": "payload.video_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.title" },
        { "source_path": "metadata.category" }
      ],
      "parameters": {
        "split_method": "scene",
        "scene_detection_threshold": 0.5,
        "run_transcription": true,
        "run_transcription_embedding": true,
        "run_multimodal_embedding": true,
        "run_video_description": true,
        "enable_thumbnails": true
      }
    }
  }'
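The same request can be sketched in Python. This is an illustrative stdlib-only client, not an official Mixpeek SDK; the payload mirrors the curl call above, and MIXPEEK_API_KEY and NAMESPACE are read from the environment.

```python
import json
import os
import urllib.request

def build_collection_payload(name: str, bucket_id: str) -> dict:
    """Collection body for multimodal_extractor, mirroring the curl example."""
    return {
        "collection_name": name,
        "source": {"type": "bucket", "bucket_id": bucket_id},
        "feature_extractor": {
            "feature_extractor_name": "multimodal_extractor",
            "version": "v1",
            "input_mappings": {"video": "payload.video_url"},
            "field_passthrough": [
                {"source_path": "metadata.title"},
                {"source_path": "metadata.category"},
            ],
            "parameters": {
                "split_method": "scene",
                "scene_detection_threshold": 0.5,
                "run_transcription": True,
                "run_transcription_embedding": True,
                "run_multimodal_embedding": True,
                "run_video_description": True,
                "enable_thumbnails": True,
            },
        },
    }

def create_collection(payload: dict) -> dict:
    """POST the payload to the collections endpoint and return the JSON reply."""
    req = urllib.request.Request(
        "https://api.mixpeek.com/v1/collections",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['MIXPEEK_API_KEY']}",
            "X-Namespace": os.environ["NAMESPACE"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping payload construction separate from the HTTP call makes the body easy to validate or log before anything hits the network.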

Search by Visual Content

Create a retriever that searches video segments by visual similarity. A text query like “person writing on whiteboard” finds visually matching segments through Vertex AI’s cross-modal embedding space.
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "video-visual-search",
    "collection_ids": ["col_video_library"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "visual_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'
Execute the retriever:
curl -X POST https://api.mixpeek.com/v1/retrievers/ret_abc123/execute \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": { "query": "person writing on whiteboard" },
    "limit": 10
  }'
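Because every hit is a segment with its own timestamps, results map directly to jump-to points in the player. A small sketch, assuming hits carry start_time and end_time in their metadata as in the face-search response shown later on this page:

```python
def fmt_ts(seconds: float) -> str:
    """Render seconds as MM:SS."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"

def summarize_hits(response: dict) -> list[str]:
    """One human-readable line per segment hit in a retriever response."""
    lines = []
    for hit in response.get("results", []):
        meta = hit.get("metadata", {})
        span = f"{fmt_ts(meta['start_time'])}-{fmt_ts(meta['end_time'])}"
        lines.append(f"{hit['document_id']}: {span} (score {hit['score']:.2f})")
    return lines
```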

Search by Transcript

To search spoken content, create a retriever that targets the transcription embedding index.
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "video-transcript-search",
    "collection_ids": ["col_video_library"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "transcript_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "feature_address": "mixpeek://multimodal_extractor@v1/transcription_embedding",
            "input_mapping": { "text": "query" },
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'

Search by Face

Use a separate collection with face_identity_extractor to find video segments containing a specific person.
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "video-faces",
    "source": { "type": "bucket", "bucket_id": "bkt_videos" },
    "feature_extractor": {
      "feature_extractor_name": "face_identity_extractor",
      "version": "v1",
      "input_mappings": {
        "video": "payload.video_url"
      },
      "parameters": {
        "detection_model": "scrfd_2.5g",
        "embedding_model": "arcface_r100",
        "video_sampling_fps": 1.0,
        "video_deduplication": true,
        "video_deduplication_threshold": 0.8
      }
    }
  }'
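Conceptually, video_deduplication collapses near-identical face crops sampled from consecutive frames: any embedding whose cosine similarity to an already-kept face exceeds the threshold is dropped. This is a toy reimplementation of that idea, not Mixpeek's internal code:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe_faces(embeddings: list[list[float]],
                 threshold: float = 0.8) -> list[list[float]]:
    """Keep a face embedding only if it is dissimilar to all kept so far."""
    kept: list[list[float]] = []
    for emb in embeddings:
        if all(cosine(emb, k) < threshold for k in kept):
            kept.append(emb)
    return kept
```

A higher threshold keeps more near-duplicates; at 0.8, repeated shots of the same person in the same lighting usually collapse to one document.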

Example: Casting Intelligence Across Ad Creatives

A performance marketing agency indexes hundreds of video ads with face_identity_extractor to track which talent appears in which campaigns. The retriever returns face matches with timestamps and confidence scores, enabling queries like “find every ad featuring this creator” or “has this person appeared in a competitor’s campaign?”

Face identity document (per detected face):
{
  "document_id": "doc_face_8f2a",
  "source_video_url": "s3://ad-archive/campaigns/summer-2026/outdoor-spot-04.mp4",
  "start_time": 3.2,
  "end_time": 8.7,
  "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_8f2a.jpg",
  "metadata": {
    "campaign": "Summer 2026 Outdoor",
    "ad_id": "ad_04",
    "platform": "meta"
  },
  "face_identity_extractor_v1_face_embedding": [0.112, -0.045, "...512 floats"],
  "face_bbox": { "x": 0.32, "y": 0.15, "width": 0.18, "height": 0.24 },
  "face_confidence": 0.97
}
Retriever execution result (querying by face similarity):
{
  "results": [
    {
      "document_id": "doc_face_8f2a",
      "score": 0.94,
      "metadata": {
        "campaign": "Summer 2026 Outdoor",
        "ad_id": "ad_04",
        "platform": "meta",
        "start_time": 3.2,
        "end_time": 8.7,
        "source_video_url": "s3://ad-archive/campaigns/summer-2026/outdoor-spot-04.mp4",
        "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_8f2a.jpg"
      }
    },
    {
      "document_id": "doc_face_c91b",
      "score": 0.91,
      "metadata": {
        "campaign": "Spring 2026 Fitness",
        "ad_id": "ad_17",
        "platform": "youtube",
        "start_time": 12.0,
        "end_time": 19.5,
        "source_video_url": "s3://ad-archive/campaigns/spring-2026/fitness-spot-17.mp4",
        "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_c91b.jpg"
      }
    }
  ],
  "total_results": 2,
  "execution_time_ms": 142
}
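A “find every ad featuring this creator” query is then a thin post-processing step over this response. A hypothetical helper (field names taken from the example JSON above) that keeps only confident matches and groups ad IDs by campaign:

```python
def ads_featuring(response: dict, min_score: float = 0.85) -> dict[str, list[str]]:
    """Group confident face matches by campaign -> list of ad IDs."""
    by_campaign: dict[str, list[str]] = {}
    for hit in response["results"]:
        if hit["score"] < min_score:
            continue  # below the same-person confidence cutoff
        meta = hit["metadata"]
        by_campaign.setdefault(meta["campaign"], []).append(meta["ad_id"])
    return by_campaign
```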
| Field | Type | Description |
| --- | --- | --- |
| face_identity_extractor_v1_face_embedding | float[512] | ArcFace face embedding for identity matching |
| face_bbox | object | Bounding box of the detected face (normalized coordinates) |
| face_confidence | number | Detection confidence (0-1) |
| score | number | Cosine similarity to the query face (in retriever results) |
A score above 0.85 typically indicates the same person across different videos. See the Build a Casting Agent quickstart for the full workflow including competitor cross-referencing.

Output Schema

After extraction, each video segment produces a document like this:
{
  "document_id": "doc_abc123",
  "start_time": 10.0,
  "end_time": 20.0,
  "transcription": "Welcome to today's lecture on machine learning fundamentals...",
  "description": "Instructor standing at whiteboard, introducing ML concepts",
  "ocr_text": "Machine Learning 101",
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/thumb_1.jpg",
  "source_video_url": "s3://my-bucket/videos/lecture-01.mp4",
  "video_segment_url": "s3://mixpeek-storage/ns_123/segments/seg_001.mp4",
  "metadata": {
    "title": "ML Fundamentals Lecture",
    "category": "education"
  },
  "multimodal_extractor_v1_multimodal_embedding": [0.023, -0.041, "...1408 floats"],
  "multimodal_extractor_v1_transcription_embedding": [0.018, -0.032, "...1024 floats"]
}
| Field | Type | Description |
| --- | --- | --- |
| start_time | number | Segment start in seconds |
| end_time | number | Segment end in seconds |
| transcription | string | Whisper-transcribed audio |
| description | string | Gemini-generated scene description |
| ocr_text | string | Text visible in video frames |
| thumbnail_url | string | S3 URL of the segment thumbnail |
| source_video_url | string | Original source video URL |
| video_segment_url | string | URL of this specific segment clip |
| multimodal_extractor_v1_multimodal_embedding | float[1408] | Vertex AI visual/multimodal embedding |
| multimodal_extractor_v1_transcription_embedding | float[1024] | E5-Large transcript embedding |
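Since each document carries its own start_time and end_time, a hit can be turned into a playback deep link with the W3C Media Fragments temporal syntax (#t=start,end). A minimal sketch, assuming your player serves source_video_url over HTTP and honors temporal fragments:

```python
def segment_link(doc: dict) -> str:
    """Deep link into the source video at this segment's time span."""
    return f"{doc['source_video_url']}#t={doc['start_time']:g},{doc['end_time']:g}"
```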