Documentation Index
Fetch the complete documentation index at: https://docs.mixpeek.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Mixpeek decomposes video into searchable segments and extracts faces, visual embeddings, transcripts, scene descriptions, and structured metadata. Each segment becomes a document with its own vector indexes, so you can search within video at the sub-clip level.
| Feature | Model | Dimensions | Extractor |
|---|---|---|---|
| Visual embeddings | Vertex AI multimodal | 1408D | multimodal_extractor |
| Audio transcript | Whisper | — | multimodal_extractor |
| Transcript embeddings | E5-Large | 1024D | multimodal_extractor |
| Scene descriptions | Gemini | — | multimodal_extractor |
| OCR (on-screen text) | Gemini | — | multimodal_extractor |
| Face embeddings | ArcFace (SCRFD detect) | 512D | face_identity_extractor |
| Learning units (lectures) | E5-Large + Jina Code + SigLIP | 1024D / 768D | course_content_extractor |
| Temporal segments | FFmpeg (time / scene / silence) | — | multimodal_extractor |
| Goal | Extractor | Why |
|---|---|---|
| General video search (visual + spoken content) | multimodal_extractor | Unified embedding space across video, image, and text |
| Face recognition / identity matching | face_identity_extractor | 512D ArcFace embeddings with 99.8% verification accuracy |
| Educational content (lectures, slides, code) | course_content_extractor | Atomic learning units with text, code, and visual embeddings |
Create a Collection for Video
This collection splits video into scene-based segments (using a scene-detection threshold of 0.5), transcribes audio, and generates visual and transcript embeddings.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "video-library",
    "source": { "type": "bucket", "bucket_id": "bkt_videos" },
    "feature_extractor": {
      "feature_extractor_name": "multimodal_extractor",
      "version": "v1",
      "input_mappings": {
        "video": "payload.video_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.title" },
        { "source_path": "metadata.category" }
      ],
      "parameters": {
        "split_method": "scene",
        "scene_detection_threshold": 0.5,
        "run_transcription": true,
        "run_transcription_embedding": true,
        "run_multimodal_embedding": true,
        "run_video_description": true,
        "enable_thumbnails": true
      }
    }
  }'
```
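If you are calling the API from Python rather than curl, the same request can be built programmatically. This is a minimal sketch using only the standard library; the endpoint, header names, and parameter keys are taken directly from the curl example above, and nothing beyond them is assumed:

```python
import json
import os
import urllib.request

API_URL = "https://api.mixpeek.com/v1/collections"

def build_video_collection_payload(bucket_id: str, collection_name: str) -> dict:
    """Mirror the curl example: scene-split video with transcription and embeddings."""
    return {
        "collection_name": collection_name,
        "source": {"type": "bucket", "bucket_id": bucket_id},
        "feature_extractor": {
            "feature_extractor_name": "multimodal_extractor",
            "version": "v1",
            "input_mappings": {"video": "payload.video_url"},
            "field_passthrough": [
                {"source_path": "metadata.title"},
                {"source_path": "metadata.category"},
            ],
            "parameters": {
                "split_method": "scene",
                "scene_detection_threshold": 0.5,
                "run_transcription": True,
                "run_transcription_embedding": True,
                "run_multimodal_embedding": True,
                "run_video_description": True,
                "enable_thumbnails": True,
            },
        },
    }

def create_collection(payload: dict) -> dict:
    """POST the payload with the same auth headers as the curl example."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['MIXPEEK_API_KEY']}",
            "X-Namespace": os.environ["NAMESPACE"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from the HTTP call keeps the request body easy to test and reuse across collections.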
Search by Visual Content
Create a retriever that searches video segments by visual similarity. A text query like “person writing on whiteboard” finds visually matching segments through Vertex AI’s cross-modal embedding space.
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "video-visual-search",
    "collection_ids": ["col_video_library"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "visual_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'
```
Execute the retriever:
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers/ret_abc123/execute \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": { "query": "person writing on whiteboard" },
    "limit": 10
  }'
```
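Once the response is back, you typically want the strongest segment hits first. A small helper sketch for that, assuming the response shape shown in the face-search example later on this page (a `results` list of objects with `document_id` and `score`):

```python
def top_segments(response: dict, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (document_id, score) pairs at or above a score floor, best first."""
    hits = [
        (r["document_id"], r["score"])
        for r in response.get("results", [])
        if r["score"] >= min_score
    ]
    # Sort by similarity score, highest first.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

A score floor is useful when the retriever's `top_k` returns more candidates than you want to surface.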
Search by Transcript
To search spoken content, create a retriever that targets the transcription embedding index.
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "video-transcript-search",
    "collection_ids": ["col_video_library"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "transcript_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "feature_address": "mixpeek://multimodal_extractor@v1/transcription_embedding",
            "input_mapping": { "text": "query" },
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'
```
Search by Face
Use a separate collection with face_identity_extractor to find video segments containing a specific person.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "video-faces",
    "source": { "type": "bucket", "bucket_id": "bkt_videos" },
    "feature_extractor": {
      "feature_extractor_name": "face_identity_extractor",
      "version": "v1",
      "input_mappings": {
        "video": "payload.video_url"
      },
      "parameters": {
        "detection_model": "scrfd_2.5g",
        "embedding_model": "arcface_r100",
        "video_sampling_fps": 1.0,
        "video_deduplication": true,
        "video_deduplication_threshold": 0.8
      }
    }
  }'
```
Example: Casting Intelligence Across Ad Creatives
A performance marketing agency indexes hundreds of video ads with face_identity_extractor to track which talent appears in which campaigns. The retriever returns face matches with timestamps and confidence scores, enabling queries like “find every ad featuring this creator” or “has this person appeared in a competitor’s campaign?”
Face identity document (per detected face):
```json
{
  "document_id": "doc_face_8f2a",
  "source_video_url": "s3://ad-archive/campaigns/summer-2026/outdoor-spot-04.mp4",
  "start_time": 3.2,
  "end_time": 8.7,
  "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_8f2a.jpg",
  "metadata": {
    "campaign": "Summer 2026 Outdoor",
    "ad_id": "ad_04",
    "platform": "meta"
  },
  "face_identity_extractor_v1_face_embedding": [0.112, -0.045, "...512 floats"],
  "face_bbox": { "x": 0.32, "y": 0.15, "width": 0.18, "height": 0.24 },
  "face_confidence": 0.97
}
```
Retriever execution result (querying by face similarity):
```json
{
  "results": [
    {
      "document_id": "doc_face_8f2a",
      "score": 0.94,
      "metadata": {
        "campaign": "Summer 2026 Outdoor",
        "ad_id": "ad_04",
        "platform": "meta",
        "start_time": 3.2,
        "end_time": 8.7,
        "source_video_url": "s3://ad-archive/campaigns/summer-2026/outdoor-spot-04.mp4",
        "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_8f2a.jpg"
      }
    },
    {
      "document_id": "doc_face_c91b",
      "score": 0.91,
      "metadata": {
        "campaign": "Spring 2026 Fitness",
        "ad_id": "ad_17",
        "platform": "youtube",
        "start_time": 12.0,
        "end_time": 19.5,
        "source_video_url": "s3://ad-archive/campaigns/spring-2026/fitness-spot-17.mp4",
        "thumbnail_url": "s3://mixpeek-storage/ns_casting/thumbnails/face_c91b.jpg"
      }
    }
  ],
  "total_results": 2,
  "execution_time_ms": 142
}
```
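To answer a query like "find every ad featuring this creator", a result set like the one above can be flattened into per-campaign appearances. A sketch under the assumption that results follow the exact shape shown (the 0.85 default threshold comes from the guidance below the field table):

```python
from collections import defaultdict

def appearances_by_campaign(response: dict, min_score: float = 0.85) -> dict:
    """Group confident face matches by campaign, keeping timestamps for review."""
    grouped = defaultdict(list)
    for r in response.get("results", []):
        if r["score"] < min_score:
            continue  # drop low-confidence matches (likely a different person)
        m = r["metadata"]
        grouped[m["campaign"]].append({
            "ad_id": m["ad_id"],
            "platform": m["platform"],
            "start_time": m["start_time"],
            "end_time": m["end_time"],
            "score": r["score"],
        })
    return dict(grouped)
```

Running this over the example response above yields one entry each for "Summer 2026 Outdoor" and "Spring 2026 Fitness", since both scores clear 0.85.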
| Field | Type | Description |
|---|---|---|
| face_identity_extractor_v1_face_embedding | float[512] | ArcFace face embedding for identity matching |
| face_bbox | object | Bounding box of the detected face (normalized coordinates) |
| face_confidence | number | Detection confidence (0-1) |
| score | number | Cosine similarity to the query face (in retriever results) |
A score above 0.85 typically indicates the same person across different videos. See the Build a Casting Agent quickstart for the full workflow including competitor cross-referencing.
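For intuition about what that score means, cosine similarity between two embeddings is just their dot product divided by the product of their norms. A pure-Python sketch (the real comparison happens server-side over 512-float ArcFace vectors; this is only illustrative):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0; the same face photographed in different videos lands near the top of that range.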
Output Schema
After extraction, each video segment produces a document like this:
```json
{
  "document_id": "doc_abc123",
  "start_time": 10.0,
  "end_time": 20.0,
  "transcription": "Welcome to today's lecture on machine learning fundamentals...",
  "description": "Instructor standing at whiteboard, introducing ML concepts",
  "ocr_text": "Machine Learning 101",
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/thumb_1.jpg",
  "source_video_url": "s3://my-bucket/videos/lecture-01.mp4",
  "video_segment_url": "s3://mixpeek-storage/ns_123/segments/seg_001.mp4",
  "metadata": {
    "title": "ML Fundamentals Lecture",
    "category": "education"
  },
  "multimodal_extractor_v1_multimodal_embedding": [0.023, -0.041, "...1408 floats"],
  "multimodal_extractor_v1_transcription_embedding": [0.018, -0.032, "...1024 floats"]
}
```
| Field | Type | Description |
|---|---|---|
| start_time | number | Segment start in seconds |
| end_time | number | Segment end in seconds |
| transcription | string | Whisper-transcribed audio |
| description | string | Gemini-generated scene description |
| ocr_text | string | Text visible in video frames |
| thumbnail_url | string | S3 URL of the segment thumbnail |
| source_video_url | string | Original source video URL |
| video_segment_url | string | URL of this specific segment clip |
| multimodal_extractor_v1_multimodal_embedding | float[1408] | Vertex AI visual/multimodal embedding |
| multimodal_extractor_v1_transcription_embedding | float[1024] | E5-Large transcript embedding |
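Since `start_time` and `end_time` are plain seconds, a small formatter is handy when surfacing segment documents in a UI. This `clip_label` helper is hypothetical (not part of the API), operating on the document shape shown above:

```python
def clip_label(doc: dict) -> str:
    """Render a segment document as 'MM:SS-MM:SS <title>'."""
    def mmss(t: float) -> str:
        s = int(t)  # truncate sub-second precision for display
        return f"{s // 60:02d}:{s % 60:02d}"
    # Fall back to the document id when no title was passed through.
    title = doc.get("metadata", {}).get("title", doc.get("document_id", ""))
    return f"{mmss(doc['start_time'])}-{mmss(doc['end_time'])} {title}"
```

For the example document above this produces `00:10-00:20 ML Fundamentals Lecture`.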