Video Scene Search

Find specific scenes within videos using natural-language descriptions. The pipeline detects scene boundaries, generates an embedding for each scene, and makes the whole video library searchable at the timestamp level, so each match points to a start and end time rather than to an entire file. Queries can target visual content, actions, or spoken dialogue.
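To make the idea of scene-boundary detection concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual algorithm, which the page does not describe. It assumes each frame has already been reduced to a small feature vector, and the names `cosine_distance`, `detect_scenes`, and the `threshold` default are hypothetical and chosen only for illustration.

```python
from math import sqrt

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally different
    return 1.0 - dot / (norm_a * norm_b)

def detect_scenes(frame_features, fps, threshold=0.5):
    """Split frames into (start_time, end_time) scenes, in seconds.

    A new scene starts wherever two consecutive frame vectors differ by
    more than `threshold` (an assumed default; tune it per dataset).
    """
    if not frame_features:
        return []
    boundaries = [0]
    for i in range(1, len(frame_features)):
        if cosine_distance(frame_features[i - 1], frame_features[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frame_features))
    return [(start / fps, end / fps) for start, end in zip(boundaries, boundaries[1:])]

# Toy features: three frames with one "look", then two with another.
frames = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9]]
scenes = detect_scenes(frames, fps=1.0)
print(scenes)
```

Each resulting `(start_time, end_time)` pair is the unit that would then be embedded and indexed, which is what lets a query land on a time span instead of a whole video.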

video, image, text, Multi-Tier

3.2K runs

from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Create video collection with scene decomposition
collection = client.collections.create(
    namespace_id="ns_your_namespace",
    name="video_scenes",
    extractors=["multimodal-extractor", "text-extractor"],
    params={"video_chunking": "scene-based"}
)

# Upload videos (assumes the bucket bkt_videos already exists and is the source for this collection)
client.buckets.upload(bucket_id="bkt_videos", url="s3://your-bucket/videos/")

# Search for a specific scene (assumes a retriever with the ID ret_scene_search has already been created)
results = client.retrievers.execute(
    retriever_id="ret_scene_search",
    query={"text": "person presenting a chart on a whiteboard"}
)

for doc in results["results"]:
    print(f"Video: {doc['root_object_id']}")
    print(f"  Scene: {doc['start_time']:.1f}s - {doc['end_time']:.1f}s")
    print(f"  Score: {doc['score']:.3f}")
    if doc.get("text"):
        print(f"  Transcript: {doc['text'][:80]}...")
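When a query matches several scenes in the same video, it can help to group the hits per video and show clock-style timestamps. The sketch below does that for documents shaped like the ones the loop above reads (`root_object_id`, `start_time`, `end_time`, `score`). The sample data is made up, and `format_timestamp` and `group_scenes_by_video` are hypothetical helpers, not part of the Mixpeek client.

```python
from collections import defaultdict

def format_timestamp(seconds):
    """Format a number of seconds as H:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

def group_scenes_by_video(docs):
    """Group result documents by video, best-scoring scene first."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["root_object_id"]].append(doc)
    for scenes in grouped.values():
        scenes.sort(key=lambda d: d["score"], reverse=True)
    return dict(grouped)

# Made-up sample results with the same fields the loop above prints.
sample_docs = [
    {"root_object_id": "vid_a", "start_time": 75.0, "end_time": 92.5, "score": 0.81},
    {"root_object_id": "vid_b", "start_time": 3725.0, "end_time": 3740.0, "score": 0.77},
    {"root_object_id": "vid_a", "start_time": 10.0, "end_time": 20.0, "score": 0.93},
]

grouped = group_scenes_by_video(sample_docs)
for video_id, scenes in grouped.items():
    print(video_id)
    for scene in scenes:
        span = f"{format_timestamp(scene['start_time'])} - {format_timestamp(scene['end_time'])}"
        print(f"  {span}  score={scene['score']:.3f}")
```

With real results, `sample_docs` would be replaced by `results["results"]`, assuming every document carries the four fields used here.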

Feature Extractors

Retriever Stages

rerank

Rerank documents using cross-encoder models for accurate relevance

sort
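The rerank stage listed above re-scores an initial candidate list with a more expensive model that looks at the query and each candidate together. The sketch below shows only that control flow. The scoring function `token_overlap` is a toy stand-in, not a cross-encoder, and `rerank`, `rerank_score`, and the candidate documents are hypothetical names and data rather than the platform's API.

```python
def rerank(query, docs, score_fn, top_k=None):
    """Re-order docs by score_fn(query, doc_text), highest score first."""
    scored = [(score_fn(query, doc["text"]), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Copy each doc and attach its new score so the input list is not mutated.
    reranked = [dict(doc, rerank_score=score) for score, doc in scored]
    return reranked[:top_k] if top_k is not None else reranked

def token_overlap(query, text):
    """Toy scorer: fraction of query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)

# Made-up first-stage candidates, e.g. scene transcripts or captions.
candidates = [
    {"id": "scene_1", "text": "a dog runs across the park"},
    {"id": "scene_2", "text": "person presenting a chart on a whiteboard"},
    {"id": "scene_3", "text": "a person walks past a whiteboard"},
]

top = rerank("person presenting a chart", candidates, token_overlap, top_k=2)
print([doc["id"] for doc in top])
```

In practice `score_fn` would wrap a real cross-encoder model. Because such models score one query-document pair at a time, reranking is usually applied only to a short candidate list from an earlier, cheaper stage.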

Related Recipes & Resources

Hierarchical Classification

Brand Safety & Ad Verification Pipeline

Automated Video Tagging

Searchable Video Library

Semantic Multimodal Search

Feature Extraction