
    Reverse Video Search: How It Works + Python API Tutorial

    Reverse video search allows us to use a video clip as an input for a query against videos that have been indexed in a vector store.


    You may have used some kind of reverse image search before. Put simply, instead of searching using text: australian shepherds running, you can use an image: aussie_running.png. The search engine will then find all similar images based on that input.

    Advanced Technical Deep Dive: mixpeek.com/tutorials/reverse-video-search


    But have you used reverse video search? The approach is the same: use your video as a query to find other videos.

    Reverse video search enables users to find similar videos by using a video clip as the search input, rather than traditional text-based queries. This technology leverages advanced computer vision and machine learning to analyze and match visual content across video databases.

| Component | Description |
| --- | --- |
| Feature Extraction | Processing videos to identify and encode visual elements, scenes, and patterns |
| Vector Embeddings | Converting visual features into numerical representations for efficient comparison |
| Similarity Matching | Algorithms that compare video embeddings to find similar content |
| Temporal Analysis | Processing that considers the sequential nature of video content |
    See a list of feature extractors: https://mixpeek.com/extractors
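To make the Vector Embeddings and Similarity Matching rows concrete, here's a minimal sketch of how two embedding vectors are compared with cosine similarity (the random vectors below are stand-ins for real embeddings):

import numpy as np

# Stand-ins for two pre-computed 512-dimensional segment embeddings
segment_a = np.random.rand(512)
segment_b = np.random.rand(512)

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), ~0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(segment_a, segment_b))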

    Understanding Through Image Search First

    Before diving into video search, it's helpful to understand reverse image search, which follows similar principles but with still images.

    How Reverse Image Search Works

    1. Input Processing: The system takes an image as input
    2. Feature Extraction: Analyzes visual elements like colors, shapes, and patterns
    3. Similarity Matching: Compares these features against a database of images
    4. Result Ranking: Returns similar images ranked by relevance
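To make steps 1 and 2 concrete, here's a minimal local sketch of turning an image into a feature vector with an off-the-shelf CLIP model. The sentence-transformers library and the file name are illustrative assumptions; Mixpeek handles this step for you, as shown later:

from PIL import Image
from sentence_transformers import SentenceTransformer

# Off-the-shelf CLIP model; encode() accepts PIL images for CLIP models
model = SentenceTransformer("clip-ViT-B-32")

# Steps 1-2: take an image as input and extract a feature vector from it
embedding = model.encode(Image.open("aussie_running.png"))

print(embedding.shape)  # (512,) for this model

Steps 3 and 4 then reduce to comparing that vector against a database of vectors, as in the cosine-similarity sketch above.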

    Try it on Google Images: https://images.google.com/

    In the example below, I'll upload a picture of an Australian Shepherd dog, and Google's reverse image search will find all similar pictures of Australian Shepherds.

[Image: Demonstration of reverse image search using Google Images, uploading a photo of an Australian Shepherd dog to find similar images]
| Use Case | Description | Business Impact |
| --- | --- | --- |
| E-Commerce | Finding similar products from product images | Increased sales through visual discovery |
| Content Verification | Identifying original sources of images | Enhanced content authenticity |
| Brand Protection | Detecting unauthorized use of logos/images | Better intellectual property protection |
| Real Estate | Finding similar properties from photographs | Improved property matching |

    Image Feature Extraction

To perform a search, we first need to extract features from the image. Below we're just using the default options, but you can go crazy with how many features you can pull out:

    import requests
    
    url = "https://api.mixpeek.com/ingest/images/url"
    
    payload = {
        "url": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
        "collection": "sample_dogs",
        "feature_extractors": {
            "embed": [
                {
                    "type": "url",
                    "embedding_model": "image"
                }
            ]
        }
    }
headers = {
    "Authorization": "Bearer API_KEY",  # headers are defined once here and omitted from later snippets for brevity
    "Content-Type": "application/json"
}
    
    response = requests.request("POST", url, json=payload, headers=headers)
    
    print(response.text)
Docs: Features - Understanding features and feature extraction in Mixpeek

Now we run the reverse image search, using the same image as the query:
    import requests
    
    url = "https://api.mixpeek.com/features/search"
    
    payload = {
        "queries": [
            {
                "type": "url",
                "value": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
                "embedding_model": "image"
            },
        ],
        "collections": ["sample_dogs"]
    }
Docs: Retrievers - Configure and use retrieval pipelines for powerful multimodal search

Reverse video search works the same way. We first embed a couple of videos, then provide a sample video as the search query.

    For our index, we'll use a movie trailer from the 1940s classic, The Third Man:

    Prepare the video(s)

We'll split the video into 5-second intervals, then embed each interval using the multimodal embedding model. We'll also pull a text description out of each interval.

    import requests
    import json
    
    url = "https://api.mixpeek.com/ingest/videos/url"
    
    payload = json.dumps({
      "url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4",
      "collection": "my_video_collection",
      "feature_extractors": [
        {
          "interval_sec": 5,
          "describe": {
            "enabled": True
          },
          "embed": [
            {
              "type": "url",
              "embedding_model": "multimodal"
            }
          ]
        }
      ]
    })
    
    response = requests.request("POST", url, headers=headers, data=payload)
    
    print(response.text)
    
Docs: Buckets - Store and organize raw multimodal data objects for processing

Embed the query video and run the search

    Now we have a grainy video clip from some CCTV that we'll use for our reverse video search:

We'll do the same thing; the only difference is that this video's embedding will be used as the query against the already indexed and embedded videos:

    import requests
    
    url = "https://api.mixpeek.com/features/search"
    
    payload = {
        "queries": [
            {
                "type": "url",
                "value": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/video_queries/exiting_sewer.mp4",
                "embedding_model": "multimodal",
            },
        ],
        "collections": ["my_video_collection"],
    }
    
    response = requests.request("POST", url, json=payload, headers=headers)
    
    print(response.text)
    

    Compare results

Now that we have our embeddings, we can run a k-nearest-neighbor (KNN) search.
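Mixpeek's API runs this step for us, but conceptually a KNN search just ranks every indexed segment embedding by its similarity to the query embedding. Here's a minimal brute-force sketch with numpy (random vectors stand in for real embeddings; production systems use approximate indexes instead of exhaustive scans):

import numpy as np

def knn_search(query, index, k=3):
    # Normalize so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    idx = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = idx @ q
    top_k = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in top_k]

segments = np.random.rand(7, 512)   # seven indexed 5-second segments
query_clip = np.random.rand(512)    # the CCTV query clip's embedding
print(knn_search(query_clip, segments))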

The Mixpeek search call returns an array of objects that we can render in our application, indicating the most similar video timestamps for the video-embedding query:

    results = [
        {"start_time": 25.0, "end_time": 30.0, "score": 0.6265061},
        {"start_time": 5.0, "end_time": 10.0, "score": 0.6025797},
        {"start_time": 30.0, "end_time": 35.0, "score": 0.59880114},
    ]

    Now if we look at the original video @ 25 seconds in:

Amazing: using a video query as the input, we found a scene that would be challenging to describe in text. Now imagine doing that across billions of videos 🤯

Using this template, we set things up so that whenever a new object is added to our S3 bucket, it's automatically processed and inserted into our database (the connection was established beforehand). Additionally, if a video is ever deleted from our S3 bucket, its embeddings are deleted from our database as well.
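As a sketch of that automation, here's what an AWS Lambda handler subscribed to the bucket's ObjectCreated events might look like. The Lambda wiring, bucket URL format, and collection name are assumptions; the ingest call is the same one used above:

import requests

HEADERS = {"Authorization": "Bearer API_KEY", "Content-Type": "application/json"}

def lambda_handler(event, context):
    # Triggered by S3 ObjectCreated events: index each new video automatically
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]  # assumes URL-safe object keys
        requests.post(
            "https://api.mixpeek.com/ingest/videos/url",
            json={
                "url": f"https://{bucket}.s3.amazonaws.com/{key}",
                "collection": "my_video_collection",
                "feature_extractors": [{
                    "interval_sec": 5,
                    "embed": [{"type": "url", "embedding_model": "multimodal"}],
                }],
            },
            headers=HEADERS,
        )

The deletion side works the same way with ObjectRemoved events.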

    Applications and Use Cases

| Industry | Use Case | Benefits |
| --- | --- | --- |
| Content Creation | Finding specific scenes or clips | Streamlined editing process |
| Media Monitoring | Tracking content reuse across platforms | Better copyright enforcement |
| Security | Analyzing surveillance footage | Enhanced threat detection |
| E-commerce | Product discovery through video | Improved shopping experience |

    Architecture: How Reverse Video Search Works Under the Hood

    A production reverse video search system has three main stages: ingestion, indexing, and retrieval. Here's how Mixpeek's pipeline handles each:

    Stage 1: Video Ingestion & Segmentation

    Raw video files are uploaded to object storage (S3, GCS, or MinIO). The system automatically splits each video into segments using configurable strategies:

    • Fixed interval: Split every N seconds (e.g., 5s chunks)
    • Scene detection: Split on visual scene changes using the Scene Splitting extractor
    • Shot boundary: Split on camera cuts and transitions

    Each segment is treated as an independent document with its own metadata (start_time, end_time, source video reference).
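Fixed-interval segmentation, for example, is just arithmetic over the video's duration. A minimal sketch of how segment boundaries fall out (the 93-second duration is an illustrative value):

def fixed_interval_segments(duration_sec, interval_sec=5.0):
    # Yield {start_time, end_time} metadata for fixed-interval splitting
    start = 0.0
    while start < duration_sec:
        end = min(start + interval_sec, duration_sec)
        yield {"start_time": start, "end_time": end}
        start = end

# A 93-second trailer becomes 18 full 5s chunks plus a 3s tail segment
print(list(fixed_interval_segments(93.0)))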

    Stage 2: Feature Extraction & Embedding

    Each video segment passes through one or more feature extractors to produce vector embeddings. These embeddings capture the visual, temporal, and optionally audio content of each clip in a high-dimensional vector space.

    The system supports multiple embedding strategies simultaneously:

| Strategy | Best For | Dimensions |
| --- | --- | --- |
| Multimodal (video + text) | General video similarity, cross-modal search | 512-1024 |
| Visual-only (frame embeddings) | Scene matching, duplicate detection | 768 |
| Temporal (motion patterns) | Action recognition, surveillance | 512 |
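As a sketch of the visual-only strategy, one common approach (an assumption here, not necessarily Mixpeek's internal method) is to embed sampled frames individually and mean-pool them into a single segment-level vector:

import numpy as np

def pool_frame_embeddings(frame_embeddings):
    # frame_embeddings: (num_frames, dim) array, e.g. CLIP vectors per frame
    pooled = frame_embeddings.mean(axis=0)
    return pooled / np.linalg.norm(pooled)  # normalize for cosine search

frames = np.random.rand(12, 768)  # e.g. 12 frames sampled from one 5s segment
print(pool_frame_embeddings(frames).shape)  # (768,)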

    Stage 3: Vector Search & Retrieval

    When a query video is submitted, it goes through the same segmentation and embedding pipeline. The resulting vectors are compared against the indexed collection using approximate nearest neighbor (ANN) search. Results are ranked by cosine similarity and returned with timestamps, scores, and metadata.
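If you were building this stage yourself, a vector library such as FAISS could handle the comparison. A minimal sketch (FAISS is an assumption for illustration, and IndexFlatIP is an exact index; at billion-segment scale you'd swap in an approximate index like IVF or HNSW):

import faiss
import numpy as np

dim = 512
index = faiss.IndexFlatIP(dim)  # inner product == cosine on normalized vectors

vectors = np.random.rand(1000, dim).astype("float32")  # 1000 segment embeddings
faiss.normalize_L2(vectors)
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")  # the query clip's embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar segments
print(ids[0], scores[0])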

    This three-stage pipeline runs as a fully managed service—you don't need to manage GPU infrastructure, vector databases, or model serving. The full architecture handles billions of video segments with sub-200ms query latency.

    Supported Embedding Models

    Mixpeek's reverse video search is model-agnostic. You can choose the embedding model that fits your use case:

| Model | Type | Best For |
| --- | --- | --- |
| Multimodal (default) | Video + text joint embedding | Cross-modal search (text-to-video, video-to-video) |
| CLIP-based | Frame-level visual embedding | Visual similarity, product matching |
| Video-native | Temporal-aware embedding | Action recognition, motion matching |
| Custom / BYO | Your own model via API | Domain-specific use cases (medical, satellite, etc.) |

    All models are served via Mixpeek's inference engine with automatic batching and GPU acceleration. See the full list of available feature extractors.
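Switching models doesn't change the workflow; it's just the embedding_model field in the ingest and search payloads shown earlier:

payload = {
    "queries": [{
        "type": "url",
        "value": "https://your-bucket.s3.amazonaws.com/query-clip.mp4",
        # swap "multimodal" for a CLIP-based, video-native, or custom model name
        "embedding_model": "multimodal",
    }],
    "collections": ["my_videos"],
}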

    Performance at Scale

    Reverse video search needs to be fast to be useful. Here's what Mixpeek achieves in production:

| Metric | Value | Notes |
| --- | --- | --- |
| Query latency (p50) | < 100ms | After embedding, vector search only |
| Query latency (p99) | < 250ms | Including embedding generation |
| Index capacity | Billions of segments | Distributed vector index |
| Ingestion throughput | 1000+ videos/min | Parallel processing with auto-scaling |
| Recall@10 | > 95% | On standard video retrieval benchmarks |

    The system scales horizontally—adding more videos doesn't degrade query performance because the vector index is distributed across nodes.

    Reverse Video Search: Mixpeek vs. Alternatives

    How does Mixpeek compare to other approaches for building reverse video search?

| Feature | Mixpeek | Google Video Intelligence | Custom CLIP Pipeline | Twelve Labs |
| --- | --- | --- | --- | --- |
| Video-to-video search | ✅ Native | ❌ Labels only | ⚠️ Frame-level only | ✅ Supported |
| Temporal awareness | ✅ Scene + motion | ⚠️ Shot detection | ❌ No | ✅ Yes |
| Self-hosted option | ✅ Yes | ❌ Cloud only | ✅ Yes | ❌ Cloud only |
| Custom models (BYO) | ✅ Any model | ❌ No | ✅ Full control | ❌ No |
| Managed infrastructure | ✅ Fully managed | ✅ Fully managed | ❌ You manage | ✅ Fully managed |
| Multimodal (text + video) | ✅ Native | ⚠️ Separate APIs | ✅ CLIP supports | ✅ Native |
| Sub-200ms queries at scale | ✅ Yes | N/A | ⚠️ Depends on infra | ✅ Yes |

    Complete Example: Video File to Search Results in 20 Lines

    Here's a complete, copy-paste example that goes from "I have a video" to "here are the most similar clips":

    import requests
    
    API_KEY = "your_api_key"  # Get one free at mixpeek.com/start
    BASE_URL = "https://api.mixpeek.com"
    HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    
    # Step 1: Index a video (splits into segments and embeds automatically)
    index_response = requests.post(f"{BASE_URL}/ingest/videos/url", json={
        "url": "https://your-bucket.s3.amazonaws.com/my-video.mp4",
        "collection": "my_videos",
        "feature_extractors": [{
            "interval_sec": 5,
            "embed": [{"type": "url", "embedding_model": "multimodal"}],
            "describe": {"enabled": True}
        }]
    }, headers=HEADERS)
    
    print(f"Indexed: {index_response.json()}")
    
    # Step 2: Search using another video as query
    search_response = requests.post(f"{BASE_URL}/features/search", json={
        "queries": [{
            "type": "url",
            "value": "https://your-bucket.s3.amazonaws.com/query-clip.mp4",
            "embedding_model": "multimodal"
        }],
        "collections": ["my_videos"]
    }, headers=HEADERS)
    
    # Step 3: Get results with timestamps and similarity scores
    results = search_response.json()
    for match in results[:5]:
        print(f"Score: {match['score']:.3f} | {match['start_time']}s - {match['end_time']}s")
    

    That's it. The API handles video segmentation, embedding generation, and vector search behind the scenes. Get a free API key to try it.

    💡
    Pro tip: For production use, set up a collection pipeline that automatically processes new videos as they're uploaded to your S3 bucket. No manual API calls needed after initial setup.

    Additional Resources

For additional information and implementation details, refer to the feature extractor list (https://mixpeek.com/extractors) and the advanced technical deep dive (mixpeek.com/tutorials/reverse-video-search).


    Frequently Asked Questions

    What is reverse video search?

    Reverse video search is a way to find where a specific video clip appears online or identify similar videos. Instead of searching with text, you use a video as your input query.

    Can I do reverse video search on Google?

    Google doesn't currently support full reverse video search. It supports reverse image search, but not full-frame temporal video queries like Mixpeek offers.

    How does reverse video search work?

    It works by splitting a video into segments, extracting visual and temporal features from each part, and converting them into embeddings. These embeddings are then matched against a database to find visually similar clips.

    What are some real-world use cases?

    • Copyright enforcement: Detect unauthorized reuploads of your videos.
    • Content discovery: Quickly find related media assets for editing or repurposing.
    • Security: Search surveillance footage to track similar incidents.
    • Adtech: Find where branded video assets have appeared online.

    Is reverse video search free?

    Mixpeek offers a free tier with limited usage and a paid plan for advanced capabilities and high-volume searches. See pricing for details.

    Can I search YouTube videos with this?

    If you have a downloaded clip, you can use Mixpeek to search it against your indexed YouTube archive or library—assuming you’ve ingested that content into your collection.

How is reverse video search different from reverse image search?

Reverse image search only works on static images. Reverse video search considers motion, audio, scene changes, and sequence, making it much more powerful for identifying exact moments or matches.