
    Reverse Video Search

    Reverse video search allows us to use a video clip as an input for a query against videos that have been indexed in a vector store.


    You may have used some kind of reverse image search before. Put simply, instead of searching using text: australian shepherds running, you can use an image: aussie_running.png. The search engine will then find all similar images based on that input.

    But have you used reverse video search? The approach is the same: use your video as a query to find other videos.

    Reverse video search enables users to find similar videos by using a video clip as the search input, rather than traditional text-based queries. This technology leverages advanced computer vision and machine learning to analyze and match visual content across video databases.

Component | Description
Feature Extraction | Processing videos to identify and encode visual elements, scenes, and patterns
Vector Embeddings | Converting visual features into numerical representations for efficient comparison
Similarity Matching | Algorithms that compare video embeddings to find similar content
Temporal Analysis | Processing that considers the sequential nature of video content
    See a list of feature extractors: https://mixpeek.com/extractors
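
To make those components concrete, here's a minimal sketch of the matching step in Python. The 512-dimensional random vectors stand in for real clip embeddings, and the clip names and interval boundaries are made up for illustration; this is not how Mixpeek works internally.

import numpy as np

def cosine_similarity(a, b):
    """Similarity matching: compare two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend each indexed video was split into 5-second clips and embedded
# (feature extraction + vector embeddings). Random vectors stand in for
# real multimodal embeddings.
rng = np.random.default_rng(0)
indexed_clips = {
    ("trailer.mp4", 0.0, 5.0): rng.normal(size=512),
    ("trailer.mp4", 5.0, 10.0): rng.normal(size=512),
    ("trailer.mp4", 10.0, 15.0): rng.normal(size=512),
}

query_embedding = rng.normal(size=512)  # embedding of the query clip

# Temporal analysis here is as simple as keeping each clip's (start, end)
# window, so a match can point back to a timestamp.
ranked = sorted(
    indexed_clips.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
for (video, start, end), emb in ranked:
    print(video, start, end, round(cosine_similarity(query_embedding, emb), 4))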

    Understanding Through Image Search First

    Before diving into video search, it's helpful to understand reverse image search, which follows similar principles but with still images.

    How Reverse Image Search Works

    1. Input Processing: The system takes an image as input
    2. Feature Extraction: Analyzes visual elements like colors, shapes, and patterns
    3. Similarity Matching: Compares these features against a database of images
    4. Result Ranking: Returns similar images ranked by relevance
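Here's a compact sketch of those four steps, using a pretrained ResNet-50 from torchvision as the feature extractor and a handful of local image files as the database. The model choice and filenames are illustrative assumptions, not what Google or Mixpeek use under the hood.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # keep the 2048-d feature vector, drop the classifier
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    # Steps 1-2: input processing + feature extraction
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return torch.nn.functional.normalize(model(x), dim=-1).squeeze(0)

# Steps 3-4: similarity matching + result ranking over a tiny "database"
database = {path: embed(path) for path in ["dog1.jpg", "dog2.jpg", "cat1.jpg"]}
query = embed("aussie_running.png")
for path, vec in sorted(database.items(), key=lambda kv: float(query @ kv[1]), reverse=True):
    print(path, round(float(query @ vec), 4))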

    Try it on Google Images: https://images.google.com/

    In the example below, I'll upload a picture of an Australian Shepherd dog, and Google's reverse image search will find all similar pictures of Australian Shepherds.

Use Case | Description | Business Impact
E-Commerce | Finding similar products from product images | Increased sales through visual discovery
Content Verification | Identifying original sources of images | Enhanced content authenticity
Brand Protection | Detecting unauthorized use of logos/images | Better intellectual property protection
Real Estate | Finding similar properties from photographs | Improved property matching

    Image Feature Extraction

To perform a search, we first need to extract features from the image. Below, we're just leaving the default options, but you can go crazy with how many features you pull out.

import requests

# Ingest an image into the "sample_dogs" collection and extract an image embedding
url = "https://api.mixpeek.com/ingest/images/url"

payload = {
    "url": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
    "collection": "sample_dogs",
    "feature_extractors": {
        "embed": [
            {
                "type": "url",
                "embedding_model": "image"
            }
        ]
    }
}
headers = {
    "Authorization": "Bearer API_KEY",  # same headers are reused in the later snippets
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)
With the image indexed, we can use another image as the query against the sample_dogs collection:
import requests

# Search the collection using an image as the query (headers reused from above)
url = "https://api.mixpeek.com/features/search"

payload = {
    "queries": [
        {
            "type": "url",
            "value": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
            "embedding_model": "image"
        }
    ],
    "collections": ["sample_dogs"]
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

Reverse video search works the same way. We first embed a couple of videos, then provide a sample video as the query.

    For our index, we'll use a movie trailer from the 1940s classic, The Third Man:

    Prepare the video(s)

We'll split the video into 5-second intervals, then embed each interval using the multimodal embedding model. We'll also pull a description out of each interval.

import requests
import json

# Ingest the trailer: split into 5-second intervals, then describe and embed each one
url = "https://api.mixpeek.com/ingest/videos/url"

payload = json.dumps({
  "url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4",
  "collection": "my_video_collection",
  "feature_extractors": [
    {
      "interval_sec": 5,
      "describe": {
        "enabled": True
      },
      "embed": [
        {
          "type": "url",
          "embedding_model": "multimodal"
        }
      ]
    }
  ]
})

# headers is the same auth/content-type dict defined in the image example above
response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)
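
For intuition, here's roughly what interval_sec: 5 asks for: the trailer gets broken into consecutive 5-second windows, each of which is described and embedded. This is just a local illustration of the segmentation, not Mixpeek's internal implementation.

def five_second_intervals(duration_sec, interval_sec=5.0):
    """Yield (start, end) windows covering the full video duration."""
    start = 0.0
    while start < duration_sec:
        yield (start, min(start + interval_sec, duration_sec))
        start += interval_sec

# e.g. a 93-second trailer becomes 19 windows: (0, 5), (5, 10), ..., (90, 93)
print(list(five_second_intervals(93.0)))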
    

    Embed the video to search and run!

    Now we have a grainy video clip from some CCTV that we'll use for our reverse video search:

We'll do the same thing; the only difference is that this time we want the embedding of the query video so we can search it against the already indexed and embedded videos:

import requests

# Search the indexed video collection using the query clip (headers reused from above)
url = "https://api.mixpeek.com/features/search"

payload = {
    "queries": [
        {
            "type": "url",
            "value": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/video_queries/exiting_sewer.mp4",
            "embedding_model": "multimodal"
        }
    ],
    "collections": ["my_video_collection"]
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)
    

    Compare results

Now that we have our embeddings, we can run a KNN search.

The search returns an array of objects we can render in our application, indicating which indexed video timestamps are most similar to the query video's embedding:

    results = [
        {"start_time": 25.0, "end_time": 30.0, "score": 0.6265061},
        {"start_time": 5.0, "end_time": 10.0, "score": 0.6025797},
        {"start_time": 30.0, "end_time": 35.0, "score": 0.59880114},
    ]
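
From there, a small helper can pick the best interval to jump to in a player. This is just a sketch over the results list above; the 0.5 score threshold is an arbitrary choice, not anything prescribed by the API.

def best_match(results, min_score=0.5):
    """Return the highest-scoring interval, or None if nothing clears the threshold."""
    hits = [r for r in results if r["score"] >= min_score]
    return max(hits, key=lambda r: r["score"]) if hits else None

top = best_match(results)
if top:
    print(f"Jump to {top['start_time']:.0f}s-{top['end_time']:.0f}s (similarity {top['score']:.2f})")
# -> Jump to 25s-30s (similarity 0.63)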

    Now if we look at the original video @ 25 seconds in:

Amazing: using a video query as the input, we found a scene that would be hard to describe in text. Now imagine doing that across billions of videos 🤯

Using this template, we've set it up so that whenever a new object is added to our S3 bucket, it's automatically processed and inserted into our database (the connection was established beforehand). Additionally, if a video is ever deleted from our S3 bucket, its embeddings are deleted from our database as well.
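
As a rough sketch of that wiring, an AWS Lambda function subscribed to the bucket's event notifications could forward new objects to the ingest endpoint used earlier and clean up embeddings on deletes. The delete route below is a placeholder, not a confirmed Mixpeek endpoint; check the API reference for the real one.

import os
import urllib.parse
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['MIXPEEK_API_KEY']}"}

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated / ObjectRemoved notifications."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        s3_url = f"https://{bucket}.s3.amazonaws.com/{key}"

        if record["eventName"].startswith("ObjectCreated"):
            # New video: send it through the same ingest endpoint used above
            requests.post(
                "https://api.mixpeek.com/ingest/videos/url",
                json={"url": s3_url, "collection": "my_video_collection"},
                headers=HEADERS,
            )
        elif record["eventName"].startswith("ObjectRemoved"):
            # Deleted video: remove its embeddings too (hypothetical delete route)
            requests.delete(
                "https://api.mixpeek.com/assets",
                params={"url": s3_url, "collection": "my_video_collection"},
                headers=HEADERS,
            )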

    Applications and Use Cases

Industry | Use Case | Benefits
Content Creation | Finding specific scenes or clips | Streamlined editing process
Media Monitoring | Tracking content reuse across platforms | Better copyright enforcement
Security | Analyzing surveillance footage | Enhanced threat detection
E-commerce | Product discovery through video | Improved shopping experience

    Additional Resources

For additional information and implementation details, see the list of feature extractors at https://mixpeek.com/extractors.

Ethan Steininger · June 6, 2024 · 5 min read