    Ethan Steininger
    5 min read

    Turning Frames into DataFrames: AI-Powered Video Analytics

    By applying the classic group_by pattern to structured video data at index time, you can turn raw frames into searchable, analyzable DataFrames aligned with how your users explore footage.

    Implementation

    Suppose you want to run an analytical query on your basketball footage:

    "Show me every jump shot from each season's highest scoring player where the opposing team is winning by 3 or less"

    —you need a pipeline of modular, composable feature extractors that can split, aggregate, and merge across objects, actions, game context, and external stats.

    💡
    Mixpeek builds custom extractors for your workflow; we also offer a library of existing feature extractors.

    Ok, so how does it all work together?


    We Need to Create 3 Indexes

    To support these advanced analytical and semantic search queries, we build three separate indexes:

    1. Objects
    2. Actions
    3. Game Context

    Each index is constructed using group_by operations over key entities. This allows us to precompute, aggregate, and store enriched data upfront.

    Note: splitting videos often requires zero- or few-shot segmentation models.
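The snippets below lean on a `group_by` helper that isn't defined anywhere in this post. A minimal sketch, assuming each clip is a plain dict, might look like:

```python
from collections import defaultdict

def group_by(items, key):
    """Group a list of dicts into {key_value: [items]} buckets."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return dict(groups)
```

Any hash-based grouping (or `pandas.DataFrame.groupby`, if your clips are already tabular) works just as well here.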


    Object Index

    • Split: Detect all player objects across frames.
    • Group By: Player ID (using jersey OCR, pose, face recognition).
    • External Join: Pull player stats via URL/API lookup (https://stats.nba.com/players/{id}/season_stats).
    • Generate: Compute video clip embeddings for each player’s segments.
    • Merge: Attach stats to each clip embedding.
    # Step 1: Split video into segments with detected players
    clips = detect_objects_and_segment(video)  # returns list of {clip_id, player_id, timestamp, ...}
    
    # Step 2: Group clips by player_id
    grouped_by_player = group_by(clips, key="player_id")
    
    # Step 3: Fetch external stats per player
    def fetch_player_stats(player_id):
        url = f"https://stats.nba.com/players/{player_id}/season_stats"
        return http_get(url)
    
    # Step 4: Attach stats to each clip
    enriched_clips = []
    for player_id, player_clips in grouped_by_player.items():
        stats = fetch_player_stats(player_id)
        for clip in player_clips:
            clip["stats"] = stats
            enriched_clips.append(clip)
    
    # Step 5: Save enriched clips to index
    index_store.save("object_index", enriched_clips)
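The "Generate" step above (clip embeddings per player segment) isn't shown in the snippet. As a placeholder sketch, assuming you already have per-frame feature vectors, you could mean-pool them into a single clip embedding; a real system would swap in a video embedding model here:

```python
import numpy as np

def embed_clip(frame_features):
    """Hypothetical embedding step: mean-pool per-frame feature vectors
    into one fixed-size clip embedding, then L2-normalize so the result
    is ready for cosine-similarity search."""
    features = np.asarray(frame_features, dtype=float)  # shape: (n_frames, dim)
    embedding = features.mean(axis=0)
    return embedding / np.linalg.norm(embedding)
```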
    

    Action Index

    • Split: Classify action segments (e.g., jump shots, dunks, assists).
    • Group By: Player ID or action type.
    • Aggregate: Summarize frequency, duration, success rate (if available).
    • Merge: Tag clips for retrieval based on action + player pairings.
    # Step 1: Run action classifier to label clips
    labeled_clips = classify_actions(video)  # returns list of {clip_id, action_label, player_id, timestamp, ...}
    
    # Step 2: Filter for jump shots (a list, so it can be grouped and saved)
    jump_shots = [c for c in labeled_clips if c["action_label"] == "jump_shot"]
    
    # Step 3: Group jump shots by player
    jump_shots_by_player = group_by(jump_shots, key="player_id")
    
    # Step 4: Enrich each jump shot with metadata (e.g., shot_clock, defender_distance)
    for player_id, clips in jump_shots_by_player.items():
        for clip in clips:
            metadata = extract_context_metadata(clip["clip_id"])
            clip["metadata"] = metadata
    
    # Step 5: Save to action index
    index_store.save("action_index", jump_shots)
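The "Aggregate" bullet (frequency, duration, success rate) isn't covered by the snippet either. A minimal sketch, assuming each clip dict carries hypothetical `duration` and `made` fields, could be:

```python
def aggregate_action_stats(clips_by_player):
    """Summarize per-player action stats: count, total duration, success rate."""
    summary = {}
    for player_id, clips in clips_by_player.items():
        made = sum(1 for c in clips if c.get("made"))
        summary[player_id] = {
            "count": len(clips),
            "total_duration": sum(c["duration"] for c in clips),
            "success_rate": made / len(clips) if clips else 0.0,
        }
    return summary
```

These rollups can be stored alongside the clips so query-time filters like "success rate above X" never touch raw video.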
    

    Game Context Index

    • Split: Extract time, score, shot clock, and period using scoreboard overlays or synced metadata.
    • Group By: Game ID or quarter.
    • Compute: Derive score_diff, time_remaining, and other contextual flags.
    • Merge: Add context to each clip (e.g., “opponent up by ≤ 3”).
    # Step 1: Extract scoreboard info from each clip
    clips_with_scores = extract_scoreboard(video)  # returns list of {clip_id, team_score, opponent_score, timestamp, ...}
    
    # Step 2: Compute score differential
    for clip in clips_with_scores:
        clip["score_diff"] = clip["opponent_score"] - clip["team_score"]
    
    # Step 3: Flag relevant clips (opponent winning by 3 or fewer points)
    flagged_clips = [c for c in clips_with_scores if 0 < c["score_diff"] <= 3]
    
    # Step 4: Tag with flag for downstream retrieval
    for clip in flagged_clips:
        clip["flag"] = "close_game_opponent_leading"
    
    # Step 5: Save to game context index
    index_store.save("game_context_index", flagged_clips)
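The snippet derives `score_diff` but not `time_remaining`. One way to compute it, assuming the scoreboard extractor yields a hypothetical `game_clock` string such as "1:23", is:

```python
def time_remaining_seconds(game_clock):
    """Convert a scoreboard clock string like '1:23' into seconds remaining."""
    minutes, seconds = game_clock.split(":")
    return int(minutes) * 60 + int(seconds)
```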
    

    Query-Time Result

    At retrieval, your retriever can now filter indexed clips with something like:

    SELECT *
    FROM jump_shot_clips
    WHERE player_id = season_top_scorer
      AND score_diff <= 3
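In plain Python, the same retrieval amounts to intersecting the precomputed indexes; the function and the `season_top_scorer` value below are illustrative assumptions, not a real API:

```python
def find_clutch_jump_shots(action_index, context_index, season_top_scorer):
    """Return jump-shot clips by the top scorer that overlap close-game clips."""
    close_game_clips = {c["clip_id"] for c in context_index
                        if 0 < c["score_diff"] <= 3}
    return [c for c in action_index
            if c["player_id"] == season_top_scorer
            and c["clip_id"] in close_game_clips]
```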
    

    Supported Content-Based Queries by Input Type

    • Text
      Example: “Jump shots by the top scorer when the team is losing by ≤3”
      Indexes: Objects, Actions, Context · Extractors: Action classifier, score-diff calculator, stats join
      Mechanics: Semantic query → match metadata and tags across precomputed indexes

    • Image
      Example: “Find plays where this player's pose matches this still image”
      Indexes: Objects, Actions · Extractors: Pose estimation, embedding similarity
      Mechanics: Image embedding → nearest-neighbor search in object/action clip embeddings

    • Video
      Example: “Show me clips like this sequence”
      Indexes: Actions, Game Context · Extractors: Temporal embedding, sequence clustering
      Mechanics: Video embedding → similarity over sequence vectors

    • Text + Image
      Example: “Find all jump shots like this frame by LeBron James”
      Indexes: Objects, Actions · Extractors: Object detection, face ID, action tagging
      Mechanics: Image filters object, text filters action → intersected at retrieval

    • Text + Video
      Example: “Give me all clutch plays like this highlight reel”
      Indexes: All · Extractors: Action + context tagging, embedding scoring
      Mechanics: Combines natural language and visual patterns → top-K matches across all indexes

    • Image + Stats
      Example: “Show me this player’s plays when his FG% is over 60%”
      Indexes: Objects, Game Context · Extractors: Object detection, external stats join
      Mechanics: Image localizes player, external stats filter → combined pre-indexed data

    • Multimodal (Text + Image + Context)
      Example: “Where is this player shooting threes in the last minute of tied games?”
      Indexes: All · Extractors: Object, action, score/time context
      Mechanics: Multi-filter match across all indexed dimensions

    This makes it easy to support hybrid queries that combine natural language, visual similarity, and structured filters — all made possible because your footage is preprocessed into queryable, analytics-friendly data structures.
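As a rough illustration of such a hybrid query, you can first filter clips on structured metadata, then rank the survivors by cosine similarity to a query embedding. The field names here (`embedding`, `action_label`) are assumptions for the sketch:

```python
import numpy as np

def hybrid_search(clips, query_embedding, filters, top_k=5):
    """Filter clips on exact metadata matches, then rank by cosine similarity."""
    candidates = [c for c in clips
                  if all(c.get(k) == v for k, v in filters.items())]
    q = np.asarray(query_embedding, dtype=float)
    q = q / np.linalg.norm(q)

    def score(clip):
        e = np.asarray(clip["embedding"], dtype=float)
        return float(np.dot(q, e / np.linalg.norm(e)))

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

A production system would use an approximate-nearest-neighbor index rather than a full sort, but the filter-then-rank shape is the same.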

    Why Precompute This?

    Doing this at index time means:

    • You avoid re-scanning raw video every time someone queries
    • You can attach aggregated stats directly to search results
    • You bake in assumptions about how people want to explore the footage

    It’s like adding indexes and rollups to a database table — you make future queries faster by doing the hard part up front.


    TL;DR

    If your video is structured into object events, group_by is a useful pattern — not just for querying, but for indexing. Precomputing Split → Aggregate → Merge helps turn raw footage into something explorable, search-ready, and analytics-friendly.

    You’re not just parsing frames — you’re building DataFrames.


    Purpose-Built, for You

    We offer purpose-built extraction pipelines, called Feature Extractors, for each access pattern.

    Feature Extractors are executed in parallel; to chain them together, you create a Collection with Extractors and use that Collection as the source for a new one.

    Finally, you pair them with purpose-built Retrievers for the ultimate Video Search and Analytics experience.

    Going Beyond

    At Mixpeek we introduce the concept of Taxonomies: flat or hierarchical collections that can be used as a join (materialized or computed). This lets you enrich your newly processed video outputs with overlapping data from another collection.

    We also introduce Clusters, which act as groups (not to be confused with the pandas group_by operation) and let you cluster and group related outputs.

    April 6, 2025 · 5 min read