
    AI Video Tagging With Dynamic Taxonomies

    AI video tagging used to mean manual review and basic object detection. With multimodal models and dynamic taxonomies, you can now automatically detect brand moments, inappropriate content, actions, moods and trending content at scale.


    Dynamic taxonomies enable automatic classification of video content at scale. Instead of manually tagging thousands of hours of footage, multimodal AI can identify scenes, moods, actions, and key moments across your video library.

    Diagram: video segmentation and taxonomy classification. Four 15-second segments (0:00-1:00) map to the classifiers high_energy, emotional, dialog, and action, with confidence scores of 0.92, 0.85, 0.88, and 0.95.

    Real-World Applications

    Content Libraries

    • Scene-level categorization for episodic content
    • Identification of specific actions (fights, chases, emotional moments)
    • Automated content moderation
    • Mood-based classification for recommendation systems

    News & Sports

    • Automatic distinction between studio and field footage
    • Action detection (goals, plays, celebrations)
    • Speaker/anchor identification
    • On-screen text extraction and classification

    User-Generated Content

    • Brand moment detection
    • Inappropriate content flagging
    • Action/mood classification
    • Trending content identification

    Implementation Guide

    Define Your Taxonomy Structure

    Create hierarchical classifications that match your content:

    POST /entities/taxonomies
    {
      "taxonomy_name": "content_classifier",
      "nodes": [
        {
          "name": "moods",
          "embedding_config": [
            {
              "embedding_model": "multimodal",
              "type": "text",
              "value": "Scene mood and emotional atmosphere analysis"
            }
          ],
          "children": [
            {
              "name": "high_energy",
              "embedding_config": [
                {
                  "embedding_model": "multimodal",
                  "type": "video",
                  "value": "https://assets.example.com/reference/action_scene.mp4"
                },
                {
                  "embedding_model": "text",
                  "value": "Fast-paced, dynamic, intense action and movement"
                }
              ]
            },
            {
              "name": "emotional",
              "embedding_config": [
                {
                  "embedding_model": "multimodal",
                  "type": "video",
                  "value": "https://assets.example.com/reference/dramatic_scene.mp4"
                },
                {
                  "embedding_model": "text",
                  "value": "Dramatic, emotional, intimate character moments"
                }
              ]
            }
          ]
        }
      ]
    }
    
    Taxonomies - Mixpeek
    Create and manage hierarchical classifications for multimodal content organization
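
    If you prefer to script these calls, a thin wrapper around the REST API is enough. Here is a minimal sketch in Python; the base URL, bearer-token auth, and the mixpeek_post helper name are assumptions for illustration rather than documented values, and the taxonomy body is abridged to one node (use the full request body above in practice):

    import requests

    # Assumed endpoint and auth scheme for illustration; substitute your real values.
    BASE_URL = "https://api.mixpeek.com"
    HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

    def mixpeek_post(path: str, payload: dict) -> dict:
        """POST a JSON payload to the API and return the parsed JSON response."""
        response = requests.post(f"{BASE_URL}{path}", json=payload, headers=HEADERS)
        response.raise_for_status()
        return response.json()

    # Create the taxonomy (body abridged; see the full request above).
    taxonomy = mixpeek_post("/entities/taxonomies", {
        "taxonomy_name": "content_classifier",
        "nodes": [{
            "name": "moods",
            "embedding_config": [{
                "embedding_model": "multimodal",
                "type": "text",
                "value": "Scene mood and emotional atmosphere analysis",
            }],
        }],
    })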

    Set Up Processing Pipeline

    Configure your namespace and collection:

    POST /namespaces
    {
      "namespace_name": "video_processing",
      "vector_indexes": ["multimodal", "text"],
      "payload_indexes": [
        {
          "field_name": "taxonomy.classifications",
          "type": "keyword",
          "field_schema": {
            "type": "keyword",
            "is_tenant": false
          }
        }
      ]
    }
    
    Namespaces - Mixpeek
    Create isolated environments for organizing and managing your search applications
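
    With the same hypothetical mixpeek_post helper from the taxonomy sketch, the namespace request above becomes one call:

    namespace = mixpeek_post("/namespaces", {
        "namespace_name": "video_processing",
        "vector_indexes": ["multimodal", "text"],
        "payload_indexes": [{
            "field_name": "taxonomy.classifications",
            "type": "keyword",
            # Python's False serializes to JSON false.
            "field_schema": {"type": "keyword", "is_tenant": False},
        }],
    })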

    Process Videos

    Ingest videos with intelligent sampling and taxonomy classification:

    POST /ingest/videos/url
    {
      "url": "https://content.example.com/videos/episode_123.mp4",
      "collection": "premium_content",
      "feature_extractors": {
        "interval_sec": 10,
        "embed": [
          {
            "type": "url",
            "vector_index": "multimodal"
          }
        ],
        "describe": {
          "enabled": true,
          "vector_index": "text"
        }
      },
      "taxonomy_config": {
        "taxonomy_ids": ["tax_abc123"],
        "confidence_threshold": 0.75,
        "min_segment_duration": 5
      }
    }
    
    Feature Extraction - Mixpeek
    Configure and customize multimodal feature extraction for different content types
    Diagram: the processing pipeline from raw video to search index. Video ingestion feeds intelligent sampling (conf: 0.92) and taxonomy classification (conf: 0.88), producing searchable tags such as action, high_energy, mood, and scene.
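
    Scripted, the ingest request above fits naturally in a small function so the sampling interval stays adjustable per video. A sketch reusing the hypothetical mixpeek_post helper from earlier; the collection name and taxonomy ID mirror the request body above:

    def ingest_video(url: str, interval_sec: int = 10) -> dict:
        """Submit one video for embedding, description, and taxonomy classification."""
        return mixpeek_post("/ingest/videos/url", {
            "url": url,
            "collection": "premium_content",
            "feature_extractors": {
                "interval_sec": interval_sec,
                "embed": [{"type": "url", "vector_index": "multimodal"}],
                "describe": {"enabled": True, "vector_index": "text"},
            },
            "taxonomy_config": {
                "taxonomy_ids": ["tax_abc123"],
                "confidence_threshold": 0.75,
                "min_segment_duration": 5,
            },
        })

    ingest_video("https://content.example.com/videos/episode_123.mp4")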

    Intelligent Sampling Settings

    Choose sampling intervals based on content type:

    Content Type      Interval (sec)   Rationale
    Action/Sports     5-10             Capture rapid changes
    Dialog Scenes     15-20            Focus on key moments
    News/Interviews   20-30            Capture scene changes
    💡 For more intelligent sampling, consider dynamic scene splitting: https://blog.mixpeek.com/dynamic-video-chunking-scene-detection/
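
    When scripting ingestion, the table above can live in code so each video gets an interval suited to its type. A minimal sketch; the content-type keys and the fallback default are illustrative choices, not Mixpeek parameters:

    # Sampling intervals (seconds) per content type, taken from the table above.
    SAMPLING_INTERVALS = {
        "action_sports": 5,     # capture rapid changes
        "dialog": 15,           # focus on key moments
        "news_interviews": 20,  # capture scene changes
    }

    def interval_for(content_type: str, default: int = 10) -> int:
        """Pick an interval_sec value for the ingest request."""
        return SAMPLING_INTERVALS.get(content_type, default)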

    Key Optimizations

    Reference Selection

    • Use high-quality, representative video clips for each category
    • Include multiple examples per taxonomy node
    • Update reference content as your library evolves

    Confidence Thresholds

    • Start high (0.85+) for critical classifications
    • Lower (0.7+) for general categorization
    • Adjust based on validation results
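
    "Adjust based on validation results" can be made concrete with a small threshold sweep over a human-labeled sample. A sketch with made-up data; the validation set and function names are hypothetical:

    # Hypothetical validation set: (model confidence, human-verified correct?) pairs.
    validation_samples = [
        (0.91, True), (0.88, True), (0.82, True), (0.78, False), (0.73, False),
    ]

    def precision_at(threshold: float, samples) -> float:
        """Precision among classifications kept at a given confidence threshold."""
        kept = [correct for conf, correct in samples if conf >= threshold]
        return sum(kept) / len(kept) if kept else 0.0

    # Sweep candidate thresholds to pick a per-taxonomy confidence_threshold.
    for t in (0.70, 0.75, 0.80, 0.85, 0.90):
        print(f"threshold={t:.2f} precision={precision_at(t, validation_samples):.2f}")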

    Search Integration

    Query classified content:

    POST /features/search
    {
      "collections": ["premium_content"],
      "queries": [
        {
          "vector_index": "multimodal",
          "type": "text",
          "value": "high energy action sequence"
        }
      ],
      "filters": {
        "AND": [
          {
            "key": "taxonomy.classifications.node_id",
            "operator": "in",
            "value": ["tax_node_high_energy"]
          }
        ]
      },
      "group_by": {
        "field": "asset_id",
        "max_features": 5
      }
    }
    
    Queries - Mixpeek
    Build powerful multimodal search queries across text, images, and videos
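
    The same search runs through the hypothetical mixpeek_post helper sketched earlier. Note that the iteration at the end assumes a response shape (a top-level results list), which may differ in your API version:

    results = mixpeek_post("/features/search", {
        "collections": ["premium_content"],
        "queries": [{
            "vector_index": "multimodal",
            "type": "text",
            "value": "high energy action sequence",
        }],
        "filters": {"AND": [{
            "key": "taxonomy.classifications.node_id",
            "operator": "in",
            "value": ["tax_node_high_energy"],
        }]},
        "group_by": {"field": "asset_id", "max_features": 5},
    })

    # Assumed response shape: one entry per asset_id group with its matched features.
    for group in results.get("results", []):
        print(group)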

    Practical Tips

    1. Start Small
      • Begin with 2-3 main categories
      • Validate classification accuracy
      • Expand based on results
    2. Optimize Processing
      • Use appropriate sampling intervals
      • Batch process similar content
      • Monitor classification confidence
    3. Maintain Quality
      • Regularly update reference content
      • Review edge cases
      • Adjust thresholds based on needs

    Common Challenges

    1. Mixed Content
      • Solution: Use multiple reference examples
      • Example: News segments with both studio and field footage
    2. Temporal Context
      • Solution: Adjust sampling intervals
      • Example: Sports highlights need denser sampling
    3. Scale Issues
      • Solution: Batch processing with appropriate intervals (see the sketch after this list)
      • Example: Process episodic content in seasons
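
    For the scale case, batch processing is just a loop over the same ingest call, one season at a time. A sketch reusing the hypothetical ingest_video and interval_for helpers from earlier; the episode URLs are placeholders:

    # One season of hypothetical episode URLs, processed as a single batch.
    season_urls = [
        f"https://content.example.com/videos/episode_{n}.mp4" for n in range(1, 13)
    ]
    for url in season_urls:
        ingest_video(url, interval_sec=interval_for("dialog"))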
    Taxonomy configuration decision tree, summarized by content type:

    Content Type                      interval_sec   confidence   embedding         min_segment   Optimization
    High Action (Sports, Action)      5              0.85         multimodal        3s            Optimize for rapid changes
    Dialog Heavy (News, Interviews)   15             0.75         text+multimodal   10s           Prioritize speaker detection
    Mixed Content (UGC, Shows)        10             0.80         multimodal        5s            Balance accuracy/performance

    The power of dynamic taxonomies comes from combining intelligent sampling with multimodal understanding. By properly configuring your taxonomy structure and processing pipeline, you can automatically classify thousands of hours of content with high accuracy.


    Ethan Steininger

    December 30, 2024 · 4 min read