    Automated Content Classification

    Multimodal Content Taxonomies

    Automatically classify any content—video, image, audio, or text—into structured categories like product types, content topics, or custom hierarchies. The multimodal equivalent of a SQL JOIN: match by similarity, not just keys.

    What You'll Get

Reduce Manual Tagging by 90%: auto-classify across 4 modalities

Improve Search Relevance 3x: category-based filtering and discovery

Classify 4 Modalities: video, image, audio, and text

    Best for: Classifying content into predefined categories

    Not for relationships (use Ontologies) or grouping (use Clusters)

    What Is a Multimodal Taxonomy?

    A multimodal taxonomy is an automated classification system that categorizes content across video, images, audio, and text into structured, hierarchical categories. Unlike traditional tagging that operates on a single content type, a multimodal taxonomy applies the same category structure to any content format—ensuring consistent metadata enrichment across your entire content library.

    Mixpeek taxonomies work by matching content features against a predefined set of category collections using embedding similarity. Each document is automatically enriched with category labels, confidence scores, and hierarchy paths. This process replaces manual content tagging with automated classification that scales to millions of assets.
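    As a loose illustration of what content is matched against, each entry in a category collection is essentially a labeled reference record whose embedding is compared to the incoming document's features. The shape below is a hypothetical sketch; the field names are assumptions, not Mixpeek's schema.

    // Hypothetical entry in a category collection:
    {
    "label": "Basketball",
    "description": "Basketball games, highlights, players, and equipment",
    "parent_label": "Sports"
    }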

    Taxonomies can be flat (single-level categories like product tags) or hierarchical (multi-tier structures like Sports > Basketball > NBA). They are distinct from ontologies (which model entity relationships) and clusters (which group similar content without predefined categories).

    How Automated Content Classification Works

    Taxonomies match your content against predefined category collections using feature similarity, then enrich each document with structured metadata. Classify video, images, audio, and text—no manual tagging required.

From Untagged to Enriched: Automatic Classification

    Content is matched against taxonomy categories by similarity, then enriched with structured metadata:

    Before: Video "Product review" (no tags) · Image "Sneaker photo" (no tags)

    After (your taxonomy applied): Video "Product review" → Sports > Basketball · Image "Sneaker photo" → Style > Footwear

    Result: Every document enriched with categories, confidence scores, and hierarchy paths
    Create Taxonomy
    POST /v1/taxonomies
    {
      "taxonomy_name": "product_categories",
      "config": {
        "taxonomy_type": "hierarchical",
        "retriever_id": "ret_e5_multilingual",
        "hierarchical_nodes": [
          {
            "collection_id": "col_categories_l1",
            "label": "Top Categories"
          },
          {
            "collection_id": "col_categories_l2",
            "parent_collection_id": "col_categories_l1",
            "label": "Subcategories"
          }
        ]
      }
    }
    Enrichment Result (Response)
    // Document automatically enriched:
    {
      "document_id": "doc_a1b2c3d4e5f6",
      "title": "Running Shoe Review",
      "category_l1": "Footwear",
      "category_l2": "Athletic",
      "category_score": 0.92,
      "category_path": ["Footwear", "Athletic"]
    }
    // Now queryable by category!
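
    Once documents carry these category fields, they can be filtered like any other metadata. The request below is only a hedged sketch: the endpoint and filter syntax are assumptions for illustration, not Mixpeek's documented retrieval API.

    // Hypothetical category-filtered retrieval request:
    POST /v1/retrievers/{retriever_id}/execute
    {
      "query": "lightweight trail running shoes",
      "filters": {
        "category_l1": "Footwear",
        "category_l2": "Athletic"
      },
      "limit": 10
    }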

    Before and After: Automated Tagging vs. Manual Classification

    Replace manual content tagging with automated multimodal classification across video, images, and text

    ✗ Without Taxonomies

    Manual product tagging. Inconsistent categories across 50k SKUs.

    40+ hours/week manual effort

    ✓ With Taxonomies

    Product images auto-categorized. Consistent hierarchy across all modalities.

    category: "Footwear > Athletic > Running Shoes"

    ✗ Without Taxonomies

    Search only by filename or manual tags. Content buried in folders.

    Keyword search only

    ✓ With Taxonomies

    Query by category + modality. Find all "Engineering" content across video, image, and text.

    Filter: category = "Engineering" AND modality = "video"

    Ready to Try the Taxonomy API?

    Classify your multimodal content with pre-built or custom category structures. Start enriching in minutes.

    Taxonomy Enrichment Outcomes

    Real outcomes from automated content classification in your multimodal data pipeline

    Precise Content Targeting

    Surface the right content by category, not just keywords. Power search, recommendations, and filtering.

    Classification Output:

    "Running Shoe Review" (video)

    • category_l1: Footwear
    • category_l2: Athletic
    • score: 0.92
    • Filterable in search & recommendations

    Eliminate Manual Tagging

    Auto-classify millions of assets across video, image, audio, and text.

    Classification at Scale:

    50,000 product assets

    • Custom taxonomy applied
    • 4 modalities classified
    • 0 manual tags needed
    • 40+ hours/week saved

    Hierarchical Precision

    Go from broad to specific with multi-tier taxonomy paths and inherited properties.

    Hierarchy Path:

    Technology > Consumer Electronics > Smartphones

    • Filter at any tier level
    • Inherit parent properties
    • Score: 0.80 at deepest match

    Taxonomy Classification Use Cases

    See how organizations use multimodal taxonomies to classify and monetize their content at scale.

    AdTech & Programmatic

    Auto-classify publisher content against the IAB Content Taxonomy 3.0 for precise ad targeting. Match ads to content categories across video, images, and articles.

    Taxonomy Classification:

    Article "Tesla Review" → Automotive > Auto Technology

    Video "EV Comparison" → Automotive > Electric Vehicles

    Image "Model 3 Photo" → Automotive > Auto Technology

    Query: Find all "Automotive" content for car brand ad placement

    E-commerce & Retail

    Automatically categorize product images, videos, and descriptions into your product taxonomy.

    Product Classification:

    Image (product photo) → Footwear > Athletic > Running

    Video (unboxing) → Footwear > Athletic > Running

    Text (description) → Footwear > Athletic > Running

    Query: Same taxonomy applied across photo, video, and text for consistent categorization

    Media & Publishing

    Tag your entire content library with consistent categories across all modalities. Power recommendations and discovery.

    Content Classification:

    Video (news clip) → News > Politics > Elections

    Audio (podcast) → News > Politics > Elections

    Article (text) → News > Politics > Elections

    Query: Find all "Elections" content across video, audio, and articles

    Enterprise Knowledge

    Organize internal documents, training videos, and knowledge base assets with consistent departmental taxonomies.

    Knowledge Classification:

    Video (training) → Engineering > DevOps > CI/CD

    Document (wiki) → Engineering > DevOps > CI/CD

    Slides (images) → Engineering > DevOps > CI/CD

    Query: Find all "CI/CD" content for onboarding engineers

    Flat vs. Hierarchical Taxonomies

    Choose the right structure for your classification needs

    Flat Taxonomy (Single-level)
    {
      "taxonomy_name": "product_tags",
      "config": {
        "taxonomy_type": "flat",
        "retriever_id": "ret_clip_v1",
        "input_mappings": [{
          "input_key": "image_vector",
          "source_type": "vector",
          "path": "features.clip"
        }],
        "source_collection": {
          "collection_id": "col_product_tags",
          "enrichment_fields": [
            { "field": "category", "mode": "replace" },
            { "field": "tags", "mode": "append" }
          ]
        }
      }
    }

    Single-level classification. Each document matched to one category with enrichment fields copied directly.
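
    As a rough sketch of what those enrichment modes might produce on a matched document (field names are assumed for illustration, not taken from the API reference): "replace" overwrites the target field with the matched value, while "append" adds it alongside any existing values.

    // Hypothetical flat-taxonomy enrichment result:
    {
      "document_id": "doc_9f8e7d6c",
      "category": "Running Shoes",
      "tags": ["new_arrival", "athletic"]
    }
    // "category" replaced; "athletic" appended to the existing tags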

    Hierarchical Taxonomy (Multi-level)
    {
      "taxonomy_name": "content_categories",
      "config": {
        "taxonomy_type": "hierarchical",
        "retriever_id": "ret_e5_multilingual",
        "hierarchical_nodes": [
          { "collection_id": "col_tier1",
            "label": "Tier 1 (Top-level)" },
          { "collection_id": "col_tier2",
            "parent_collection_id": "col_tier1",
            "label": "Tier 2 (Subcategories)" },
          { "collection_id": "col_tier3",
            "parent_collection_id": "col_tier2",
            "label": "Tier 3 (Specific)" }
        ]
      }
    }

    Multi-level classification with property inheritance. Documents enriched at the deepest matching tier, inheriting parent categories.
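
    As a hedged illustration (field names mirror the enrichment example earlier on this page and are assumptions, not guaranteed output), a document that matches down to Tier 3 carries its Tier 1 and Tier 2 parents as well:

    // Hypothetical hierarchical enrichment, deepest match at Tier 3:
    {
      "document_id": "doc_1a2b3c4d",
      "category_l1": "Technology",
      "category_l2": "Consumer Electronics",
      "category_l3": "Smartphones",
      "category_score": 0.80,
      "category_path": ["Technology", "Consumer Electronics", "Smartphones"]
    }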

    Taxonomies vs. Ontologies vs. Clusters

    Choose the right content organization approach for your use case

    Taxonomies

    Classify content into predefined categories. Enrich with structured metadata using established systems.

    e.g., product types, content topics, IAB 3.0

    Ontologies

    Model entity relationships. Traverse connections between people, brands, locations across modalities.

    e.g., "Player → Team → Sponsor"

    Clusters

    Automatically group similar content. Discover patterns without predefined structure.

    e.g., "Similar scenes"

    Use them together: Taxonomies classify, Ontologies connect, and Clusters group—making your multimodal data searchable and intelligent.

    Video, Image, Audio, and Text Classification

    Apply the same taxonomy to classify content across every modality

    🎥 Video

    Classify video frames by visual content and transcript topics

    🖼️ Images

    Categorize product photos, logos, scenes into taxonomy labels

    🎧 Audio

    Tag audio and transcripts by topic, genre, and subject matter

    📄 Documents

    Classify articles, PDFs, and text by content categories

    The Power: Any content type is classified through the same taxonomy. A product review video, its thumbnail image, and its transcript text all receive the same "Technology > Consumer Electronics" classification.

    "Taxonomies reduced our manual content tagging from 40 hours/week to zero. Classification accuracy exceeded 90% across all modalities."

    Media platform processing 2M+ assets monthly

    Frequently Asked Questions About Content Taxonomies

    What is a multimodal taxonomy?

    A multimodal taxonomy is a classification system that categorizes content across multiple modalities—video, images, audio, and text—into structured categories. Unlike traditional tagging, which works on a single content type, multimodal taxonomies apply the same category hierarchy to any content format, enabling consistent metadata enrichment across your entire content library.

    What is the difference between flat and hierarchical taxonomies?

    Flat taxonomies assign content to a single level of categories (e.g., "Sports", "Technology", "Fashion"). Hierarchical taxonomies organize categories into parent-child tiers (e.g., "Sports > Basketball > NBA"), allowing classification at multiple levels of specificity with property inheritance from parent to child categories.

    How do taxonomies differ from ontologies and clusters?

    Taxonomies classify content into predefined categories (e.g., product types, content topics). Ontologies model entity relationships and enable multi-hop reasoning across connected entities. Clusters automatically group similar content without predefined structure. They can be used together: taxonomies classify, ontologies connect, and clusters group.

    Can taxonomies classify video, images, and audio content?

    Yes. Mixpeek taxonomies classify content across all four modalities—video, images, audio, and documents—using the same taxonomy structure. A product review video, its thumbnail image, and its transcript text all receive the same category classification, ensuring consistent metadata enrichment regardless of content format.

    Ready to Classify Your Content?

    Start enriching your multimodal content with structured taxonomy metadata. Use pre-built taxonomies or bring your own category structures.