
    IAB Contextual Classifier: Taxonomies for Videos and Images

    Classify text, images, and video into 700+ IAB Content Taxonomy categories using multimodal AI. Learn how it works under the hood and how to extend it for your contextual targeting needs.


    Contextual advertising is having its moment. With third-party cookies disappearing and privacy regulations tightening, contextual targeting — classifying page and creative content into standardized categories — is the industry's answer to precise ad placement without tracking users.

    But building a production-grade IAB content taxonomy classifier is harder than it looks. You need to handle 700+ categories across four hierarchical tiers, support text and visual content, and return results fast enough for real-time header bidding (under 2 seconds).

    That's why we built the Mixpeek IAB Contextual Classifier — a free, multimodal classifier that maps any content (text, image, or video) to the IAB Content Taxonomy 3.1. It's published as a public retriever on the Mixpeek marketplace, and you can start using it right now with zero setup.

    In this post, we'll break down:

    • How the classifier works under the hood
    • Why multimodal classification beats text-only approaches
    • How to extend and customize it for your own use cases
    • How it integrates with Prebid.js for real-time header bidding

    What Is the IAB Content Taxonomy?

    The IAB Content Taxonomy is the advertising industry's standard for categorizing digital content. Version 3.1 defines 700+ categories organized in a four-tier hierarchy:

    • Tier 1: 26 top-level categories (e.g., Sports, Technology & Computing, Arts & Entertainment)
    • Tier 2: ~366 subcategories (e.g., Basketball under Sports)
    • Tier 3–4: Granular sub-subcategories for fine-grained targeting
    [Figure: IAB Content Taxonomy 3.1 hierarchy — 26 Tier 1 categories, ~366 Tier 2 subcategories, 700+ total categories across 4 tiers. Example branch: Sports (IAB17) → Basketball (IAB17-3) → Pro Basketball → NBA Playoffs.]
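To make the four-tier structure concrete, here's a toy sketch of the hierarchy as nested Python dicts. The category names and IDs come from the examples in this post; the data structure itself is purely illustrative, not how Mixpeek stores the taxonomy:

```python
# Hypothetical sketch of the tiered IAB hierarchy as nested dicts.
# Names/IDs are from this article's examples; the structure is illustrative.
taxonomy = {
    "IAB17": {
        "name": "Sports",
        "children": {
            "IAB17-3": {
                "name": "Basketball",
                "children": {
                    "IAB17-26": {"name": "Pro Basketball", "children": {}},
                },
            },
        },
    },
}

def find_path(tree, target_id, path=()):
    """Depth-first search for a category ID; returns its tier path of names."""
    for cat_id, node in tree.items():
        new_path = path + (node["name"],)
        if cat_id == target_id:
            return list(new_path)
        found = find_path(node["children"], target_id, new_path)
        if found:
            return found
    return None

print(find_path(taxonomy, "IAB17-26"))  # ['Sports', 'Basketball', 'Pro Basketball']
```

A category's tier is simply the length of its path, which is why the `iab_path` field in the classifier's output doubles as tier information.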

    Publishers and SSPs use these categories to match advertiser campaigns with relevant content. For example, Nike might target IAB17-3 (Basketball) while a financial services company targets IAB13 (Personal Finance).

    The problem? Most classification tools are either text-only (they can't classify a video ad or product image) or keyword-based (they match exact strings instead of understanding meaning). Let's look at how our approach solves both.


    How It Works Under the Hood

    The Mixpeek IAB Contextual Classifier uses a vector similarity search architecture rather than traditional rule-based or keyword matching. Here's the pipeline:

    [Figure: Classification pipeline — any input (text, image, or video) is embedded into a shared 1408-dimensional space via the Vertex AI multimodal model; parallel content and text searches each retrieve the 20 nearest IAB category vectors; RRF fusion merges both rankings into a final top 10 (~337ms text, ~500ms image).]

    Step 1: Embed the IAB Taxonomy

    We pre-compute multimodal embeddings (via Google Vertex AI's multimodal embedding model) for every one of the 700+ IAB categories. Each category's descriptive text — including its name, parent path, and representative keywords — gets encoded into a 1408-dimensional vector.

    These vectors are stored in a Qdrant vector index, essentially creating a semantic map of the entire IAB taxonomy in embedding space.

    # Simplified: how each IAB category becomes a vector
    {
        "text": "Sports > Basketball > NBA",
        "metadata": {
            "iab_category_id": "IAB17-26",
            "iab_category_name": "Pro Basketball",
            "iab_tier": 2,
            "iab_path": ["Sports", "Basketball", "Pro Basketball"]
        }
    }
    # → Embedded via Vertex multimodal model → 1408D vector
    # → Stored in Qdrant for nearest-neighbor search
    

    Step 2: Embed the Input Content

    When you submit content for classification — whether it's a text snippet, an image, or a video — the same Vertex multimodal model encodes it into the same 1408-dimensional space.

    This is where multimodal comes in: a basketball game photo, a text article about the NBA Finals, and a highlight reel video all land in roughly the same region of embedding space. They're semantically close to each other and to the IAB "Pro Basketball" category vector, even though they're completely different media types.

    Step 3: Search and Fuse

    The classifier runs two parallel searches and fuses the results:

    1. Content search: Embeds the uploaded file (image/video) and finds the 20 nearest IAB category vectors
    2. Text search: Embeds the text query and finds the 20 nearest IAB category vectors

    These two result sets are merged using Reciprocal Rank Fusion (RRF), which combines ranking positions from both searches to produce a final top-10 list. RRF is particularly effective here because it surfaces categories that rank highly in both modalities, boosting confidence.
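The fusion step is only a few lines of logic. Here's a minimal RRF sketch — `k=60` is a commonly used RRF constant, and the ranked ID lists are illustrative stand-ins for the two search results (the classifier's actual constant isn't documented here):

```python
# Minimal Reciprocal Rank Fusion: score(cat) = sum over lists of 1/(k + rank).
# k=60 is a common default; the classifier's actual value is an assumption here.
def rrf_fuse(result_lists, k=60, final_top_k=10):
    scores = {}
    for results in result_lists:
        for rank, category_id in enumerate(results, start=1):
            scores[category_id] = scores.get(category_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:final_top_k]

content_search = ["IAB17-26", "IAB17-3", "IAB17"]  # nearest neighbors of the file embedding
text_search    = ["IAB17-3", "IAB17-26", "IAB13"]  # nearest neighbors of the text embedding

for cat, score in rrf_fuse([content_search, text_search]):
    print(cat, round(score, 4))
```

Note how `IAB17-26` and `IAB17-3`, which appear near the top of both lists, accumulate two reciprocal-rank contributions and outrank categories that appear in only one list — exactly the "high in both modalities" boost described above.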

    // Retriever stage configuration
    {
        "stage_name": "multimodal_search",
        "config": {
            "stage_id": "feature_search",
            "parameters": {
                "searches": [
                    {
                        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
                        "query": { "input_mode": "content", "value": "{{INPUT.query_content}}" },
                        "top_k": 20
                    },
                    {
                        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
                        "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
                        "top_k": 20
                    }
                ],
                "final_top_k": 10,
                "fusion": "rrf"
            }
        }
    }
    

    Step 4: Return Structured Results

    Each result includes the full IAB taxonomy metadata plus a confidence score:

    {
        "iab_category_name": "Pro Basketball",
        "iab_category_id": "IAB17-26",
        "iab_tier": 2,
        "iab_path": ["Sports", "Basketball", "Pro Basketball"],
        "iab_parent_id": "IAB17-3",
        "score": 0.8490
    }
    

    Why Multimodal Beats Text-Only Classification

    Most contextual classification APIs — including Google Cloud NLP, AWS Comprehend, and Klazify — are text-only. They can classify an article, but what about:

    • A product image on an e-commerce page?
    • A pre-roll video ad served before YouTube content?
    • A thumbnail that tells you more about the content than the page title?
    • A social media post that's an image with no caption?
    [Figure: Text-only vs. multimodal classification — a text-only classifier fully handles 1 of 5 scenarios (text articles), failing on product images, video, and non-English visual content; the multimodal classifier handles all 5.]

    The Mixpeek classifier handles all of these because the Vertex multimodal embedding model encodes text, images, and video into a shared semantic space. A photo of a basketball game, the text "Lakers vs. Celtics NBA Finals", and a highlight clip all map to vectors near IAB17-26: Pro Basketball.

    This matters for real-world ad tech because:

    | Scenario | Text-Only | Multimodal (Mixpeek) |
    |---|---|---|
    | Article with text | ✅ Works | ✅ Works |
    | Image-heavy page (Pinterest, Instagram) | ❌ No text to classify | ✅ Classifies images directly |
    | Video content (YouTube, TikTok) | ❌ Requires transcription first | ✅ Classifies video frames directly |
    | Mixed media (article + images + video) | ⚠️ Partial (text only) | ✅ All modalities contribute |
    | Non-English visual content | ❌ Text extraction unreliable | ✅ Visual understanding is language-agnostic |

    How to Extend the Classifier for Your Use Case

    The published classifier at mxp.co/r/iab-contextual-classifier works out of the box — but the real power comes when you use it as a starting point and customize it. Here are three ways to extend it.

    1. Add Custom Categories

    The IAB taxonomy covers most ad tech needs, but you might have industry-specific categories. Since the classifier is backed by a Mixpeek collection (a processing pipeline tied to a vector index), you can add your own category documents to the same bucket:

    import requests
    
    API_KEY = "your-api-key"
    NAMESPACE_ID = "your-namespace"
    BUCKET_ID = "your-bucket"
    
    # Add a custom category
    requests.post(
        f"https://api.mixpeek.com/v1/buckets/{BUCKET_ID}/objects",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-Namespace": NAMESPACE_ID,
        },
        json={
            "blobs": [{
                "data": "Electric vehicles, EV charging, battery technology, sustainable transport",
                "mime_type": "text/plain",
                "metadata": {
                    "iab_category_name": "Electric Vehicles",
                    "iab_category_id": "CUSTOM-EV-001",
                    "iab_tier": 2,
                    "iab_path": ["Automotive", "Electric Vehicles"],
                    "iab_parent_id": "IAB2"
                }
            }]
        }
    )
    

    After uploading, trigger the collection to generate embeddings. Your custom categories now live alongside the standard IAB taxonomy in the same vector space.

    2. Add an LLM Validation Stage

    For higher accuracy, add an llm_enrich stage after the vector search. This uses a language model (e.g., Gemini 2.5 Flash) to validate and re-score the top vector search results:

    # Add a validation stage to your retriever
    stages = [
        # Stage 1: Vector search (same as above)
        {"stage_name": "multimodal_search", "config": {"stage_id": "feature_search", ...}},
    
        # Stage 2: LLM validation
        {
            "stage_name": "validate_classification",
            "config": {
                "stage_id": "llm_enrich",
                "parameters": {
                    "provider": "google",
                    "model": "gemini-2.5-flash",
                    "prompt": "Given the content \"{{INPUT.query}}\", score how relevant the IAB category \"{{DOC.iab_category_name}}\" (path: {{DOC.iab_path}}) is on a scale of 0-100.",
                    "output_field": "classification",
                    "output_schema": {
                        "relevance_score": "number (0-100)",
                        "confidence": "high | medium | low"
                    }
                }
            }
        }
    ]
    

    This two-stage approach gives you the speed of vector search with the reasoning capability of an LLM — fast initial retrieval, then precise validation.
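One way to consume the two-stage output is to filter and re-rank the vector-search candidates by the LLM's relevance score. The field names below mirror the `output_schema` above, but the threshold and the post-processing itself are a hypothetical sketch, not part of the published retriever:

```python
# Hypothetical post-processing for the two-stage pipeline: drop candidates
# whose LLM relevance_score falls below a threshold, then re-rank by it.
# Field names mirror the output_schema above; the threshold is illustrative.
def validate_and_rerank(candidates, min_relevance=50):
    kept = [c for c in candidates if c["classification"]["relevance_score"] >= min_relevance]
    return sorted(kept, key=lambda c: c["classification"]["relevance_score"], reverse=True)

candidates = [
    {"iab_category_name": "Pro Basketball", "score": 0.849,
     "classification": {"relevance_score": 92, "confidence": "high"}},
    {"iab_category_name": "Fitness & Exercise", "score": 0.702,
     "classification": {"relevance_score": 18, "confidence": "low"}},
]

for c in validate_and_rerank(candidates):
    print(c["iab_category_name"], c["classification"]["relevance_score"])
# "Pro Basketball" survives; "Fitness & Exercise" is filtered out
```

This is where the LLM stage earns its latency cost: a vector score of 0.702 looks plausible on its own, but the LLM's reasoning pass can recognize it as a near-miss and discard it.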

    [Figure: Two-stage classification pipeline — Stage 1 vector search retrieves the top 20 candidates in ~50ms; Stage 2 LLM validation re-scores the top 10 with Gemini 2.5 Flash in ~250ms, yielding validated categories with confidence labels in ~300ms total.]

    3. Integrate with Prebid.js for Real-Time Bidding

    The classifier outputs are already structured for OpenRTB 2.6, making it straightforward to plug into Prebid.js header bidding workflows:

    // Prebid.js RTD module configuration
    pbjs.setConfig({
        realTimeData: {
            dataProviders: [{
                name: 'mixpeek',
                params: {
                    apiKey: 'your-public-key',
                    publicName: 'iab-contextual-classifier',
                    endpoint: 'https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute'
                }
            }]
        }
    });
    
    // The module automatically:
    // 1. Extracts page content
    // 2. Sends to the Mixpeek classifier
    // 3. Formats response as OpenRTB 2.6
    // 4. Attaches IAB categories to bid requests
    

    The OpenRTB output looks like this:

    {
        "site": {
            "content": {
                "data": [{
                    "id": "mixpeek.com",
                    "name": "Mixpeek Contextual",
                    "segment": [
                        {"id": "IAB17-26", "name": "Pro Basketball", "value": "0.849"},
                        {"id": "IAB17-3", "name": "Basketball", "value": "0.838"}
                    ]
                }]
            }
        }
    }
    

    SSPs receiving this bid request can match it against advertiser targeting rules, ensuring ads appear alongside relevant content — all without cookies or user tracking.
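As a rough illustration of that matching step, here's how an SSP-side check against the OpenRTB `segment` array might look. The rule shape (`target_ids` plus a confidence floor) is hypothetical — OpenRTB defines the segment object, not the matching logic:

```python
# Illustrative SSP-side matching of the OpenRTB segment array against an
# advertiser's IAB targeting rules. The rule format and confidence floor
# are hypothetical; OpenRTB 2.6 specifies only the segment structure.
def matches_targeting(bid_request, target_ids, min_confidence=0.5):
    segments = bid_request["site"]["content"]["data"][0]["segment"]
    return any(
        seg["id"] in target_ids and float(seg["value"]) >= min_confidence
        for seg in segments
    )

bid_request = {"site": {"content": {"data": [{
    "segment": [
        {"id": "IAB17-26", "name": "Pro Basketball", "value": "0.849"},
        {"id": "IAB17-3", "name": "Basketball", "value": "0.838"},
    ]
}]}}}

print(matches_targeting(bid_request, {"IAB17-3"}))  # Nike targeting Basketball → True
```

Because `value` carries the classifier's confidence score, buyers can also tighten the floor (say, 0.8+) to trade reach for precision.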


    Performance: Latency and Accuracy

    For real-time header bidding, classification must happen within the 200–2000ms auction window. Here's how the classifier performs:

    | Metric | Value |
    |---|---|
    | Text classification latency | 337ms |
    | Multimodal (image) latency | ~500ms |
    | Accuracy (top-1 match) | 84.9% |
    | Taxonomy coverage | 700+ IAB 3.1 categories |
    | Input types supported | Text, image, video |

    The 337ms text latency is well within Prebid's typical 1.5–2s timeout window, leaving ample room for network overhead and bid processing.

    [Figure: Text classification p50 latency comparison (lower is better) — Mixpeek 337ms; AWS Comprehend ~450ms; Google Cloud NLP ~500ms; Klazify ~800ms; DIY embed-and-classify pipelines 1.2–2.5s+. Mixpeek is the only option supporting image input (~500ms) and covers all 700+ IAB 3.1 categories at 84.9% top-1 accuracy.]

    Getting Started in 60 Seconds

    You can try the classifier right now without any setup:

    1. Open mxp.co/r/iab-contextual-classifier
    2. Enter text, upload an image, or paste a URL
    3. See results — IAB categories ranked by confidence score

    To use it programmatically via API:

    curl -X POST https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute \
      -H "Content-Type: application/json" \
      -d '{
        "inputs": {
          "query": "Tesla announces record EV deliveries in Q4, stock surges 8%"
        }
      }'
    

    The public endpoint requires no API key for basic usage. For higher rate limits, custom categories, or LLM validation stages, create a free Mixpeek account.


    Why Build This as a Retriever?

    A design choice worth explaining: the IAB classifier is built as a Mixpeek retriever (a multi-stage query pipeline) rather than a standalone classification model. This matters because:

    • Composability: You can add stages (rerank, filter, LLM enrich) without retraining anything
    • Extensibility: Adding new categories means uploading new documents, not retraining a model
    • Multi-tenancy: Each customer can fork the base taxonomy and add their own categories in their own namespace
    • Versioning: Swap embedding models (e.g., upgrade from Vertex v1 to v2) by creating a new collection — zero downtime
    • Marketplace publishing: Any retriever can be published as a public tool with one API call

    This retriever-as-classifier pattern is powerful because classification is fundamentally a nearest-neighbor search in the right embedding space. Instead of training a custom model on labeled data, you encode your taxonomy as vectors and let semantic similarity do the work.
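The whole pattern reduces to a cosine nearest-neighbor lookup. In this toy sketch, tiny hand-made 3-dimensional vectors stand in for the 1408-dimensional Vertex embeddings; the category labels are from this post, but the vectors are fabricated for illustration:

```python
import math

# Toy "classification as nearest-neighbor search": hand-made 3-dim vectors
# stand in for real 1408-dim embeddings. Labels are real; vectors are not.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

category_vectors = {
    "IAB17-26 Pro Basketball": [0.9, 0.1, 0.0],
    "IAB13 Personal Finance":  [0.0, 0.2, 0.9],
}

def classify(content_vector, top_k=1):
    """Return the top_k (similarity, category) pairs for an embedded input."""
    scored = sorted(
        ((cosine(content_vector, vec), cat) for cat, vec in category_vectors.items()),
        reverse=True,
    )
    return scored[:top_k]

# A vector for an "NBA Finals article" lands nearest the basketball category:
print(classify([0.8, 0.2, 0.1]))
```

Swap the toy dict for a Qdrant index and the stub vectors for real multimodal embeddings, and this is conceptually the classifier's retrieval stage: no labeled training data, no model training — just a well-placed taxonomy in embedding space.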


    What's Next

    We're actively improving the classifier. On the roadmap:

    • Brand safety signals: Flag content categories that common brand safety lists exclude (e.g., "Sensitive Social Issues", "Military Conflict")
    • Batch classification API: Classify thousands of URLs or creatives in a single request
    • Taxonomy versioning: Support IAB Content Taxonomy 4.0 when released, with backward-compatible category mapping
    • Custom fine-tuning: Bring your own labeled data to fine-tune the embedding space for your specific content vertical
