
    IAB Contextual Classifier: Taxonomies for Videos and Images

    Classify text, images, and video into 700+ IAB Content Taxonomy categories using multimodal AI. Learn how it works under the hood and how to extend it for your contextual targeting needs.


    Contextual advertising is having its moment. With third-party cookies disappearing and privacy regulations tightening, contextual targeting — classifying page and creative content into standardized categories — is the industry's answer to precise ad placement without tracking users.

    But building a production-grade IAB content taxonomy classifier is harder than it looks. You need to handle 700+ categories across four hierarchical tiers, support text and visual content, and return results fast enough for real-time header bidding (under 2 seconds).

    That's why we built the Mixpeek IAB Contextual Classifier — a free, multimodal classifier that maps any content (text, image, or video) to the IAB Content Taxonomy 3.1. It's published as a public retriever on the Mixpeek marketplace, and you can start using it right now with zero setup.

    In this post, we'll break down:

    • How the classifier works under the hood
    • Why multimodal classification beats text-only approaches
    • How to extend and customize it for your own use cases
    • How it integrates with Prebid.js for real-time header bidding

    What Is the IAB Content Taxonomy?

    The IAB Content Taxonomy is the advertising industry's standard for categorizing digital content. Version 3.1 defines 700+ categories organized in a four-tier hierarchy:

    • Tier 1: 26 top-level categories (e.g., Sports, Technology & Computing, Arts & Entertainment)
    • Tier 2: ~366 subcategories (e.g., Basketball under Sports)
    • Tier 3–4: Granular sub-subcategories for fine-grained targeting
    [Figure: IAB Content Taxonomy 3.1 hierarchy — 26 Tier 1 categories, ~366 Tier 2 subcategories, 700+ total categories across 4 tiers. Example branch: Sports (IAB17) → Basketball (IAB17-3) → Pro Basketball → NBA Playoffs.]
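To make the four-tier structure concrete, here's a toy sketch of the hierarchy as nested Python dicts. The category names and IDs come from the examples in this post; the data structure itself is purely illustrative, not how Mixpeek stores the taxonomy:

```python
# Hypothetical sketch of the tiered IAB hierarchy as nested dicts.
# Names/IDs are from this article's examples; the structure is illustrative.
taxonomy = {
    "IAB17": {
        "name": "Sports",
        "children": {
            "IAB17-3": {
                "name": "Basketball",
                "children": {
                    "IAB17-26": {"name": "Pro Basketball", "children": {}},
                },
            },
        },
    },
}

def find_path(tree, target_id, path=()):
    """Depth-first search for a category ID; returns its tier path of names."""
    for cat_id, node in tree.items():
        new_path = path + (node["name"],)
        if cat_id == target_id:
            return list(new_path)
        found = find_path(node["children"], target_id, new_path)
        if found:
            return found
    return None

print(find_path(taxonomy, "IAB17-26"))  # ['Sports', 'Basketball', 'Pro Basketball']
```

A category's tier is simply the length of its path, which is why the `iab_path` field in the classifier's output doubles as tier information.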

    Publishers and SSPs use these categories to match advertiser campaigns with relevant content. For example, Nike might target IAB17-3 (Basketball) while a financial services company targets IAB13 (Personal Finance).

    The problem? Most classification tools are either text-only (they can't classify a video ad or product image) or keyword-based (they match exact strings instead of understanding meaning). Let's look at how our approach solves both.


    How It Works Under the Hood

    The Mixpeek IAB Contextual Classifier uses a vector similarity search architecture rather than traditional rule-based or keyword matching. Here's the pipeline:

    [Figure: Classification pipeline — any input (text, image, or video) is embedded into a shared 1408-dimensional space via the Vertex AI multimodal model; parallel content and text searches each retrieve the 20 nearest IAB category vectors; RRF fusion merges both rankings into a final top 10 (~337ms text, ~500ms image).]

    Step 1: Embed the IAB Taxonomy

    We pre-compute multimodal embeddings (via Google Vertex AI's multimodal embedding model) for every one of the 700+ IAB categories. Each category's descriptive text — including its name, parent path, and representative keywords — gets encoded into a 1408-dimensional vector.

    These vectors are stored in a Qdrant vector index, essentially creating a semantic map of the entire IAB taxonomy in embedding space.

    # Simplified: how each IAB category becomes a vector
    {
        "text": "Sports > Basketball > NBA",
        "metadata": {
            "iab_category_id": "IAB17-26",
            "iab_category_name": "Pro Basketball",
            "iab_tier": 2,
            "iab_path": ["Sports", "Basketball", "Pro Basketball"]
        }
    }
    # → Embedded via Vertex multimodal model → 1408D vector
    # → Stored in Qdrant for nearest-neighbor search
    

    Step 2: Embed the Input Content

    When you submit content for classification — whether it's a text snippet, an image, or a video — the same Vertex multimodal model encodes it into the same 1408-dimensional space.

    This is where multimodal comes in: a basketball game photo, a text article about the NBA Finals, and a highlight reel video all land in roughly the same region of embedding space. They're semantically close to each other and to the IAB "Pro Basketball" category vector, even though they're completely different media types.

    Step 3: Search and Fuse

    The classifier runs two parallel searches and fuses the results:

    1. Content search: Embeds the uploaded file (image/video) and finds the 20 nearest IAB category vectors
    2. Text search: Embeds the text query and finds the 20 nearest IAB category vectors

    These two result sets are merged using Reciprocal Rank Fusion (RRF), which combines ranking positions from both searches to produce a final top-10 list. RRF is particularly effective here because it surfaces categories that rank highly in both modalities, boosting confidence.
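The fusion step is only a few lines of logic. Here's a minimal RRF sketch — `k=60` is a commonly used RRF constant, and the ranked ID lists are illustrative stand-ins for the two search results (the classifier's actual constant isn't documented here):

```python
# Minimal Reciprocal Rank Fusion: score(cat) = sum over lists of 1/(k + rank).
# k=60 is a common default; the classifier's actual value is an assumption here.
def rrf_fuse(result_lists, k=60, final_top_k=10):
    scores = {}
    for results in result_lists:
        for rank, category_id in enumerate(results, start=1):
            scores[category_id] = scores.get(category_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:final_top_k]

content_search = ["IAB17-26", "IAB17-3", "IAB17"]  # nearest neighbors of the file embedding
text_search    = ["IAB17-3", "IAB17-26", "IAB13"]  # nearest neighbors of the text embedding

for cat, score in rrf_fuse([content_search, text_search]):
    print(cat, round(score, 4))
```

Note how `IAB17-26` and `IAB17-3`, which appear near the top of both lists, accumulate two reciprocal-rank contributions and outrank categories that appear in only one list — exactly the "high in both modalities" boost described above.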

    // Retriever stage configuration
    {
        "stage_name": "multimodal_search",
        "config": {
            "stage_id": "feature_search",
            "parameters": {
                "searches": [
                    {
                        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
                        "query": { "input_mode": "content", "value": "{{INPUT.query_content}}" },
                        "top_k": 20
                    },
                    {
                        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
                        "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
                        "top_k": 20
                    }
                ],
                "final_top_k": 10,
                "fusion": "rrf"
            }
        }
    }
    

    Step 4: Return Structured Results

    Each result includes the full IAB taxonomy metadata plus a confidence score:

    {
        "iab_category_name": "Pro Basketball",
        "iab_category_id": "IAB17-26",
        "iab_tier": 2,
        "iab_path": ["Sports", "Basketball", "Pro Basketball"],
        "iab_parent_id": "IAB17-3",
        "score": 0.8490
    }
    

    Why Multimodal Beats Text-Only Classification

    Most contextual classification APIs — including Google Cloud NLP, AWS Comprehend, and Klazify — are text-only. They can classify an article, but what about:

    • A product image on an e-commerce page?
    • A pre-roll video ad served before YouTube content?
    • A thumbnail that tells you more about the content than the page title?
    • A social media post that's an image with no caption?
    [Figure: Text-only vs. multimodal classification — a text-only classifier fully handles 1 of 5 scenarios (text articles), failing on product images, video, and non-English visual content; the multimodal classifier handles all 5.]

    The Mixpeek classifier handles all of these because the Vertex multimodal embedding model encodes text, images, and video into a shared semantic space. A photo of a basketball game, the text "Lakers vs. Celtics NBA Finals", and a highlight clip all map to vectors near IAB17-26: Pro Basketball.

    This matters for real-world ad tech because:

    | Scenario | Text-Only | Multimodal (Mixpeek) |
    |---|---|---|
    | Article with text | ✅ Works | ✅ Works |
    | Image-heavy page (Pinterest, Instagram) | ❌ No text to classify | ✅ Classifies images directly |
    | Video content (YouTube, TikTok) | ❌ Requires transcription first | ✅ Classifies video frames directly |
    | Mixed media (article + images + video) | ⚠️ Partial (text only) | ✅ All modalities contribute |
    | Non-English visual content | ❌ Text extraction unreliable | ✅ Visual understanding is language-agnostic |

    How to Extend the Classifier for Your Use Case

    The published classifier at mxp.co/r/iab-contextual-classifier works out of the box — but the real power comes when you use it as a starting point and customize it. Here are three ways to extend it.

    1. Add Custom Categories

    The IAB taxonomy covers most ad tech needs, but you might have industry-specific categories. Since the classifier is backed by a Mixpeek collection (a processing pipeline tied to a vector index), you can add your own category documents to the same bucket:

    import requests
    
    API_KEY = "your-api-key"
    NAMESPACE_ID = "your-namespace"
    BUCKET_ID = "your-bucket"
    
    # Add a custom category
    requests.post(
        f"https://api.mixpeek.com/v1/buckets/{BUCKET_ID}/objects",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-Namespace": NAMESPACE_ID,
        },
        json={
            "blobs": [{
                "data": "Electric vehicles, EV charging, battery technology, sustainable transport",
                "mime_type": "text/plain",
                "metadata": {
                    "iab_category_name": "Electric Vehicles",
                    "iab_category_id": "CUSTOM-EV-001",
                    "iab_tier": 2,
                    "iab_path": ["Automotive", "Electric Vehicles"],
                    "iab_parent_id": "IAB2"
                }
            }]
        }
    )
    

    After uploading, trigger the collection to generate embeddings. Your custom categories now live alongside the standard IAB taxonomy in the same vector space.

    2. Add an LLM Validation Stage

    For higher accuracy, add an llm_enrich stage after the vector search. This uses a language model (e.g., Gemini 2.5 Flash) to validate and re-score the top vector search results:

    # Add a validation stage to your retriever
    stages = [
        # Stage 1: Vector search (same as above)
        {"stage_name": "multimodal_search", "config": {"stage_id": "feature_search", ...}},
    
        # Stage 2: LLM validation
        {
            "stage_name": "validate_classification",
            "config": {
                "stage_id": "llm_enrich",
                "parameters": {
                    "provider": "google",
                    "model": "gemini-2.5-flash",
                    "prompt": "Given the content \"{{INPUT.query}}\", score how relevant the IAB category \"{{DOC.iab_category_name}}\" (path: {{DOC.iab_path}}) is on a scale of 0-100.",
                    "output_field": "classification",
                    "output_schema": {
                        "relevance_score": "number (0-100)",
                        "confidence": "high | medium | low"
                    }
                }
            }
        }
    ]
    

    This two-stage approach gives you the speed of vector search with the reasoning capability of an LLM — fast initial retrieval, then precise validation.
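One way to consume the two-stage output is to filter and re-rank the vector-search candidates by the LLM's relevance score. The field names below mirror the `output_schema` above, but the threshold and the post-processing itself are a hypothetical sketch, not part of the published retriever:

```python
# Hypothetical post-processing for the two-stage pipeline: drop candidates
# whose LLM relevance_score falls below a threshold, then re-rank by it.
# Field names mirror the output_schema above; the threshold is illustrative.
def validate_and_rerank(candidates, min_relevance=50):
    kept = [c for c in candidates if c["classification"]["relevance_score"] >= min_relevance]
    return sorted(kept, key=lambda c: c["classification"]["relevance_score"], reverse=True)

candidates = [
    {"iab_category_name": "Pro Basketball", "score": 0.849,
     "classification": {"relevance_score": 92, "confidence": "high"}},
    {"iab_category_name": "Fitness & Exercise", "score": 0.702,
     "classification": {"relevance_score": 18, "confidence": "low"}},
]

for c in validate_and_rerank(candidates):
    print(c["iab_category_name"], c["classification"]["relevance_score"])
# "Pro Basketball" survives; "Fitness & Exercise" is filtered out
```

This is where the LLM stage earns its latency cost: a vector score of 0.702 looks plausible on its own, but the LLM's reasoning pass can recognize it as a near-miss and discard it.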

    [Figure: Two-stage classification pipeline — Stage 1 vector search retrieves the top 20 candidates in ~50ms; Stage 2 LLM validation re-scores the top 10 with Gemini 2.5 Flash in ~250ms, yielding validated categories with confidence labels in ~300ms total.]

    3. Integrate with Prebid.js for Real-Time Bidding

    The classifier outputs are already structured for OpenRTB 2.6, making it straightforward to plug into Prebid.js header bidding workflows:

    // Prebid.js RTD module configuration
    pbjs.setConfig({
        realTimeData: {
            dataProviders: [{
                name: 'mixpeek',
                params: {
                    apiKey: 'your-public-key',
                    publicName: 'iab-contextual-classifier',
                    endpoint: 'https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute'
                }
            }]
        }
    });
    
    // The module automatically:
    // 1. Extracts page content
    // 2. Sends to the Mixpeek classifier
    // 3. Formats response as OpenRTB 2.6
    // 4. Attaches IAB categories to bid requests
    

    The OpenRTB output looks like this:

    {
        "site": {
            "content": {
                "data": [{
                    "id": "mixpeek.com",
                    "name": "Mixpeek Contextual",
                    "segment": [
                        {"id": "IAB17-26", "name": "Pro Basketball", "value": "0.849"},
                        {"id": "IAB17-3", "name": "Basketball", "value": "0.838"}
                    ]
                }]
            }
        }
    }
    

    SSPs receiving this bid request can match it against advertiser targeting rules, ensuring ads appear alongside relevant content — all without cookies or user tracking.
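As a rough illustration of that matching step, here's how an SSP-side check against the OpenRTB `segment` array might look. The rule shape (`target_ids` plus a confidence floor) is hypothetical — OpenRTB defines the segment object, not the matching logic:

```python
# Illustrative SSP-side matching of the OpenRTB segment array against an
# advertiser's IAB targeting rules. The rule format and confidence floor
# are hypothetical; OpenRTB 2.6 specifies only the segment structure.
def matches_targeting(bid_request, target_ids, min_confidence=0.5):
    segments = bid_request["site"]["content"]["data"][0]["segment"]
    return any(
        seg["id"] in target_ids and float(seg["value"]) >= min_confidence
        for seg in segments
    )

bid_request = {"site": {"content": {"data": [{
    "segment": [
        {"id": "IAB17-26", "name": "Pro Basketball", "value": "0.849"},
        {"id": "IAB17-3", "name": "Basketball", "value": "0.838"},
    ]
}]}}}

print(matches_targeting(bid_request, {"IAB17-3"}))  # Nike targeting Basketball → True
```

Because `value` carries the classifier's confidence score, buyers can also tighten the floor (say, 0.8+) to trade reach for precision.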


    Performance: Latency and Accuracy

    For real-time header bidding, classification must happen within the 200–2000ms auction window. Here's how the classifier performs:

    | Metric | Value |
    |---|---|
    | Text classification latency | 337ms |
    | Multimodal (image) latency | ~500ms |
    | Accuracy (top-1 match) | 84.9% |
    | Taxonomy coverage | 700+ IAB 3.1 categories |
    | Input types supported | Text, image, video |

    The 337ms text latency is well within Prebid's typical 1.5–2s timeout window, leaving ample room for network overhead and bid processing.

    [Figure: Text classification p50 latency comparison (lower is better) — Mixpeek 337ms; AWS Comprehend ~450ms; Google Cloud NLP ~500ms; Klazify ~800ms; DIY embed-and-classify pipelines 1.2–2.5s+. Mixpeek is the only option supporting image input (~500ms) and covers all 700+ IAB 3.1 categories at 84.9% top-1 accuracy.]

    Getting Started in 60 Seconds

    You can try the classifier right now without any setup:

    1. Open mxp.co/r/iab-contextual-classifier
    2. Enter text, upload an image, or paste a URL
    3. See results — IAB categories ranked by confidence score

    To use it programmatically via API:

    curl -X POST https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute \
      -H "Content-Type: application/json" \
      -d '{
        "inputs": {
          "query": "Tesla announces record EV deliveries in Q4, stock surges 8%"
        }
      }'
    

    The public endpoint requires no API key for basic usage. For higher rate limits, custom categories, or LLM validation stages, create a free Mixpeek account.


    Why Build This as a Retriever?

    A design choice worth explaining: the IAB classifier is built as a Mixpeek retriever (a multi-stage query pipeline) rather than a standalone classification model. This matters because:

    • Composability: You can add stages (rerank, filter, LLM enrich) without retraining anything
    • Extensibility: Adding new categories means uploading new documents, not retraining a model
    • Multi-tenancy: Each customer can fork the base taxonomy and add their own categories in their own namespace
    • Versioning: Swap embedding models (e.g., upgrade from Vertex v1 to v2) by creating a new collection — zero downtime
    • Marketplace publishing: Any retriever can be published as a public tool with one API call

    This retriever-as-classifier pattern is powerful because classification is fundamentally a nearest-neighbor search in the right embedding space. Instead of training a custom model on labeled data, you encode your taxonomy as vectors and let semantic similarity do the work.
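The whole pattern reduces to a cosine nearest-neighbor lookup. In this toy sketch, tiny hand-made 3-dimensional vectors stand in for the 1408-dimensional Vertex embeddings; the category labels are from this post, but the vectors are fabricated for illustration:

```python
import math

# Toy "classification as nearest-neighbor search": hand-made 3-dim vectors
# stand in for real 1408-dim embeddings. Labels are real; vectors are not.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

category_vectors = {
    "IAB17-26 Pro Basketball": [0.9, 0.1, 0.0],
    "IAB13 Personal Finance":  [0.0, 0.2, 0.9],
}

def classify(content_vector, top_k=1):
    """Return the top_k (similarity, category) pairs for an embedded input."""
    scored = sorted(
        ((cosine(content_vector, vec), cat) for cat, vec in category_vectors.items()),
        reverse=True,
    )
    return scored[:top_k]

# A vector for an "NBA Finals article" lands nearest the basketball category:
print(classify([0.8, 0.2, 0.1]))
```

Swap the toy dict for a Qdrant index and the stub vectors for real multimodal embeddings, and this is conceptually the classifier's retrieval stage: no labeled training data, no model training — just a well-placed taxonomy in embedding space.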


    What's Next

    We're actively improving the classifier. On the roadmap:

    • Brand safety signals: Flag content categories that common brand safety lists exclude (e.g., "Sensitive Social Issues", "Military Conflict")
    • Batch classification API: Classify thousands of URLs or creatives in a single request
    • Taxonomy versioning: Support IAB Content Taxonomy 4.0 when released, with backward-compatible category mapping
    • Custom fine-tuning: Bring your own labeled data to fine-tune the embedding space for your specific content vertical
