IAB Contextual Classifier: Taxonomies for Videos and Images
Classify text, images, and video into 700+ IAB Content Taxonomy categories using multimodal AI. Learn how it works under the hood and how to extend it for your contextual targeting needs.

Contextual advertising is having its moment. With third-party cookies disappearing and privacy regulations tightening, contextual targeting — classifying page and creative content into standardized categories — is the industry's answer to precise ad placement without tracking users.
But building a production-grade IAB content taxonomy classifier is harder than it looks. You need to handle 700+ categories across four hierarchical tiers, support text and visual content, and return results fast enough for real-time header bidding (under 2 seconds).
That's why we built the Mixpeek IAB Contextual Classifier — a free, multimodal classifier that maps any content (text, image, or video) to the IAB Content Taxonomy 3.1. It's published as a public retriever on the Mixpeek marketplace, and you can start using it right now with zero setup.
In this post, we'll break down:
- How the classifier works under the hood
- Why multimodal classification beats text-only approaches
- How to extend and customize it for your own use cases
- How it integrates with Prebid.js for real-time header bidding
What Is the IAB Content Taxonomy?
The IAB Content Taxonomy is the advertising industry's standard for categorizing digital content. Version 3.1 defines 700+ categories organized in a four-tier hierarchy:
- Tier 1: 26 top-level categories (e.g., Sports, Technology & Computing, Arts & Entertainment)
- Tier 2: ~366 subcategories (e.g., Sports → Basketball)
- Tier 3–4: Granular sub-subcategories for fine-grained targeting
Publishers and SSPs use these categories to match advertiser campaigns with relevant content. For example, Nike might target IAB17-3 (Basketball) while a financial services company targets IAB13 (Personal Finance).
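Because every category carries a parent link, resolving a category ID to its full path is a simple walk up the tree. Here's a minimal sketch using a three-entry slice of the hierarchy (the IDs and names follow the examples in this post; the dict representation is illustrative, not Mixpeek's internal storage format):

```python
# A small slice of the IAB hierarchy, keyed by category ID.
# Each entry links to its parent; tier-1 categories have no parent.
TAXONOMY = {
    "IAB17":    {"name": "Sports", "parent": None},
    "IAB17-3":  {"name": "Basketball", "parent": "IAB17"},
    "IAB17-26": {"name": "Pro Basketball", "parent": "IAB17-3"},
}

def iab_path(category_id: str) -> list[str]:
    """Walk parent links to build the full path for a category."""
    path = []
    while category_id is not None:
        node = TAXONOMY[category_id]
        path.append(node["name"])
        category_id = node["parent"]
    return list(reversed(path))
```

Calling `iab_path("IAB17-26")` yields `["Sports", "Basketball", "Pro Basketball"]`, the same path shape the classifier returns in its results.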
The problem? Most classification tools are either text-only (they can't classify a video ad or product image) or keyword-based (they match exact strings instead of understanding meaning). Let's look at how our approach solves both.
How It Works Under the Hood
The Mixpeek IAB Contextual Classifier uses a vector similarity search architecture rather than traditional rule-based or keyword matching. Here's the pipeline:
Step 1: Embed the IAB Taxonomy
We pre-compute multimodal embeddings (via Google Vertex AI's multimodal embedding model) for every one of the 700+ IAB categories. Each category's descriptive text — including its name, parent path, and representative keywords — gets encoded into a 1408-dimensional vector.
These vectors are stored in a Qdrant vector index, essentially creating a semantic map of the entire IAB taxonomy in embedding space.
# Simplified: how each IAB category becomes a vector
{
  "text": "Sports > Basketball > Pro Basketball",
  "metadata": {
    "iab_category_id": "IAB17-26",
    "iab_category_name": "Pro Basketball",
    "iab_tier": 2,
    "iab_path": ["Sports", "Basketball", "Pro Basketball"]
  }
}
# → Embedded via Vertex multimodal model → 1408-D vector
# → Stored in Qdrant for nearest-neighbor search
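Once both the taxonomy and the input live in the same space, classification reduces to cosine nearest-neighbor search. A minimal sketch with toy 4-dimensional vectors standing in for the 1408-dimensional Vertex embeddings (the vectors and the three-category set are made up for illustration; in production Qdrant performs this search at scale):

```python
import numpy as np

# Toy stand-ins for 1408-D Vertex embeddings: one row per IAB category.
category_ids = ["IAB17-26", "IAB13", "IAB19"]
category_vecs = np.array([
    [0.9, 0.1, 0.0, 0.1],   # Pro Basketball
    [0.0, 0.8, 0.5, 0.1],   # Personal Finance
    [0.1, 0.2, 0.9, 0.3],   # Technology & Computing
])

def classify(query_vec: np.ndarray, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank categories by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = category_vecs / np.linalg.norm(category_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:top_k]
    return [(category_ids[i], float(scores[i])) for i in order]

# A query embedding that lands near the "Pro Basketball" region
results = classify(np.array([0.85, 0.15, 0.05, 0.1]))
```

The top-ranked category here is `IAB17-26` with a similarity above 0.9, which is exactly the shape of signal the classifier's confidence scores are built from.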
Step 2: Embed the Input Content
When you submit content for classification — whether it's a text snippet, an image, or a video — the same Vertex multimodal model encodes it into the same 1408-dimensional space.
This is where the multimodal part comes in: a basketball game photo, a text article about the NBA Finals, and a highlight reel video all land in roughly the same region of embedding space. They're semantically close to each other and to the IAB "Pro Basketball" category vector, even though they're completely different media types.
Step 3: Reciprocal Rank Fusion (RRF) Search
The classifier runs two parallel searches and fuses the results:
- Content search: Embeds the uploaded file (image/video) and finds the 20 nearest IAB category vectors
- Text search: Embeds the text query and finds the 20 nearest IAB category vectors
These two result sets are merged using Reciprocal Rank Fusion (RRF), which combines ranking positions from both searches to produce a final top-10 list. RRF is particularly effective here because it surfaces categories that rank highly in both modalities, boosting confidence.
// Retriever stage configuration
{
  "stage_name": "multimodal_search",
  "config": {
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": { "input_mode": "content", "value": "{{INPUT.query_content}}" },
          "top_k": 20
        },
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
          "top_k": 20
        }
      ],
      "final_top_k": 10,
      "fusion": "rrf"
    }
  }
}
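RRF itself is only a few lines: each result's fused score is the sum of 1/(k + rank) over every list it appears in, where k is a smoothing constant (60 is the commonly used default from the original RRF paper). A generic sketch, since the classifier's exact internal implementation isn't published:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Fuse multiple ranked result lists with Reciprocal Rank Fusion.

    Each item scores 1 / (k + rank) per list it appears in, so items
    that rank well in *both* modalities accumulate the highest score.
    """
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Content search and text search each return ranked IAB category IDs:
content_hits = ["IAB17-26", "IAB17-3", "IAB19"]
text_hits    = ["IAB17-26", "IAB13", "IAB17-3"]
fused = rrf_fuse([content_hits, text_hits])
```

Here `IAB17-26` wins because it tops both lists; a category that appears in only one modality can still make the final list, just with a lower fused score.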
Step 4: Return Structured Results
Each result includes the full IAB taxonomy metadata plus a confidence score:
{
  "iab_category_name": "Pro Basketball",
  "iab_category_id": "IAB17-26",
  "iab_tier": 2,
  "iab_path": ["Sports", "Basketball", "Pro Basketball"],
  "iab_parent_id": "IAB17-3",
  "score": 0.8490
}
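Downstream you'll typically post-process these results, for example dropping low-confidence matches and rolling granular categories up to their tier-1 parent for broad targeting. A sketch against the result shape above (the 0.7 threshold is an arbitrary choice for illustration, not a classifier default):

```python
# Example results in the classifier's output shape
results = [
    {"iab_category_id": "IAB17-26", "iab_path": ["Sports", "Basketball", "Pro Basketball"], "score": 0.8490},
    {"iab_category_id": "IAB17-3",  "iab_path": ["Sports", "Basketball"],                   "score": 0.8380},
    {"iab_category_id": "IAB13",    "iab_path": ["Personal Finance"],                       "score": 0.4100},
]

def confident_tier1(results: list[dict], threshold: float = 0.7) -> set[str]:
    """Keep results above the threshold; return their tier-1 parents."""
    return {r["iab_path"][0] for r in results if r["score"] >= threshold}
```

With the data above, only "Sports" survives: both basketball results clear the threshold and share the same tier-1 parent, while the 0.41 Personal Finance match is discarded.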

Why Multimodal Beats Text-Only Classification
Most contextual classification APIs — including Google Cloud NLP, AWS Comprehend, and Klazify — are text-only. They can classify an article, but what about:
- A product image on an e-commerce page?
- A pre-roll video ad served before YouTube content?
- A thumbnail that tells you more about the content than the page title?
- A social media post that's an image with no caption?
The Mixpeek classifier handles all of these because the Vertex multimodal embedding model encodes text, images, and video into a shared semantic space. A photo of a basketball game, the text "Lakers vs. Celtics NBA Finals", and a highlight clip all map to vectors near IAB17-26: Pro Basketball.
This matters for real-world ad tech because:
| Scenario | Text-Only | Multimodal (Mixpeek) |
|---|---|---|
| Article with text | ✅ Works | ✅ Works |
| Image-heavy page (Pinterest, Instagram) | ❌ No text to classify | ✅ Classifies images directly |
| Video content (YouTube, TikTok) | ❌ Requires transcription first | ✅ Classifies video frames directly |
| Mixed media (article + images + video) | ⚠️ Partial (text only) | ✅ All modalities contribute |
| Non-English visual content | ❌ Text extraction unreliable | ✅ Visual understanding is language-agnostic |
How to Extend the Classifier for Your Use Case
The published classifier at mxp.co/r/iab-contextual-classifier works out of the box — but the real power comes when you use it as a starting point and customize it. Here are three ways to extend it.
1. Add Custom Categories
The IAB taxonomy covers most ad tech needs, but you might have industry-specific categories. Since the classifier is backed by a Mixpeek collection (a processing pipeline tied to a vector index), you can add your own category documents to the same bucket:
import requests

API_KEY = "your-api-key"
NAMESPACE_ID = "your-namespace"
BUCKET_ID = "your-bucket"

# Add a custom category
requests.post(
    f"https://api.mixpeek.com/v1/buckets/{BUCKET_ID}/objects",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Namespace": NAMESPACE_ID,
    },
    json={
        "blobs": [{
            "data": "Electric vehicles, EV charging, battery technology, sustainable transport",
            "mime_type": "text/plain",
            "metadata": {
                "iab_category_name": "Electric Vehicles",
                "iab_category_id": "CUSTOM-EV-001",
                "iab_tier": 2,
                "iab_path": ["Automotive", "Electric Vehicles"],
                "iab_parent_id": "IAB2"
            }
        }]
    }
)
After uploading, trigger the collection to generate embeddings. Your custom categories now live alongside the standard IAB taxonomy in the same vector space.
2. Add an LLM Validation Stage
For higher accuracy, add an llm_enrich stage after the vector search. This uses a language model (e.g., Gemini 2.5 Flash) to validate and re-score the top vector search results:
# Add a validation stage to your retriever
stages = [
# Stage 1: Vector search (same as above)
{"stage_name": "multimodal_search", "config": {"stage_id": "feature_search", ...}},
# Stage 2: LLM validation
{
"stage_name": "validate_classification",
"config": {
"stage_id": "llm_enrich",
"parameters": {
"provider": "google",
"model": "gemini-2.5-flash",
"prompt": "Given the content "{{INPUT.query}}", score how relevant the IAB category "{{DOC.iab_category_name}}" (path: {{DOC.iab_path}}) is on a scale of 0-100.",
"output_field": "classification",
"output_schema": {
"relevance_score": "number (0-100)",
"confidence": "high | medium | low"
}
}
}
}
]
This two-stage approach gives you the speed of vector search with the reasoning capability of an LLM — fast initial retrieval, then precise validation.
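One simple way to merge the two stages is a weighted blend of the 0-1 vector similarity and the LLM's 0-100 relevance score. The 50/50 weighting below is an illustrative assumption, not something the retriever prescribes:

```python
def blend_scores(vector_score: float, llm_relevance: float,
                 w_vector: float = 0.5) -> float:
    """Blend a 0-1 vector similarity with a 0-100 LLM relevance score.

    The weighting is a tunable assumption; shift w_vector toward 1.0
    to trust retrieval more, toward 0.0 to trust the LLM more.
    """
    return w_vector * vector_score + (1 - w_vector) * (llm_relevance / 100)

# Vector search scored 0.849; the LLM validation stage said 90/100:
final = blend_scores(0.849, 90)  # → 0.8745
```

An LLM that strongly disagrees with the vector search (say, 20/100) would pull the final score down sharply, which is exactly the false-positive filtering this stage is for.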
3. Integrate with Prebid.js for Real-Time Bidding
The classifier outputs are already structured for OpenRTB 2.6, making it straightforward to plug into Prebid.js header bidding workflows:
// Prebid.js RTD module configuration
pbjs.setConfig({
  realTimeData: {
    dataProviders: [{
      name: 'mixpeek',
      params: {
        apiKey: 'your-public-key',
        publicName: 'iab-contextual-classifier',
        endpoint: 'https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute'
      }
    }]
  }
});

// The module automatically:
// 1. Extracts page content
// 2. Sends it to the Mixpeek classifier
// 3. Formats the response as OpenRTB 2.6
// 4. Attaches IAB categories to bid requests
The OpenRTB output looks like this:
{
  "site": {
    "content": {
      "data": [{
        "id": "mixpeek.com",
        "name": "Mixpeek Contextual",
        "segment": [
          {"id": "IAB17-26", "name": "Pro Basketball", "value": "0.849"},
          {"id": "IAB17-3", "name": "Basketball", "value": "0.838"}
        ]
      }]
    }
  }
}
SSPs receiving this bid request can match it against advertiser targeting rules, ensuring ads appear alongside relevant content — all without cookies or user tracking.
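If you're not using the Prebid module, mapping classifier results onto this OpenRTB 2.6 shape is mechanical. A sketch in Python (the `id` and `name` values for the data object are placeholders copied from the example above; real integrations would use their own data-source identifiers):

```python
def to_openrtb_segments(results: list[dict]) -> dict:
    """Map classifier results onto an OpenRTB 2.6 site.content.data entry."""
    return {
        "site": {"content": {"data": [{
            "id": "mixpeek.com",           # data-source identifiers (placeholders)
            "name": "Mixpeek Contextual",
            "segment": [
                {"id": r["iab_category_id"],
                 "name": r["iab_category_name"],
                 "value": f"{r['score']:.3f}"}   # OpenRTB segment values are strings
                for r in results
            ],
        }]}}
    }

bid_data = to_openrtb_segments([
    {"iab_category_id": "IAB17-26", "iab_category_name": "Pro Basketball", "score": 0.849},
])
```

The only non-obvious detail is that OpenRTB segment `value` fields are strings, so the numeric confidence score gets formatted rather than passed through as a float.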
Performance: Latency and Accuracy
For real-time header bidding, classification must happen within the 200–2000ms auction window. Here's how the classifier performs:
| Metric | Value |
|---|---|
| Text classification latency | 337ms |
| Multimodal (image) latency | ~500ms |
| Accuracy (top-1 match) | 84.9% |
| Taxonomy coverage | 700+ IAB 3.1 categories |
| Input types supported | Text, image, video |
The 337ms text latency is well within Prebid's typical 1.5–2s timeout window, leaving ample room for network overhead and bid processing.
Getting Started in 60 Seconds
You can try the classifier right now without any setup:
- Open mxp.co/r/iab-contextual-classifier
- Enter text, upload an image, or paste a URL
- See results — IAB categories ranked by confidence score
To use it programmatically via API:
curl -X POST https://api.mixpeek.com/v1/public/retrievers/iab-contextual-classifier/execute \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "Tesla announces record EV deliveries in Q4, stock surges 8%"
    }
  }'
The public endpoint requires no API key for basic usage. For higher rate limits, custom categories, or LLM validation stages, create a free Mixpeek account.
Why Build This as a Retriever?
A design choice worth explaining: the IAB classifier is built as a Mixpeek retriever (a multi-stage query pipeline) rather than a standalone classification model. This matters because:
- Composability: You can add stages (rerank, filter, LLM enrich) without retraining anything
- Extensibility: Adding new categories means uploading new documents, not retraining a model
- Multi-tenancy: Each customer can fork the base taxonomy and add their own categories in their own namespace
- Versioning: Swap embedding models (e.g., upgrade from Vertex v1 to v2) by creating a new collection — zero downtime
- Marketplace publishing: Any retriever can be published as a public tool with one API call
This retriever-as-classifier pattern is powerful because classification is fundamentally a nearest-neighbor search in the right embedding space. Instead of training a custom model on labeled data, you encode your taxonomy as vectors and let semantic similarity do the work.
What's Next
We're actively improving the classifier. On the roadmap:
- Brand safety signals: Flag content categories that common brand safety lists exclude (e.g., "Sensitive Social Issues", "Military Conflict")
- Batch classification API: Classify thousands of URLs or creatives in a single request
- Taxonomy versioning: Support IAB Content Taxonomy 4.0 when released, with backward-compatible category mapping
- Custom fine-tuning: Bring your own labeled data to fine-tune the embedding space for your specific content vertical
