Semantic Join

Bridge extracted content features with business reference data. Join video clips to product catalogs, detected faces to employee directories, or documents to compliance frameworks—all via embedding similarity.

video · image · text · audio

Multi-Stage · 45.0K runs

"Find marketing videos featuring products from our electronics catalog with matched SKUs"

Why This Matters

Better search depends less on better embeddings than on connecting extracted content to existing business systems. Once content is joined to reference data, you can query by product taxonomy fields such as SKU or category instead of raw embedding distance.

import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Create taxonomy to join video content with product catalog
response = requests.post(f"{API_URL}/v1/taxonomies", headers=headers, json={
    "taxonomy_name": "product_matcher",
    "taxonomy_type": "flat",
    "retriever_id": "ret_product_search",
    "input_mappings": {
        "query_embedding": "mixpeek://multimodal_extractor@v1/embedding"
    },
    "source_collection": {
        "collection_id": "col_product_catalog",
        # Fields copied from the matched catalog entry onto each video document
        "enrichment_fields": [
            {"field_path": "metadata.sku", "merge_mode": "enrich"},
            {"field_path": "metadata.category", "merge_mode": "enrich"}
        ]
    }
})
response.raise_for_status()  # stop here if the taxonomy was not created
taxonomy = response.json()

# Apply taxonomy to video collection (semantic join); fail loudly if the request is rejected
requests.post(
    f"{API_URL}/v1/collections/col_marketing_videos/apply-taxonomy",
    headers=headers,
    json={"taxonomy_id": taxonomy["taxonomy_id"]}
).raise_for_status()

# Search videos - results now include matched product data
results = requests.post(
    f"{API_URL}/v1/retrievers/video-search/execute",
    headers=headers,
    json={"query": {"text": "product demos"}}
).json()

for doc in results["documents"]:
    print(f"Video: {doc['document_id']}")
    print(f"  Matched SKU: {doc.get('metadata.sku', 'N/A')}")
    print(f"  Category: {doc.get('metadata.category', 'N/A')}")
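
To build intuition for what the join does, here is a minimal local sketch, not Mixpeek's implementation. It assumes each video and each catalog entry already has an embedding, and it copies the SKU and category from the nearest catalog entry when the cosine similarity clears a threshold. The function names, the `min_score` threshold, and the toy data are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_join(documents, catalog, fields, min_score=0.8):
    """Copy `fields` from the best-matching catalog row onto each document.

    Hypothetical helper: documents with no catalog row scoring at least
    `min_score` are returned unchanged.
    """
    joined = []
    for doc in documents:
        enriched = dict(doc)
        if catalog:
            # Nearest catalog entry by cosine similarity
            best = max(catalog, key=lambda row: cosine(doc["embedding"], row["embedding"]))
            if cosine(doc["embedding"], best["embedding"]) >= min_score:
                enriched.update({f: best[f] for f in fields})
        joined.append(enriched)
    return joined

# Toy data: 3-dimensional vectors stand in for real embeddings.
catalog = [
    {"sku": "EL-100", "category": "headphones", "embedding": [1.0, 0.0, 0.0]},
    {"sku": "EL-200", "category": "cameras", "embedding": [0.0, 1.0, 0.0]},
]
videos = [
    {"document_id": "vid_1", "embedding": [0.9, 0.1, 0.0]},
    {"document_id": "vid_2", "embedding": [0.0, 0.0, 1.0]},  # no close match
]

for row in semantic_join(videos, catalog, fields=["sku", "category"]):
    print(row["document_id"], row.get("sku", "N/A"), row.get("category", "N/A"))
```

The threshold matters: without it, every video would inherit some SKU, including videos that resemble nothing in the catalog.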

Feature Extractors

Image Embedding

Generate visual embeddings for similarity search and clustering

752K runs

Video Embedding

Generate vector embeddings for video content

610K runs

Text Embedding

Extract semantic embeddings from documents, transcripts and text content

827K runs

Retriever Stages

feature search

Search and filter documents by vector similarity using feature embeddings

filter
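
As a rough illustration of what a feature search stage computes, the sketch below ranks documents by cosine similarity to a query embedding and keeps the top matches above a score floor. It is an assumption-laden toy: `feature_search`, `top_k`, and `min_score` are hypothetical names, and a production stage would query a vector index rather than scan a list.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def feature_search(query_vec, documents, top_k=2, min_score=0.0):
    """Return up to `top_k` (score, document_id) pairs, best first.

    Hypothetical helper: brute-force scan over in-memory documents.
    """
    scored = ((cosine(query_vec, d["embedding"]), d["document_id"]) for d in documents)
    kept = [(score, doc_id) for score, doc_id in scored if score >= min_score]
    return heapq.nlargest(top_k, kept)

# Toy 2-dimensional embeddings
docs = [
    {"document_id": "a", "embedding": [1.0, 0.0]},
    {"document_id": "b", "embedding": [0.7, 0.7]},
    {"document_id": "c", "embedding": [0.0, 1.0]},
]

hits = feature_search([1.0, 0.0], docs, top_k=2, min_score=0.5)
for score, doc_id in hits:
    print(doc_id, round(score, 3))
```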

attribute filter

Filter documents by metadata attribute values using boolean logic

filter
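
The boolean logic behind an attribute filter can be sketched as a small recursive evaluator over metadata. The condition format used here (`AND`, `OR`, `NOT`, and leaf dicts with `field`, `op`, `value`) is a hypothetical one chosen for illustration and is not Mixpeek's filter syntax.

```python
def matches(doc, condition):
    """Evaluate a nested boolean condition against a document's metadata.

    Hypothetical condition format:
      {"AND": [...]}, {"OR": [...]}, {"NOT": {...}}, or a leaf
      {"field": name, "op": "eq" | "in", "value": ...}
    """
    if "AND" in condition:
        return all(matches(doc, c) for c in condition["AND"])
    if "OR" in condition:
        return any(matches(doc, c) for c in condition["OR"])
    if "NOT" in condition:
        return not matches(doc, condition["NOT"])

    # Leaf condition: compare one metadata attribute
    actual = doc.get("metadata", {}).get(condition["field"])
    if condition["op"] == "eq":
        return actual == condition["value"]
    if condition["op"] == "in":
        return actual in condition["value"]
    raise ValueError(f"unsupported operator: {condition['op']}")

def attribute_filter(documents, condition):
    """Keep only the documents whose metadata satisfies the condition."""
    return [d for d in documents if matches(d, condition)]

# Toy documents shaped like the enriched videos in the example above
documents = [
    {"document_id": "vid_1", "metadata": {"sku": "EL-100", "category": "headphones"}},
    {"document_id": "vid_2", "metadata": {"sku": "EL-200", "category": "cameras"}},
    {"document_id": "vid_3", "metadata": {"category": "cameras"}},  # no matched SKU
]

# Cameras that have a matched SKU
condition = {"AND": [
    {"field": "category", "op": "eq", "value": "cameras"},
    {"NOT": {"field": "sku", "op": "eq", "value": None}},
]}

for doc in attribute_filter(documents, condition):
    print(doc["document_id"])
```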

Resources Used

Taxonomy

Product Catalog

Business reference data for semantic joining

Documentation

Taxonomies, Retrievers

