
    Semantic Join

    Bridge extracted content features with business reference data. Join video clips to product catalogs, detected faces to employee directories, or documents to compliance frameworks—all via embedding similarity.
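As a rough illustration of what joining "via embedding similarity" means, the sketch below matches each content document to its nearest reference record by cosine similarity and copies selected reference fields onto it. This is a conceptual model only, not Mixpeek's implementation: the function names, the `min_score` threshold, the `match_score` field, and the toy three-dimensional embeddings are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_join(content_docs, reference_docs, fields, min_score=0.8):
    """Enrich each content doc with fields from its most similar reference doc.

    Hypothetical helper: a doc is enriched only when the best match scores at
    least `min_score`; otherwise it is passed through unchanged.
    """
    joined = []
    for doc in content_docs:
        enriched = dict(doc)
        scored = [(cosine(doc["embedding"], ref["embedding"]), ref) for ref in reference_docs]
        if scored:
            score, best = max(scored, key=lambda pair: pair[0])
            if score >= min_score:
                for field in fields:
                    enriched[field] = best[field]
                enriched["match_score"] = score
        joined.append(enriched)
    return joined


# Toy data: two video clips and a two-item product catalog.
clips = [
    {"document_id": "clip_1", "embedding": [0.9, 0.1, 0.0]},
    {"document_id": "clip_2", "embedding": [0.0, 0.0, 1.0]},
]
catalog = [
    {"sku": "SKU-001", "category": "shoes", "embedding": [1.0, 0.0, 0.0]},
    {"sku": "SKU-002", "category": "bags", "embedding": [0.0, 1.0, 0.0]},
]

joined = semantic_join(clips, catalog, fields=["sku", "category"])
for row in joined:
    print(row["document_id"], row.get("sku", "no match"))
```

Here `clip_1` picks up the SKU and category of the closest catalog item, while `clip_2` has no sufficiently similar product and is left unenriched. The threshold is what separates a join from a plain nearest-neighbour lookup: without it, every clip would be assigned some product, however poor the match.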

    video
    image
    text
    audio
    Multi-Stage
    45.0K runs


    Why This Matters

    Better search isn't about better embeddings—it's about connecting extracted content to existing business systems. Query by product taxonomy, not embedding distance.

    import requests

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

    # Create taxonomy to join video content with product catalog
    taxonomy = requests.post(f"{API_URL}/v1/taxonomies", headers=headers, json={
        "taxonomy_name": "product_matcher",
        "taxonomy_type": "flat",
        "retriever_id": "ret_product_search",
        "input_mappings": {
            "query_embedding": "mixpeek://multimodal_extractor@v1/embedding"
        },
        "source_collection": {
            "collection_id": "col_product_catalog",
            "enrichment_fields": [
                {"field_path": "metadata.sku", "merge_mode": "enrich"},
                {"field_path": "metadata.category", "merge_mode": "enrich"}
            ]
        }
    }).json()

    # Apply taxonomy to video collection (semantic join)
    requests.post(
        f"{API_URL}/v1/collections/col_marketing_videos/apply-taxonomy",
        headers=headers,
        json={"taxonomy_id": taxonomy["taxonomy_id"]}
    )

    # Search videos - results now include matched product data
    results = requests.post(
        f"{API_URL}/v1/retrievers/video-search/execute",
        headers=headers,
        json={"query": {"text": "product demos"}}
    ).json()

    for doc in results["documents"]:
        print(f"Video: {doc['document_id']}")
        print(f" Matched SKU: {doc.get('metadata.sku', 'N/A')}")
        print(f" Category: {doc.get('metadata.category', 'N/A')}")
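The loop above reads enriched fields with dotted keys such as `metadata.sku`. The page does not show the response body, so it is unclear whether enriched fields come back as flat dotted keys or as nested objects. A small hypothetical helper (`get_field` is not part of any Mixpeek SDK) can handle either shape, as sketched here:

```python
def get_field(doc, path, default="N/A"):
    """Look up a dotted field path in a result document.

    Assumption: the document may store the value either under the literal
    dotted key ("metadata.sku") or as nested dicts ({"metadata": {"sku": ...}}).
    """
    # Flat layout: the dotted path is itself a key.
    if path in doc:
        return doc[path]
    # Nested layout: walk one segment at a time.
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Both layouts resolve to the same value; a missing field falls back to the default.
flat_doc = {"document_id": "vid_1", "metadata.sku": "SKU-001"}
nested_doc = {"document_id": "vid_2", "metadata": {"sku": "SKU-002"}}

print(get_field(flat_doc, "metadata.sku"))
print(get_field(nested_doc, "metadata.sku"))
print(get_field(nested_doc, "metadata.category"))
```

Swapping `doc.get('metadata.sku', 'N/A')` for `get_field(doc, 'metadata.sku')` in the loop would keep the printout working regardless of which layout the API actually returns.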

    Feature Extractors

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Retriever Stages

    feature search

    Search collections using multimodal embeddings

    search

    compose

    Compose multiple retriever pipelines together

    compose

    attribute filter

    Filter documents by metadata attributes

    filter
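To make the three stage types concrete, here is a minimal in-memory sketch of how a feature search, an attribute filter, and a compose step might fit together. It is a conceptual model under stated assumptions, not Mixpeek's retriever API: the function names, the document layout, and the simplification that compose just chains stages in order are all invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feature_search(query_embedding, top_k):
    """Stage: rank documents by embedding similarity and keep the top_k."""
    def stage(docs):
        ranked = sorted(docs, key=lambda d: cosine(query_embedding, d["embedding"]), reverse=True)
        return ranked[:top_k]
    return stage


def attribute_filter(**required):
    """Stage: keep documents whose metadata equals every required key/value."""
    def stage(docs):
        return [
            d for d in docs
            if all(d.get("metadata", {}).get(k) == v for k, v in required.items())
        ]
    return stage


def compose(*stages):
    """Chain stages so each one receives the previous stage's output."""
    def pipeline(docs):
        for stage in stages:
            docs = stage(docs)
        return docs
    return pipeline


# Toy collection with two-dimensional embeddings and joined category metadata.
docs = [
    {"document_id": "vid_1", "embedding": [1.0, 0.0], "metadata": {"category": "shoes"}},
    {"document_id": "vid_2", "embedding": [0.8, 0.6], "metadata": {"category": "bags"}},
    {"document_id": "vid_3", "embedding": [0.6, 0.8], "metadata": {"category": "shoes"}},
    {"document_id": "vid_4", "embedding": [0.0, 1.0], "metadata": {"category": "shoes"}},
]

pipeline = compose(
    feature_search([1.0, 0.0], top_k=3),
    attribute_filter(category="shoes"),
)
hits = [d["document_id"] for d in pipeline(docs)]
print(hits)
```

Stage order matters in this model: searching first and filtering second can return fewer than `top_k` results, whereas filtering first narrows the candidate set before ranking.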
