    What Is a Semantic Join

    Semantic Join - A cross-collection enrichment operation that attaches context from one collection to results from another, using semantic similarity as the join key.

    A semantic join is the multimodal equivalent of a SQL JOIN. In structured databases, JOINs combine rows from different tables using foreign keys. In a multimodal data warehouse, enrich stages combine results from different collections using embedding similarity or document relationships. This enables cross-referencing without pre-defined foreign keys.

    How It Works

    After a retrieval pipeline produces results from one collection (e.g., a media library search), an enrich stage queries a second collection (e.g., brand safety scores) and attaches contextual data to each result. The join key can be a document ID, semantic similarity, or a metadata match.
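
    The similarity-based case can be illustrated with a minimal sketch in plain Python. This is not the Mixpeek API: the function names (cosine, semantic_join), the record fields (embedding, payload), and the 0.8 threshold are all assumptions chosen for illustration. Each result is matched to the most similar document in the second collection, and results with no match above the threshold are kept with an empty enrichment, much like a SQL LEFT JOIN.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_join(results, enrichment, threshold=0.8, field="enrichment"):
    """Attach the best-matching enrichment payload to each result.

    Hypothetical helper: `threshold` and `field` are illustrative
    parameters, not part of any real library.
    """
    joined = []
    for result in results:
        best, best_score = None, threshold
        for doc in enrichment:
            score = cosine(result["embedding"], doc["embedding"])
            if score >= best_score:
                best, best_score = doc, score
        out = dict(result)  # copy so the input results stay unchanged
        out[field] = best["payload"] if best else None
        joined.append(out)
    return joined

# Toy data: results from a media search and a separate brand safety collection.
results = [
    {"id": "clip-1", "embedding": [1.0, 0.0, 0.0]},
    {"id": "clip-2", "embedding": [0.0, 1.0, 0.0]},
]
brand_safety = [
    {"embedding": [0.9, 0.1, 0.0], "payload": {"brand_safety": 0.95}},
    {"embedding": [0.0, 0.0, 1.0], "payload": {"brand_safety": 0.20}},
]

enriched = semantic_join(results, brand_safety)
```

    In this toy data, clip-1 sits close to the first brand safety record and picks up its payload, while clip-2 has no neighbor above the threshold and is returned with no enrichment. A production system would replace the inner loop with a vector index lookup rather than a linear scan.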

    Examples

    • Search media library for celebrity appearances → enrich with brand safety scores from a separate collection
    • Find similar products → enrich with pricing and availability from a catalog collection
    • Detect copyrighted audio → enrich with licensing terms from a rights database
    • Find relevant document passages → enrich with author and classification metadata

    Best Practices

    • Use enrich stages after reduce stages to minimize the number of cross-collection lookups
    • Keep enrichment collections focused: one collection per enrichment type (brand scores, rights, metadata)
    • Use semantic joins for fuzzy matching and document_enrich for exact ID-based joins
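
    The first and third practices can be sketched together. The example below is an assumption-laden illustration, not Mixpeek code: reduce_top_k and enrich_by_id are hypothetical helpers, and the lookup counter exists only to make the cost visible. It performs an exact ID-based join and compares the number of cross-collection lookups when enrichment runs before versus after the reduce step.

```python
def reduce_top_k(results, k):
    """Keep the k highest-scoring results (a stand-in for a reduce stage)."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:k]

def enrich_by_id(results, lookup, counter):
    """Exact join on document ID; counts one lookup per result."""
    enriched = []
    for result in results:
        counter["lookups"] += 1
        enriched.append({**result, "rights": lookup.get(result["id"])})
    return enriched

# Toy data: 100 scored results and a rights collection keyed by document ID.
results = [{"id": f"doc-{i}", "score": i / 100} for i in range(100)]
rights = {f"doc-{i}": {"license": "standard"} for i in range(100)}

# Enrich first, then reduce: every result triggers a lookup.
early = {"lookups": 0}
reduce_top_k(enrich_by_id(results, rights, early), k=5)

# Reduce first, then enrich: only the surviving results trigger a lookup.
late = {"lookups": 0}
top = enrich_by_id(reduce_top_k(results, k=5), rights, late)
```

    Both orderings end with the same five enriched results, but the second touches the rights collection 5 times instead of 100. The gap matters more when each lookup is a similarity search rather than a dictionary access.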

    Related Pages

    • Document Enrich stage: /docs/retrieval/stages/document-enrich
    • Retrieval Cookbook: /docs/retrieval/cookbook
    • Blog (Multi-Stage Retrieval Pipelines): /blog/multi-stage-retrieval-pipelines