Semantic Join - A cross-collection enrichment operation that attaches context from one collection to results from another, using semantic similarity as the join key.
A semantic join is the multimodal equivalent of a SQL JOIN. In structured databases, JOINs combine rows from different tables using foreign keys. In a multimodal data warehouse, enrich stages combine results from different collections using embedding similarity or document relationships. This enables cross-referencing without pre-defined foreign keys.
How It Works
After a retrieval pipeline produces results from one collection (e.g., a media library search), an enrich stage queries a second collection (e.g., brand safety scores) and attaches contextual data to each result. The join key can be a document ID, semantic similarity between embeddings, or matching metadata fields.
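The similarity-based case can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the warehouse's actual enrich stage: the function names (`cosine`, `semantic_join`), the record layout (`embedding`, `payload`), and the `threshold` parameter are all assumptions made for the example. It behaves like a left join: every result is kept, and results with no sufficiently similar match receive `None`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_join(results, enrichment, threshold=0.8, field="enrichment"):
    """Attach the most similar enrichment payload to each result.

    Hypothetical helper: `results` and `enrichment` are lists of dicts
    that each carry an "embedding" vector; enrichment records also carry
    a "payload" dict. Results with no match at or above `threshold`
    are kept with the field set to None.
    """
    joined = []
    for r in results:
        best, best_score = None, threshold
        for e in enrichment:
            score = cosine(r["embedding"], e["embedding"])
            if score >= best_score:
                best, best_score = e, score
        out = dict(r)  # copy so the original result is not mutated
        out[field] = best["payload"] if best else None
        joined.append(out)
    return joined

# Toy data: two search results and one brand safety record.
results = [
    {"id": "clip-1", "embedding": [1.0, 0.0]},
    {"id": "clip-2", "embedding": [0.0, 1.0]},
]
brand_safety = [
    {"embedding": [0.9, 0.1], "payload": {"brand_safety": 0.92}},
]

enriched = semantic_join(results, brand_safety)
```

Here `clip-1` picks up the brand safety payload because its embedding is close to the enrichment record, while `clip-2` falls below the threshold and is returned unenriched. A real system would likely replace the inner loop with a vector index query rather than a linear scan.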
Examples
Search media library for celebrity appearances → enrich with brand safety scores from a separate collection
Find similar products → enrich with pricing and availability from a catalog collection
Detect copyrighted audio → enrich with licensing terms from a rights database
Find relevant document passages → enrich with author and classification metadata
Best Practices
Use enrich stages after reduce stages to minimize the number of cross-collection lookups
Keep enrichment collections focused: one collection per enrichment type (brand scores, rights, metadata)
Use semantic joins for fuzzy matching and document_enrich for exact ID-based joins
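The first and last practices can be illustrated with a small sketch of an exact ID-based join that counts its lookups. The helpers `reduce_top_k` and `enrich_by_id` are hypothetical names invented for this example; `enrich_by_id` only mimics the idea of an exact ID join and is not the `document_enrich` stage itself, whose interface is not described here.

```python
def reduce_top_k(results, k):
    """Keep only the k highest-scoring results (a stand-in for a reduce stage)."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:k]

def enrich_by_id(results, lookup, key="id", field="rights"):
    """Exact join on a shared ID; returns enriched results and the lookup count."""
    lookups = 0
    out = []
    for r in results:
        lookups += 1  # one cross-collection lookup per result
        out.append({**r, field: lookup.get(r[key])})
    return out, lookups

# Toy data: 100 scored results and a rights collection keyed by document ID.
results = [{"id": f"doc-{i}", "score": i / 100} for i in range(100)]
rights = {f"doc-{i}": {"license": "standard"} for i in range(100)}

# Enriching everything costs one lookup per result.
_, naive_lookups = enrich_by_id(results, rights)

# Reducing first means only the surviving results are enriched.
top, reduced_lookups = enrich_by_id(reduce_top_k(results, 10), rights)
```

With these toy inputs, enriching before reducing performs 100 lookups, while reducing to the top 10 first performs 10. The saving matters more when each lookup is a similarity query against a second collection rather than a dictionary access.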