Cross-Modal Join
Join collections across shared embedding spaces or overlapping time ranges. This enables investigations, analytics, and multi-source RAG.
"Find security incidents in December 2024 by joining video footage with incident logs"
Why This Matters
Joins are infrastructure operations, not ML models. Once collections share an embedding space, you can query across them.
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Create collections with shared embedding space
video_collection = client.collections.create(
    collection_name="video_library",
    feature_extractor={
        "feature_extractor_name": "multimodal_extractor",
        "version": "v1"
    }
)

transcript_collection = client.collections.create(
    collection_name="transcripts",
    feature_extractor={
        "feature_extractor_name": "text_extractor",
        "version": "v1"
    }
)

# Cross-modal search across both
results = client.retrievers.execute(
    retriever_id="cross-modal-retriever",
    inputs={
        "query_text": "security incident",
        "collections": ["video_library", "transcripts"],
        "time_range": {
            "start": "2024-12-01T00:00:00Z",
            "end": "2024-12-31T23:59:59Z"
        }
    }
)
Retrieval Flow
1. Search across multiple collections
2. Filter candidates by time-based overlap
3. Merge results from multiple collections (sketched below)
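To make the merge step concrete, here is a minimal sketch in plain Python. It assumes each hit carries a score plus start/end timestamps; the helpers overlaps and merge_by_time are hypothetical, not part of the Mixpeek SDK.

from datetime import datetime

def overlaps(a_start, a_end, b_start, b_end):
    # Two time ranges overlap when each starts before the other ends.
    return a_start < b_end and b_start < a_end

def merge_by_time(video_hits, transcript_hits):
    # Pair hits whose time ranges overlap and combine their scores,
    # so joined results rank above single-source matches.
    joined = []
    for v in video_hits:
        for t in transcript_hits:
            if overlaps(v["start"], v["end"], t["start"], t["end"]):
                joined.append({"video": v, "transcript": t,
                               "score": v["score"] + t["score"]})
    return sorted(joined, key=lambda hit: hit["score"], reverse=True)

# Example: the transcript falls inside the clip's window, so they join.
ts = datetime.fromisoformat
video_hits = [{"id": "clip_42", "score": 0.81,
               "start": ts("2024-12-03T14:00:00"), "end": ts("2024-12-03T14:05:00")}]
transcript_hits = [{"id": "log_7", "score": 0.77,
                    "start": ts("2024-12-03T14:02:00"), "end": ts("2024-12-03T14:03:00")}]
print(merge_by_time(video_hits, transcript_hits))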
Feature Extractors
Text Embedding: Extract semantic embeddings from documents, transcripts, and text content.
Image Embedding: Generate visual embeddings for similarity search and clustering.
Video Embedding: Generate vector embeddings for video content (see the shared-space sketch below).
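Joins work because these extractors can write into a shared embedding space, where a single similarity metric scores every modality. A minimal illustration with made-up 4-dimensional vectors; real extractors emit far higher-dimensional embeddings.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity, independent of vector magnitude.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for illustration only.
query_text = np.array([0.1, 0.9, 0.2, 0.4])    # "security incident" (text)
video_frame = np.array([0.2, 0.8, 0.1, 0.5])   # frame from video_library
unrelated = np.array([0.9, 0.1, 0.7, 0.0])     # off-topic document

# One metric ranks both modalities because the vectors share a space.
print(cosine_similarity(query_text, video_frame))  # ~0.98: cross-modal match
print(cosine_similarity(query_text, unrelated))    # ~0.28: filtered out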
Retriever Stages
feature search: Search collections using multimodal embeddings.
attribute filter: Filter documents by metadata attributes.
compose: Compose multiple retriever pipelines together (sketched after this list).
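One way to picture how these stages fit together: each stage maps a document list to a document list, and compose chains them into a pipeline. The sketch below is illustrative only; the stage behaviors are simplified stand-ins, not the Mixpeek implementations.

from typing import Callable

Doc = dict
Stage = Callable[[list[Doc]], list[Doc]]

def feature_search(min_score: float) -> Stage:
    # Stand-in for embedding search: keep docs above a similarity threshold.
    def stage(docs: list[Doc]) -> list[Doc]:
        return [d for d in docs if d["score"] >= min_score]
    return stage

def attribute_filter(**attrs) -> Stage:
    # Keep docs whose metadata matches every requested attribute.
    def stage(docs: list[Doc]) -> list[Doc]:
        return [d for d in docs if all(d.get(k) == v for k, v in attrs.items())]
    return stage

def compose(*stages: Stage) -> Stage:
    # Run stages left to right, feeding each stage's output to the next.
    def stage(docs: list[Doc]) -> list[Doc]:
        for s in stages:
            docs = s(docs)
        return docs
    return stage

pipeline = compose(feature_search(min_score=0.5),
                   attribute_filter(source="video_library"))

docs = [
    {"id": "clip_42", "score": 0.81, "source": "video_library"},
    {"id": "log_7", "score": 0.77, "source": "transcripts"},
    {"id": "clip_9", "score": 0.30, "source": "video_library"},
]
print(pipeline(docs))  # only clip_42 survives both stages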
