Multimodal RAG
Retrieval-augmented generation across video, images, and text. Retrieve relevant multimodal context, then pass it to your LLM with citations back to source timestamps and frames.
"How did the product launch go? Cite specific video clips and document timestamps"
Why This Matters
RAG quality depends on retrieval quality. Mixpeek handles the multimodal retrieval infrastructure while you bring your preferred generation model.
```python
import requests
from openai import OpenAI

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}
openai = OpenAI(api_key="your-openai-key")

# Retrieve multimodal context with citations
results = requests.post(
    f"{API_URL}/v1/retrievers/rag-retriever/execute",
    headers=headers,
    json={"query": {"text": "How did the product launch go?"}},
).json()

# Format context with source citations
context_str = "\n".join([
    f"[{i+1}] {doc['text']} (Source: {doc['root_object_id']} @ {doc['start_time']}s)"
    for i, doc in enumerate(results["documents"])
])

# Generate with your preferred LLM
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context_str}"},
        {"role": "user", "content": "Summarize the product launch feedback with citations"},
    ],
)
print(response.choices[0].message.content)
```
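Because each retrieved document carries its source object ID and start time, you can surface the model's citations as deep links back to the original footage. The helper below is a minimal sketch: the `?t=<seconds>` query parameter and the mapping from `root_object_id` to a playback URL are assumptions about your own player, not part of the Mixpeek response.

```python
# Sketch: turn retrieved documents into citation links.
# The base URL and ?t= time parameter are hypothetical; map root_object_id
# to your own asset URLs however your video player expects.
def citation_links(documents, base_url="https://videos.example.com"):
    links = []
    for i, doc in enumerate(documents):
        url = f"{base_url}/{doc['root_object_id']}?t={int(doc['start_time'])}"
        links.append(f"[{i+1}] {url}")
    return links

for line in citation_links(results["documents"]):
    print(line)
```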
Feature Extractors
Text Embedding
Extract semantic embeddings from documents, transcripts, and text content
Image Embedding
Generate visual embeddings for similarity search and clustering
Video Embedding
Generate vector embeddings for video content
Audio Transcription
Transcribe audio content to text
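These extractors run at ingestion time so that transcripts, frames, and full clips are all searchable against the same query. The sketch below shows how enabling them on a collection might look; the endpoint path, payload shape, and extractor identifiers are assumptions for illustration, not the documented Mixpeek schema, so check the API reference for the actual names.

```python
import requests

API_URL = "https://api.mixpeek.com"
headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}

# Hypothetical sketch: endpoint and field names below are illustrative only.
collection = requests.post(
    f"{API_URL}/v1/collections",  # assumed endpoint
    headers=headers,
    json={
        "name": "product-launch-footage",
        "feature_extractors": [   # assumed identifiers for the extractors above
            "text_embedding",
            "image_embedding",
            "video_embedding",
            "audio_transcription",
        ],
    },
).json()
print(collection)
```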
Retriever Stages
Feature Search
Search and filter documents by vector similarity using feature embeddings
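Conceptually, this stage applies any filters and then ranks candidate documents by vector similarity between the query embedding and each document's stored feature embedding. The local sketch below illustrates that ranking with toy vectors; it is not the Mixpeek implementation, and the document fields are made up.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def feature_search(query_vec, documents, top_k=3, filter_fn=None):
    """Filter documents, then rank them by embedding similarity to the query."""
    candidates = [d for d in documents if filter_fn is None or filter_fn(d)]
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return ranked[:top_k]

# Toy data: three documents with 3-d embeddings and a modality tag (illustrative).
docs = [
    {"id": "clip-1", "modality": "video", "embedding": [0.9, 0.1, 0.0]},
    {"id": "doc-7",  "modality": "text",  "embedding": [0.2, 0.8, 0.1]},
    {"id": "clip-4", "modality": "video", "embedding": [0.7, 0.3, 0.1]},
]
print(feature_search([1.0, 0.0, 0.0], docs, top_k=2,
                     filter_fn=lambda d: d["modality"] == "video"))
```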
