
    Multimodal RAG

    Retrieval-augmented generation across video, images, and text. Retrieve relevant multimodal context, then pass it to your LLM, with citations that point back to source timestamps and frames.

    video
    image
    text
    audio
    Multi-Stage
    67.0K runs

    Example query: "How did the product launch go?", answered from specific video clips and documents

    Why This Matters

    RAG quality depends on retrieval quality. Mixpeek handles the multimodal retrieval infrastructure while you bring your preferred generation model.

    import requests
    from openai import OpenAI

    API_URL = "https://api.mixpeek.com"
    headers = {"Authorization": "Bearer YOUR_API_KEY", "X-Namespace": "your-namespace"}
    client = OpenAI(api_key="your-openai-key")

    # Retrieve multimodal context with citations
    resp = requests.post(
        f"{API_URL}/v1/retrievers/rag-retriever/execute",
        headers=headers,
        json={"query": {"text": "How did the product launch go?"}},
        timeout=30,
    )
    resp.raise_for_status()  # surface auth/namespace errors instead of parsing an error body
    results = resp.json()

    # Format context with numbered source citations; documents that may lack
    # a timestamp (e.g. still images) are cited by object ID only
    context_lines = []
    for i, doc in enumerate(results["documents"], start=1):
        ts = doc.get("start_time")
        location = f" @ {ts}s" if ts is not None else ""
        context_lines.append(f"[{i}] {doc.get('text', '')} (Source: {doc['root_object_id']}{location})")
    context_str = "\n".join(context_lines)

    # Generate with your preferred LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer only from this context and cite sources as [n]:\n{context_str}"},
            {"role": "user", "content": "Summarize the product launch feedback with citations"},
        ],
    )
    print(response.choices[0].message.content)
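
    Because the context lines are numbered, the citations in the model's answer can be mapped back to the retrieved sources. The sketch below assumes the answer cites sources as `[n]` and that each document carries the `root_object_id` and optional `start_time` fields used in the example; the helper `resolve_citations` and the sample documents are hypothetical, for illustration only.

```python
import re
from typing import Any, Dict, List, Optional, Tuple


def resolve_citations(
    answer: str, documents: List[Dict[str, Any]]
) -> List[Tuple[int, str, Optional[float]]]:
    """Map [n] markers in an LLM answer to (n, root_object_id, start_time).

    Markers are 1-based, matching the numbering used to build the context.
    Repeated markers are reported once; out-of-range markers (citations the
    model invented) are skipped.
    """
    seen = set()
    resolved = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if n in seen or not 1 <= n <= len(documents):
            continue
        seen.add(n)
        doc = documents[n - 1]
        resolved.append((n, doc["root_object_id"], doc.get("start_time")))
    return resolved


# Hypothetical retrieved documents: one video clip, one text document
docs = [
    {"root_object_id": "obj_keynote", "start_time": 125.0, "text": "The demo went well"},
    {"root_object_id": "obj_survey", "text": "Customers liked the pricing"},
]
answer = "Feedback was positive [1], especially on pricing [2]. See also [1] and [7]."
for n, obj, ts in resolve_citations(answer, docs):
    print(n, obj, ts)
```

    Here `[7]` is dropped because only two documents were retrieved, which is a cheap guard against fabricated citations.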

    Feature Extractors

    Text Embedding

    Extract semantic embeddings from documents, transcripts and text content

    827K runs

    Image Embedding

    Generate visual embeddings for similarity search and clustering

    752K runs

    Video Embedding

    Generate vector embeddings for video content

    610K runs

    Audio Transcription

    Transcribe audio content to text

    450K runs
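
    Transcripts are most useful as RAG context when each passage keeps its timing, which is what lets an answer cite a `start_time`. The sketch below groups timestamped transcript segments into roughly fixed-length passages. The segment shape (`start`, `end`, `text`, in seconds) and the helper `window_segments` are assumptions for illustration, not Mixpeek's output schema or chunking method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical transcript segment; times are in seconds
    start: float
    end: float
    text: str


@dataclass
class Passage:
    start_time: float
    end_time: float
    text: str


def _merge(group: List[Segment]) -> Passage:
    # One passage spans from the first segment's start to the last one's end
    return Passage(group[0].start, group[-1].end, " ".join(s.text for s in group))


def window_segments(segments: List[Segment], max_seconds: float = 30.0) -> List[Passage]:
    """Group consecutive segments into passages of roughly max_seconds.

    Segments are never split: a new passage starts when adding the next
    segment would push the current passage past max_seconds.
    """
    passages: List[Passage] = []
    current: List[Segment] = []
    for seg in segments:
        if current and seg.end - current[0].start > max_seconds:
            passages.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        passages.append(_merge(current))
    return passages


segments = [
    Segment(0.0, 12.0, "Welcome to the launch."),
    Segment(12.0, 25.0, "Here is the new product."),
    Segment(25.0, 41.0, "Pricing starts today."),
]
for p in window_segments(segments, max_seconds=30.0):
    print(p.start_time, p.end_time, p.text)
```

    Keeping whole segments intact avoids cutting a sentence in half at the cost of passages that can slightly exceed the target length when a single segment is long.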

    Retriever Stages

    Feature Search

    Search and filter documents by vector similarity using feature embeddings

    Filter
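
    Conceptually, a feature search ranks documents by how close their stored embedding is to the query embedding, optionally restricted by a metadata filter. The sketch below shows that idea with cosine similarity over in-memory vectors. The document fields (`embedding`, `metadata`) and the `feature_search` function are hypothetical, and a production index would typically use approximate nearest-neighbor search rather than this brute-force scan.

```python
import math
from typing import Any, Callable, Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def feature_search(
    query_vec: List[float],
    docs: List[Dict[str, Any]],
    top_k: int = 3,
    where: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> List[Dict[str, Any]]:
    """Brute-force top-k by cosine similarity, applying the metadata filter first."""
    candidates = [d for d in docs if where is None or where(d["metadata"])]
    scored = [(cosine(query_vec, d["embedding"]), d) for d in candidates]
    # Sort on the score only so ties never try to compare dicts
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{**d, "score": s} for s, d in scored[:top_k]]


# Hypothetical 2-D embeddings for illustration
corpus = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"modality": "video"}},
    {"id": "b", "embedding": [0.7, 0.7], "metadata": {"modality": "text"}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"modality": "video"}},
]
hits = feature_search([1.0, 0.1], corpus, top_k=2, where=lambda m: m["modality"] == "video")
print([h["id"] for h in hits])  # ['a', 'c']
```

    Filtering before scoring means every returned hit satisfies the filter; filtering after a top-k cut could instead return fewer than `top_k` results.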