
    Video RAG Pipeline

    Retrieval-augmented generation (RAG) designed for video content. The pipeline decomposes videos into scenes and transcripts, retrieves the segments most relevant to a question, and passes them to an LLM as context so the answer can include precise timestamp citations.

    Tags: video, text, audio, Multi-Stage
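Mixpeek performs the scene and transcript decomposition server-side through its extractors, and this page does not say how segments are cut. Purely to illustrate the idea, the sketch below groups a timestamped transcript into segments that each span at most a fixed number of seconds. It assumes the transcript arrives as (start, end, text) tuples in seconds; `Segment`, `segment_transcript`, and `max_duration` are hypothetical names, and scene detection is not covered.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A contiguous span of transcript with its time range in seconds."""
    start_time: float
    end_time: float
    text: str


def segment_transcript(utterances, max_duration=30.0):
    """Group (start, end, text) utterances into time-bounded segments.

    Utterances are never split. A segment is closed when adding the next
    utterance would make it span more than max_duration seconds; a single
    utterance longer than that becomes a segment of its own.
    """
    segments = []
    current = []

    def flush():
        # Turn the buffered utterances into one Segment.
        segments.append(
            Segment(
                start_time=current[0][0],
                end_time=current[-1][1],
                text=" ".join(t for _, _, t in current),
            )
        )

    for start, end, text in utterances:
        # Close the open segment if this utterance would stretch it too far.
        if current and end - current[0][0] > max_duration:
            flush()
            current.clear()
        current.append((start, end, text))
    if current:
        flush()
    return segments


utterances = [
    (0.0, 4.2, "Open the admin console."),
    (4.2, 9.8, "Go to Security."),
    (9.8, 31.0, "Select Firewall and add a rule."),
    (31.0, 36.5, "Save your changes."),
]
for seg in segment_transcript(utterances, max_duration=30.0):
    print(f"{seg.start_time:.1f}s-{seg.end_time:.1f}s: {seg.text}")
```

Keeping `start_time` and `end_time` on every segment is what later lets an answer point at exact moments in the source video.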
```python
from mixpeek import Mixpeek
from openai import OpenAI

client = Mixpeek(api_key="YOUR_API_KEY")
llm = OpenAI(api_key="YOUR_OPENAI_KEY")  # OpenAI client used for answer generation

question = "How do I configure the firewall settings?"

# Create a video collection with scene + transcript extraction
collection = client.collections.create(
    namespace_id="ns_your_namespace",
    name="training_videos",
    extractors=["multimodal-extractor", "text-extractor"],
)

# Retrieve the video segments most relevant to the question
# (assumes a retriever with ID "ret_video_rag" is already configured over this collection)
results = client.retrievers.execute(
    retriever_id="ret_video_rag",
    query={"text": question},
)

# Build a numbered context block; each entry cites its source video and time range
context = "\n".join(
    f"[{i + 1}] {doc['text']} "
    f"(Video: {doc['root_object_id']}, {doc['start_time']:.1f}s-{doc['end_time']:.1f}s)"
    for i, doc in enumerate(results["results"])
)

# Generate an answer that cites the retrieved segments by number
response = llm.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer using only the video context below. "
                "Cite each segment you rely on by its [n] marker.\n\n"
                f"Context:\n{context}"
            ),
        },
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```
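The example numbers each segment in the context, so once the model is asked to cite segments as [n], the markers in its answer can be mapped back to a video and time range after generation. A minimal sketch of that post-processing step, assuming each result carries `root_object_id`, `start_time`, and `end_time` (in seconds) as in the example; `format_timestamp` and `resolve_citations` are hypothetical helpers, not part of the Mixpeek SDK.

```python
import re


def format_timestamp(seconds):
    """Render seconds as M:SS, or H:MM:SS for positions past the first hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def resolve_citations(answer, docs):
    """Map each [n] marker in an LLM answer back to its source segment.

    `docs` must be the same list used to build the numbered context, so
    marker [n] refers to docs[n - 1]. Out-of-range markers are ignored and
    each segment is reported once, in order of first citation.
    """
    cited = []
    seen = set()
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if 1 <= n <= len(docs) and n not in seen:
            seen.add(n)
            doc = docs[n - 1]
            cited.append(
                {
                    "marker": n,
                    "video": doc["root_object_id"],
                    "start": format_timestamp(doc["start_time"]),
                    "end": format_timestamp(doc["end_time"]),
                }
            )
    return cited


docs = [
    {"text": "Open Settings.", "root_object_id": "obj_a", "start_time": 12.4, "end_time": 47.9},
    {"text": "Add a rule.", "root_object_id": "obj_b", "start_time": 3725.0, "end_time": 3760.2},
]
answer = "Open the Security tab [2], then add a rule [1][2]. See also [7]."
for cite in resolve_citations(answer, docs):
    print(cite)
```

Dropping unknown markers such as [7] guards against the model citing a segment that was never in its context.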

    Feature Extractors

    Retriever Stages

    rerank

    Rerank documents using cross-encoder models for more accurate relevance scoring.

    sort

    summarize

    Condense multiple documents into a summary using an LLM.

    reduce
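The two described stages, rerank and summarize, can be sketched as plain functions to show what each does to a list of retrieved documents. This is not Mixpeek's implementation: `rerank`, `summarize_documents`, `score_pair`, and `complete` are hypothetical names. In `rerank`, `score_pair` stands in for a cross-encoder that scores each (query, document) pair jointly, for example a model called through sentence-transformers' `CrossEncoder.predict`; `overlap_score` is a toy lexical scorer included only so the sketch runs offline, and it is not a cross-encoder. `summarize_documents` uses the simplest strategy, packing numbered documents into one prompt under a rough character budget, with `complete` standing for any function that sends a prompt to an LLM and returns its text.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    docs: Sequence[dict],
    score_pair: Callable[[str, str], float],
    top_k: int = 5,
) -> list[dict]:
    """Score each (query, document text) pair and keep the top_k documents."""
    scored = [(score_pair(query, doc["text"]), i, doc) for i, doc in enumerate(docs)]
    # Highest score first; the original position breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored[:top_k]]


def summarize_documents(
    docs: Sequence[dict],
    complete: Callable[[str], str],
    max_chars: int = 12000,
) -> str:
    """Pack numbered documents into one prompt and ask an LLM for a summary."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        entry = f"[{i}] {doc['text']}"
        # Rough budget on document text only; stop before exceeding it.
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    prompt = (
        "Summarize the following documents in a few sentences, "
        "keeping the [n] marker of each document you draw on.\n\n"
        + "\n".join(parts)
    )
    return complete(prompt)


def overlap_score(query: str, text: str) -> float:
    """Toy lexical scorer for offline testing; NOT a cross-encoder."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(text.lower().split())) / len(query_terms)


docs = [
    {"text": "intro to the dashboard"},
    {"text": "configure the firewall rules"},
    {"text": "firewall settings overview"},
]
top = rerank("configure firewall settings", docs, overlap_score, top_k=2)
# An echoing "LLM" makes the assembled prompt visible.
print(summarize_documents(top, complete=lambda prompt: prompt))
```

Chained this way, rerank narrows the candidates and the summarize step condenses what remains; in a real pipeline, `complete` could wrap an OpenAI chat completion call.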
