Video RAG Pipeline
Retrieval-augmented generation (RAG) designed for video content. The pipeline decomposes videos into scenes and transcripts, retrieves the segments relevant to a given question, and passes them as context to an LLM so the answer can cite precise timestamps.
from mixpeek import Mixpeek
from openai import OpenAI

client = Mixpeek(api_key="YOUR_API_KEY")
openai = OpenAI(api_key="YOUR_OPENAI_KEY")

# Create video collection with scene + transcript extraction
collection = client.collections.create(
    namespace_id="ns_your_namespace",
    name="training_videos",
    extractors=["multimodal-extractor", "text-extractor"]
)

# Retrieve relevant video segments
results = client.retrievers.execute(
    retriever_id="ret_video_rag",
    query={"text": "How do I configure the firewall settings?"}
)

# Build context with video citations
context = "\n".join([
    f"[{i+1}] {doc['text']} (Video: {doc['root_object_id']}, {doc['start_time']:.1f}s-{doc['end_time']:.1f}s)"
    for i, doc in enumerate(results["results"])
])

# Generate answer with citations
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using this video context:\n{context}"},
        {"role": "user", "content": "How do I configure the firewall settings?"}
    ]
)

print(response.choices[0].message.content)
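The context-building step in the example above assumes every result carries text, root_object_id, start_time, and end_time. The following is a minimal sketch of a more defensive version of that step, assuming results arrive as plain dictionaries with those keys. The helper names format_timestamp and build_context are hypothetical and not part of the Mixpeek SDK; the MM:SS rendering is an illustrative choice, not the format the service returns.

```python
from typing import Any, Mapping, Sequence


def format_timestamp(seconds: float) -> str:
    """Render a non-negative offset in seconds as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)  # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def build_context(docs: Sequence[Mapping[str, Any]]) -> str:
    """Build a numbered, citation-ready context string from retrieved segments.

    Assumed (hypothetical) result shape: dicts with 'text', 'root_object_id',
    'start_time', and 'end_time'. Segments without text are skipped; segments
    without timestamps are kept but marked as such.
    """
    lines = []
    for doc in docs:
        text = doc.get("text")
        if not text:
            continue  # nothing to cite
        start, end = doc.get("start_time"), doc.get("end_time")
        if start is None or end is None:
            span = "time unknown"
        else:
            span = f"{format_timestamp(start)}-{format_timestamp(end)}"
        video = doc.get("root_object_id", "unknown")
        # Number by position in the output so citations stay contiguous.
        lines.append(f"[{len(lines) + 1}] {text} (Video: {video}, {span})")
    return "\n".join(lines)
```

Numbering by output position rather than input index keeps citation markers contiguous when empty segments are dropped.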
Feature Extractors
Retriever Stages
rerank
Rerank documents using cross-encoder models for accurate relevance
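As a rough illustration of what a rerank stage does, the sketch below reorders retrieved documents by a query-document relevance score. It is not the stage's actual implementation: the toy token-overlap scorer stands in for a cross-encoder model, and the names overlap_score and rerank, as well as the 'text' key, are assumptions made for this example. In practice score_fn would wrap a real cross-encoder that scores each (query, text) pair jointly.

```python
from typing import Any, Callable, List, Mapping, Optional, Sequence


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the text.

    Stand-in for a cross-encoder, which would score the pair with a model.
    """
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(
    query: str,
    docs: Sequence[Mapping[str, Any]],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: Optional[int] = None,
) -> List[Mapping[str, Any]]:
    """Return docs sorted by descending score; ties keep their original order."""
    ranked = sorted(
        docs,
        key=lambda d: score_fn(query, d.get("text", "")),
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]
```

Passing the scorer as a parameter keeps the ordering logic independent of whichever model produces the scores.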
summarize
Condense multiple documents into a summary using an LLM
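A summarize stage can be pictured as prompt assembly followed by one LLM call. The sketch below assumes documents are dictionaries with a 'text' key and treats the LLM as any callable from prompt string to response string. The function names, the prompt wording, and the max_chars budget are all hypothetical and chosen for illustration; they do not describe how the actual stage is configured.

```python
from typing import Any, Callable, Mapping, Sequence


def build_summary_prompt(docs: Sequence[Mapping[str, Any]], max_chars: int = 4000) -> str:
    """Join document texts into one summarization prompt within a character budget."""
    parts = []
    used = 0
    for doc in docs:
        text = (doc.get("text") or "").strip()
        if not text:
            continue  # skip empty segments
        if used + len(text) > max_chars:
            break  # budget exhausted; later documents are dropped
        parts.append(f"- {text}")
        used += len(text)
    return "Summarize the following passages in a few sentences:\n" + "\n".join(parts)


def summarize(
    docs: Sequence[Mapping[str, Any]],
    llm: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Condense several documents into one summary with a single LLM call."""
    return llm(build_summary_prompt(docs, max_chars))
```

With the OpenAI client from the main example, llm could be a small wrapper around openai.chat.completions.create that returns response.choices[0].message.content.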
