
    Multimodal RAG Pipeline

    Build a retrieval-augmented generation system that works with text, images, and video. Feed relevant multimodal context to LLMs for grounded responses.

    from mixpeek import Mixpeek
    import openai

    client = Mixpeek(api_key="YOUR_API_KEY")

    # 1. Build the knowledge base
    namespace = client.namespaces.create(name="rag-kb")
    collection = client.collections.create(
        namespace_id=namespace.id,
        name="docs-and-media",
        extractors=["text-embedding-v2", "image-embedding-v2"],
        chunk_strategy="semantic"
    )

    # 2. Ingest your content
    client.buckets.upload(
        collection_id=collection.id,
        url="s3://your-bucket/knowledge-base/"
    )

    # 3. Create a retriever over the collection
    #    (illustrative call; exact parameters may differ, see the Mixpeek docs)
    retriever = client.retrievers.create(
        namespace_id=namespace.id,
        name="rag-retriever",
        collection_ids=[collection.id]
    )

    # 4. Retrieve + generate
    def rag_query(question: str):
        # Retrieve the chunks most relevant to the question
        results = client.retrievers.execute(
            retriever_id=retriever.id,
            query=question,
            settings={"limit": 5}
        )

        # Build a context string (assumes each hit exposes its extracted text as .content)
        context = "\n".join(r.content for r in results)

        # Generate a grounded answer with the LLM
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": f"Answer based on this context:\n{context}"},
                {"role": "user", "content": question}
            ]
        )
        return response.choices[0].message.content
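
    With the pipeline wired up, querying the knowledge base is a single call. A minimal usage sketch (the question string is illustrative):

    # Ask a question grounded in the ingested knowledge base
    answer = rag_query("How does the onboarding flow handle video uploads?")
    print(answer)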
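
    The function above flattens every hit to text before generation. When retrieval returns image hits, a vision-capable model can consume them directly. Here is a hedged variation, assuming each result exposes a modality field and a public url (both hypothetical attribute names; check the actual Mixpeek response schema), using OpenAI's image-input message format with gpt-4o:

    def rag_query_multimodal(question: str):
        results = client.retrievers.execute(
            retriever_id=retriever.id,
            query=question,
            settings={"limit": 5}
        )

        # Split hits by modality (attribute names here are assumptions)
        text_context = "\n".join(r.content for r in results if r.modality == "text")
        image_urls = [r.url for r in results if r.modality == "image"]

        # gpt-4o accepts interleaved text and image_url content parts
        user_content = [{"type": "text", "text": question}]
        user_content += [
            {"type": "image_url", "image_url": {"url": u}} for u in image_urls
        ]

        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"Answer based on this context:\n{text_context}"},
                {"role": "user", "content": user_content}
            ]
        )
        return response.choices[0].message.content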
