    Ethan Steininger
    2 min read

    Build a Model Context Protocol (MCP) Server on S3 using Lambda, Temporal, Ray, and Qdrant

    Build a scalable MCP pipeline on S3 using AWS Lambda, Temporal, Ray, and Qdrant to process and index unstructured data like video, audio, and PDFs for real-time AI search and retrieval.


    Learn how to build an MCP (Model Context Protocol) pipeline directly on top of S3, using Lambda for change detection, Temporal for orchestration, Ray for scalable compute, and Qdrant for vector search.

    🧩 Architecture Overview

    To build a fully event-driven MCP pipeline that indexes unstructured content from S3 (videos, PDFs, images, logs), here’s what you’ll need:

    • Amazon S3 — your raw unstructured data source
    • AWS Lambda — triggers feature extraction on new data (CDC style)
    • Temporal — orchestration and retries across modalities
    • Ray — distributed execution for compute-heavy tasks (e.g., video segmentation, embedding)
    • Qdrant — vector DB to store indexed representations

    📥 Example: From Upload to Retrieval

    Once this pipeline is in place, the flow becomes simple:

    • Upload a file (e.g. customer-incident.mp4) to S3
    • The upload event triggers Lambda → Temporal → Ray, which extracts embeddings
    • Store outputs in Qdrant (vectors) and Postgres (metadata)

    Query Qdrant with a prompt like:

    “Show me all videos where someone slips and falls indoors”

    The system surfaces relevant clips — no manual tagging required.
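
To make the retrieval step concrete, here is a minimal sketch of what a vector search does: rank stored embeddings by cosine similarity to the embedded prompt. It is a plain-Python stand-in, not Qdrant's API; the clip names, the 3-dimensional vectors, and the `search` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index, query_vec, top_k=2):
    """Return the top_k (clip_id, score) pairs, best match first."""
    scored = [(clip_id, cosine(vec, query_vec)) for clip_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical toy embeddings; real models emit hundreds of dimensions.
index = {
    "lobby-cam-0412.mp4": [0.9, 0.1, 0.0],
    "warehouse-0398.mp4": [0.1, 0.9, 0.2],
    "lobby-cam-0413.mp4": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded text prompt
results = search(index, query_vec)
```

In the real pipeline the prompt would be embedded by the same model family that embedded the clips, and the vector database would do the ranking with an approximate nearest-neighbor index rather than a full scan.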


    🔗 System Flow

    1. Change Detection (CDC): S3 → EventBridge → Lambda
    2. Orchestration: Lambda kicks off a Temporal workflow
    3. Distributed Processing: Temporal starts Ray tasks
    4. Indexing: Outputs pushed to Qdrant (for vector search) and optionally Postgres (for metadata)

    graph TD
        A[S3 - New File] --> B[EventBridge Trigger]
        B --> C[AWS Lambda - CDC]
        C --> D[Temporal Workflow - Orchestration]
        D --> E[Ray Tasks - Feature Extraction]
        E --> F[Qdrant - Vector Index]
        E --> G[Postgres - Metadata Store]

    ⚙️ Infrastructure Breakdown

    Step 1: S3 CDC via Lambda

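    import asyncio
    from urllib.parse import unquote_plus
    def lambda_handler(event, context):
        # Object keys in S3 event notifications are URL-encoded
        s3_key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
        # Start the Temporal workflow. temporal_client is assumed to be a connected
        # temporalio Client; the id and task_queue values are placeholders.
        asyncio.run(temporal_client.start_workflow("ProcessMultimodalFile", s3_key, id=f"process-{s3_key}", task_queue="multimodal"))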
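
The System Flow routes S3 events through EventBridge, while the handler above reads the direct S3 notification shape (`Records`). The two payloads differ, so a handler that may receive either needs to check. The sketch below is one way to do that using only the standard library; `extract_object` and `workflow_id` are hypothetical helpers, and the assumption that EventBridge delivers keys without URL encoding is worth verifying for keys containing special characters.

```python
import hashlib
from urllib.parse import unquote_plus

def extract_object(event):
    """Return (bucket, key) from either S3 event shape."""
    if "Records" in event:  # direct S3 -> Lambda notification
        s3 = event["Records"][0]["s3"]
        # Keys in S3 notifications are URL-encoded (spaces arrive as '+').
        return s3["bucket"]["name"], unquote_plus(s3["object"]["key"])
    # S3 event delivered through EventBridge: fields live under "detail".
    # Assumption: the key arrives as-is here, so it is not decoded.
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]

def workflow_id(bucket, key):
    """Deterministic ID so a redelivered event maps to the same workflow."""
    digest = hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest()[:16]
    return f"process-{digest}"

direct = {"Records": [{"s3": {"bucket": {"name": "raw"},
                              "object": {"key": "videos/customer+incident.mp4"}}}]}
bridged = {"detail": {"bucket": {"name": "raw"},
                      "object": {"key": "videos/customer incident.mp4"}}}
```

Deriving the workflow ID from the bucket and key is a design choice: event delivery is at-least-once, and a stable ID gives the orchestrator a way to recognize a duplicate start for the same object.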
    

    Step 2: Temporal Workflow

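    from datetime import timedelta
    from temporalio import workflow

    @workflow.defn
    class ProcessMultimodalFile:
        @workflow.run
        async def run(self, key: str) -> None:
            # Workflow code must stay deterministic, so I/O such as the S3 download runs as an activity.
            # download_from_s3 and extract_and_embed are assumed to be registered activities; timeouts are placeholders.
            video_path = await workflow.execute_activity(download_from_s3, key, start_to_close_timeout=timedelta(minutes=10))
            await workflow.execute_activity(extract_and_embed, video_path, start_to_close_timeout=timedelta(hours=1))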
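
Temporal's role here includes retries, which follow an exponential backoff described by an initial interval, a backoff coefficient, a maximum interval, and a maximum attempt count. The sketch below computes such a schedule in plain Python to show what those parameters imply; `retry_delays` is a hypothetical helper, and the default values are illustrative assumptions rather than a statement of any library's defaults.

```python
def retry_delays(max_attempts, initial=1.0, coefficient=2.0, maximum=100.0):
    """Seconds to wait before each retry.

    The first attempt runs immediately, so max_attempts attempts
    produce max_attempts - 1 delays.
    """
    delays = []
    for retry in range(max_attempts - 1):
        # Grow geometrically, but never beyond the configured cap.
        delays.append(min(initial * coefficient ** retry, maximum))
    return delays

delays = retry_delays(5)
capped = retry_delays(10)
```

With these values, a long-running embedding activity that keeps failing waits 1, 2, 4, and 8 seconds between its five attempts, and the cap stops the wait from growing without bound on later retries.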
    

    Step 3: Ray Tasks for Feature Extraction

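    import uuid
    import ray
    from qdrant_client.models import PointStruct
    @ray.remote
    def extract_and_embed(path):
        vision = vision_embedder(path)
        audio = audio_embedder(path)
        # One point with named vectors; assumes a "multimodal" collection that defines "vision" and "audio"
        point = PointStruct(id=str(uuid.uuid4()), vector={"vision": vision, "audio": audio}, payload={"path": path})
        qdrant_client.upsert(collection_name="multimodal", points=[point])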
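
Video segmentation is one of the compute-heavy tasks that benefits from fanning out across workers. A common approach, sketched below under assumed parameters, is to split a file into overlapping time windows and process each window as its own task. `segment_windows`, the 10-second window, and the 2-second overlap are hypothetical choices for illustration.

```python
def segment_windows(duration_s, window_s=10.0, overlap_s=2.0):
    """Split [0, duration_s] into overlapping (start, end) windows in seconds."""
    if window_s <= overlap_s:
        raise ValueError("window_s must be larger than overlap_s")
    step = window_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        # Clamp the final window to the end of the file.
        windows.append((start, min(start + window_s, duration_s)))
        if start + window_s >= duration_s:
            break  # this window already reaches the end
        start += step
    return windows

windows = segment_windows(25.0)
```

Each `(start, end)` pair could then be submitted as a separate remote task, so a long video is embedded in parallel rather than as a single job. The overlap keeps an event that straddles a boundary fully visible in at least one window.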
    

    🧠 What’s Indexed?

    Modality  | Examples Extracted
    Vision    | Object detection, OCR, visual style
    Audio     | Speech-to-text, speaker ID
    Text      | Entities, sentiment, topic modeling
    PDF/Image | Layouts, diagrams, handwriting
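
Before any of these extractors run, the pipeline has to decide which modalities apply to a given object. A minimal sketch of that routing step, using the standard library's `mimetypes` module to guess the type from the key's extension, is shown below. The mapping itself (for example, video feeding both the vision and audio extractors) is an assumption for illustration, not something the table prescribes.

```python
import mimetypes

def modalities_for(key):
    """Guess which extractors apply to an object, based on its file extension."""
    mime, _ = mimetypes.guess_type(key)
    if mime is None:
        return []  # unknown type: skip, or send to a fallback queue
    if mime.startswith("video/"):
        return ["vision", "audio"]  # frames plus the audio track
    if mime.startswith("audio/"):
        return ["audio"]
    if mime.startswith("image/"):
        return ["vision"]
    if mime == "application/pdf":
        return ["pdf/image"]
    if mime.startswith("text/"):
        return ["text"]
    return []

routes = {key: modalities_for(key) for key in
          ["customer-incident.mp4", "call.mp3", "claim.pdf", "scan.png", "notes.txt"]}
```

Extension-based guessing is cheap but trusts the uploader's file name; a stricter pipeline might inspect the object's content type metadata or its leading bytes instead.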

    🏗️ Mixpeek’s Fully Managed MCP Stack

    See "Architecture - Mixpeek": understanding Mixpeek core architecture and data flow.
    [Figure: Mixpeek architecture diagram]

    🔥 Long-Tail Use Cases

    Industry   | Use Case Example
    Insurance  | Detect slip-and-fall claims via video ingestion from branch cameras
    Healthcare | Search for similar MRI results across a decade of unstructured imaging files
    Education  | Auto-index and summarize lecture videos for semantic search
    Security   | Tag suspicious behavior patterns across thousands of archived CCTV feeds
    Media      | Find moments of laughter or applause in podcast audio archives
    Logistics  | Scan forklift usage across warehouse footage to predict operator burnout

    🧩 Skip the Plumbing

    You could glue this all together — or you can use Mixpeek, which handles:

    • Multimodal ingestion pipelines
    • Zero-config embedding + indexing
    • Query-ready APIs for search, alerts, and retrieval
    • Event-based workflows with zero devops
    Built for developers who don’t want to reinvent multimodal infra.
    Ethan Steininger

    April 11, 2025 · 2 min read