Wire Mixpeek retrievers into OpenAI’s function calling API so GPT models can search video, image, and audio content on demand.
There is no dedicated integration package to install. You use the standard mixpeek SDK and wrap it as an OpenAI function, which gives you full control over input parsing, error handling, and response formatting.

The Pattern

OpenAI function calling lets GPT models decide when to invoke external tools during a conversation. You define a function schema, register it as a tool, and handle the call in your completion loop.
1. Define the function schema. Describe search_mixpeek with a name, description, and parameters so the model knows when and how to call it.

2. Register it as a tool. Pass the schema in the tools array when calling chat.completions.create().

3. Handle tool calls in the completion loop. When the model returns tool_calls, execute client.retrievers.execute() with the provided arguments and append the results as a tool message.

4. Get the final response. Call chat.completions.create() again with the tool results. The model incorporates the search results into its answer.

Installation

pip install mixpeek openai

Function Schema

Define a function that tells GPT what Mixpeek search does and what inputs it accepts:
{
  "type": "function",
  "function": {
    "name": "search_mixpeek",
    "description": "Search across video, image, and audio content indexed in Mixpeek. Use this when the user asks about visual content, media files, or multimedia information.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "description": "Natural language search query describing what to find"
        },
        "limit": {
          "type": "integer",
          "description": "Maximum number of results to return (default 5)"
        }
      },
      "required": ["query"]
    }
  }
}
Write a specific description that tells the model when to call this tool. Mention the content types your retriever handles (video, images, audio). Generic descriptions like “search for things” cause the model to over- or under-use the function.
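When the model decides to call the function, the arguments arrive as a JSON string that matches this schema. A minimal sketch of parsing them and applying the schema's default for limit (parse_search_args is an illustrative helper, not part of the Mixpeek or OpenAI SDKs):

```python
import json

def parse_search_args(raw_arguments: str) -> dict:
    """Parse the JSON string from tool_call.function.arguments and
    apply the schema default for the optional `limit` field."""
    args = json.loads(raw_arguments)
    return {"query": args["query"], "limit": args.get("limit", 5)}

# The model may omit optional parameters entirely:
print(parse_search_args('{"query": "keynote clips"}'))
# {'query': 'keynote clips', 'limit': 5}
print(parse_search_args('{"query": "keynote clips", "limit": 3}'))
# {'query': 'keynote clips', 'limit': 3}
```

Because only query is in the required array, always apply defaults on your side rather than assuming the model sends every field.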

Full Working Example: Chat Completions API

import json
from openai import OpenAI
from mixpeek import Mixpeek

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
mixpeek_client = Mixpeek(api_key="YOUR_MIXPEEK_KEY")

# Define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_mixpeek",
            "description": (
                "Search across video, image, and audio content. "
                "Use when the user asks about visual content, media files, "
                "or multimedia information."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Natural language search query",
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max results to return (default 5)",
                    },
                },
                "required": ["query"],
            },
        },
    }
]


def execute_search(query: str, limit: int = 5) -> str:
    """Call the Mixpeek retriever and return results as JSON."""
    results = mixpeek_client.retrievers.execute(
        retriever_id="ret_abc123",
        inputs={"query": query},
        namespace="my-namespace",
    )
    # Trim to limit and keep only essential fields for token efficiency
    trimmed = [
        {
            "document_id": doc["document_id"],
            "score": doc["score"],
            "metadata": doc.get("metadata", {}),
        }
        for doc in results[:limit]
    ]
    return json.dumps(trimmed, indent=2)


def chat(user_message: str):
    messages = [
        {
            "role": "system",
            "content": "You help users find and analyze multimedia content.",
        },
        {"role": "user", "content": user_message},
    ]

    # First call -- model may request a tool call
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message

    # Handle tool calls
    if message.tool_calls:
        messages.append(message)

        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)
            result = execute_search(
                query=args["query"],
                limit=args.get("limit", 5),
            )
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                }
            )

        # Second call -- model generates final answer with results
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message

    return message.content


print(chat("Find clips where someone mentions the product launch"))

Assistants API

You can register Mixpeek search as a tool on an OpenAI Assistant. The Assistants API manages conversation state and persistent threads for you.
import json
import time
from openai import OpenAI
from mixpeek import Mixpeek

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
mixpeek_client = Mixpeek(api_key="YOUR_MIXPEEK_KEY")

# Create an assistant with the Mixpeek tool
assistant = openai_client.beta.assistants.create(
    name="Media Search Assistant",
    instructions=(
        "You help users search and analyze video, image, and audio content. "
        "Use the search_mixpeek tool whenever the user asks about media files."
    ),
    model="gpt-4o",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "search_mixpeek",
                "description": (
                    "Search across video, image, and audio content indexed in Mixpeek."
                ),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Natural language search query",
                        },
                        "limit": {
                            "type": "integer",
                            "description": "Max results to return (default 5)",
                        },
                    },
                    "required": ["query"],
                },
            },
        }
    ],
)

# Create a thread and send a message
thread = openai_client.beta.threads.create()
openai_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Find video frames showing the CEO on stage at the keynote",
)

# Start a run
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until complete, handling tool calls
while True:
    run = openai_client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id
    )

    if run.status == "requires_action":
        tool_outputs = []
        for tool_call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(tool_call.function.arguments)
            results = mixpeek_client.retrievers.execute(
                retriever_id="ret_abc123",
                inputs={"query": args["query"]},
                namespace="my-namespace",
            )
            trimmed = [
                {
                    "document_id": doc["document_id"],
                    "score": doc["score"],
                    "metadata": doc.get("metadata", {}),
                }
                for doc in results[: args.get("limit", 5)]
            ]
            tool_outputs.append(
                {
                    "tool_call_id": tool_call.id,
                    "output": json.dumps(trimmed),
                }
            )

        run = openai_client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_outputs,
        )

    elif run.status == "completed":
        break
    elif run.status in ("failed", "cancelled", "expired", "incomplete"):
        print(f"Run ended with status: {run.status}")
        break
    else:
        time.sleep(1)

# Get the assistant's response
messages = openai_client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

Tips

Write Descriptive Function Schemas

The description field on your function and its parameters directly affects when and how the model calls your tool. Be specific about what content types your retriever handles.
# Good -- tells the model exactly when to use it
"Search video frames, audio transcripts, and image content in the media library. Returns timestamped results with relevance scores."

# Bad -- too vague, model won't know when to call it
"Search for stuff in the database."

Limit Result Size for Token Efficiency

Every result you return gets added to the conversation context. Strip unnecessary fields and cap the number of results to avoid wasting tokens.
def execute_search(query: str, limit: int = 5) -> str:
    results = mixpeek_client.retrievers.execute(
        retriever_id="ret_abc123",
        inputs={"query": query},
        namespace="my-namespace",
    )
    # Return only what the model needs to formulate an answer
    trimmed = [
        {
            "id": doc["document_id"],
            "score": round(doc["score"], 3),
            "title": doc.get("metadata", {}).get("title", ""),
            "summary": doc.get("metadata", {}).get("description", "")[:200],
        }
        for doc in results[:limit]
    ]
    return json.dumps(trimmed)

Handle Errors Gracefully

Return error messages as tool output instead of raising exceptions. This lets the model recover or ask the user to rephrase.
def execute_search(query: str, limit: int = 5) -> str:
    try:
        results = mixpeek_client.retrievers.execute(
            retriever_id="ret_abc123",
            inputs={"query": query},
            namespace="my-namespace",
        )
        trimmed = [
            {"id": doc["document_id"], "score": doc["score"]}
            for doc in results[:limit]
        ]
        return json.dumps(trimmed)
    except Exception as e:
        return json.dumps({"error": str(e), "suggestion": "Try a different query."})
If you use parallel_tool_calls (enabled by default in the Chat Completions API), the model may issue multiple search_mixpeek calls in a single response. The handler loop in the examples above already supports this — each tool call is processed independently.
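You can verify this loop shape without live API keys. A minimal sketch with two simulated parallel calls (the Fake* classes and fake_search are hypothetical stand-ins for the SDK's tool-call objects and for execute_search, not real types):

```python
import json
from dataclasses import dataclass

@dataclass
class FakeFunction:
    name: str
    arguments: str

@dataclass
class FakeToolCall:
    id: str
    function: FakeFunction

def fake_search(query: str, limit: int = 5) -> str:
    # Stand-in for execute_search(); echoes the parsed arguments as JSON.
    return json.dumps({"query": query, "limit": limit})

# Two tool calls in a single model response, as parallel_tool_calls allows
tool_calls = [
    FakeToolCall("call_1", FakeFunction("search_mixpeek", '{"query": "keynote"}')),
    FakeToolCall("call_2", FakeFunction("search_mixpeek", '{"query": "demo", "limit": 3}')),
]

# Same loop shape as the Chat Completions example: one tool message per call
tool_messages = []
for tool_call in tool_calls:
    args = json.loads(tool_call.function.arguments)
    result = fake_search(query=args["query"], limit=args.get("limit", 5))
    tool_messages.append(
        {"role": "tool", "tool_call_id": tool_call.id, "content": result}
    )

print(len(tool_messages))  # 2
```

Each tool message must carry the tool_call_id of the call it answers; the model matches results to calls by that ID, not by position.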

Next Steps

LangChain

Use Mixpeek as a LangChain tool for agent workflows

MCP Server

Connect Claude directly via the Model Context Protocol

Retriever Stages

Configure multi-stage search pipelines

Python SDK

Full SDK reference