The langchain-mixpeek package gives LangChain agents the ability to see video, hear audio, search images, and act on unstructured content — all through Mixpeek’s multimodal infrastructure.
Installation
pip install langchain-mixpeek
Quick Start
1. Search (Retriever)
from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_abc123",
    namespace="my-namespace",
)
docs = retriever.invoke("find the red cup")
Each result is a LangChain Document with page_content and metadata (document_id, score, namespace).
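To make the result shape concrete, here is a minimal sketch that keeps only hits above a score threshold. It uses a small stand-in class with the same two attributes (page_content and metadata) so it runs without an API key. The filter_by_score helper, the sample IDs, and the 0.5 threshold are illustrative assumptions, not part of langchain-mixpeek, and the score scale depends on how your retriever is configured.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in mirroring the two attributes of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs, min_score=0.5):
    """Keep documents whose metadata["score"] is at least min_score."""
    return [d for d in docs if d.metadata.get("score", 0.0) >= min_score]


# Hypothetical results shaped like the retriever output described above.
sample_docs = [
    Doc("red cup on a shelf", {"document_id": "doc_1", "score": 0.91, "namespace": "my-namespace"}),
    Doc("blue mug on a desk", {"document_id": "doc_2", "score": 0.32, "namespace": "my-namespace"}),
]

for doc in filter_by_score(sample_docs):
    print(doc.metadata["document_id"], doc.metadata["score"], doc.page_content)
```

The same helper should work on the list returned by retriever.invoke(...), since it only reads the metadata attribute.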
2. Agent Tool
from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_abc123",
    namespace="my-namespace",
)
# One line: the retriever becomes an agent tool
tool = retriever.as_tool()
3. Toolkit
from langchain_mixpeek import MixpeekToolkit
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

toolkit = MixpeekToolkit(
    api_key="mxp_...",
    namespace="my-namespace",
    bucket_id="bkt_abc123",
    collection_id="col_def456",
    retriever_id="ret_ghi789",
)
agent = create_react_agent(
    ChatAnthropic(model="claude-sonnet-4-20250514"),
    toolkit.get_tools(),
)
result = agent.invoke({
    "messages": [("user", "Scan these product URLs and alert me about counterfeits")]
})
The toolkit gives your agent 6 capabilities:
| Tool | What it does |
| --- | --- |
| mixpeek_search | Search video, images, audio, documents by natural language |
| mixpeek_ingest | Upload text, images, video, audio, PDFs, spreadsheets |
| mixpeek_process | Trigger feature extraction (embedding, OCR, transcription, face detection) |
| mixpeek_classify | Run taxonomy classification on documents |
| mixpeek_cluster | Group similar documents (kmeans, dbscan, hdbscan, etc.) |
| mixpeek_alert | Set up monitoring with webhook, Slack, or email notifications |
4. VectorStore (full pipeline)
from langchain_mixpeek import MixpeekVectorStore

store = MixpeekVectorStore(
    api_key="mxp_...",
    namespace="my-namespace",
    bucket_id="bkt_abc123",
    collection_id="col_def456",
    retriever_id="ret_ghi789",
)

# Ingest any content type
store.add_texts(["product description..."])
store.add_images(["https://example.com/photo.jpg"])
store.add_videos(["https://example.com/clip.mp4"])
store.add_audio(["https://example.com/recording.mp3"])
store.add_pdfs(["https://example.com/doc.pdf"])
store.add_excel(["https://example.com/data.xlsx"])

# Trigger processing (embedding, OCR, face detection, etc.)
store.trigger_processing()

# Search
docs = store.similarity_search("red cup on the table")

# Convert to agent tools anytime
tool = store.as_tool()
toolkit = store.as_toolkit()
retriever = store.as_retriever()
5. Search-Only (minimal config)
If you only need search, skip the bucket/collection config:
store = MixpeekVectorStore.from_retriever(
    api_key="mxp_...",
    namespace="my-namespace",
    retriever_id="ret_abc123",
)
docs = store.similarity_search("red cup")
Configuration
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Mixpeek API key (mxp_...) |
| retriever_id | str | required | Retriever ID for search (ret_...) |
| namespace | str | required | Namespace to operate in |
| bucket_id | str | required* | Bucket for uploads (bkt_...) |
| collection_id | str | required* | Collection for processing (col_...) |
| top_k | int | 10 / 5 | Max results (retriever / tool) |
| content_field | str | "text" | Field to use as page_content |
| filters | dict | None | Attribute filters (retriever only) |
*Required for ingest/processing. Not needed for search-only via from_retriever().
The content_field can reference any field in your retriever results — including enrichment fields like trend_insight or brand_alignment. If the field contains a dict with a text key, the text is automatically extracted.
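The extraction rule described above can be sketched as follows. This is an illustration of the documented behavior, not the package's actual implementation: the extract_content helper, the sample result, and the choice to return an empty string for a missing field are all assumptions.

```python
def extract_content(result: dict, content_field: str = "text") -> str:
    """Return the text for content_field, unwrapping {"text": ...} dicts."""
    value = result.get(content_field)
    if isinstance(value, dict) and "text" in value:
        value = value["text"]
    # Assumption: a missing field becomes an empty string rather than an error.
    return "" if value is None else str(value)


# Hypothetical retriever result with a plain field and an enrichment field.
result = {
    "text": "Stainless steel tumbler, 40 oz",
    "trend_insight": {"text": "Rising interest in insulated drinkware", "confidence": 0.8},
}

print(extract_content(result))                   # plain string field
print(extract_content(result, "trend_insight"))  # dict with a "text" key
```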
Examples
Brand Protection Agent
An agent that scans marketplace listings and alerts on counterfeits:
from langchain_mixpeek import MixpeekToolkit
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

toolkit = MixpeekToolkit(
    api_key="mxp_...",
    namespace="brand-protection",
    bucket_id="bkt_...",
    collection_id="col_...",
    retriever_id="ret_...",
)

# Only give the agent the tools it needs
agent = create_react_agent(
    ChatAnthropic(model="claude-sonnet-4-20250514"),
    toolkit.get_tools(actions=["search", "ingest", "process", "alert"]),
    prompt="You are a brand protection agent. Scan products and flag counterfeits.",
)
result = agent.invoke({
    "messages": [("user", "Check if these 5 Amazon listings are selling counterfeit Stanley cups")]
})
RAG Chain
Standard retrieval-augmented generation:
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_...",
    namespace="my-namespace",
)
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
prompt = ChatPromptTemplate.from_template(
    "Answer using this context:\n{context}\n\nQuestion: {question}"
)
chain = {"context": retriever, "question": lambda x: x} | prompt | llm
response = chain.invoke("what happens at 2 minutes?")
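In the chain above, the retriever's output (a list of Document objects) is interpolated into the prompt as its string representation. A common LangChain pattern is to join each document's page_content into a single context string first. The sketch below shows such a formatter on stand-in objects; format_docs and the sample texts are illustrative, and the final comment shows where it would plug into the chain (not run against a live retriever here).

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in mirroring the two attributes of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, separator="\n\n"):
    """Join the page_content of each document into one context string."""
    return separator.join(d.page_content for d in docs)


docs = [Doc("Speaker introduces the product."), Doc("Demo starts at 02:00.")]
context = format_docs(docs)
print(context)

# In the chain above, this would slot in as:
# chain = {"context": retriever | format_docs, "question": lambda x: x} | prompt | llm
```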
Multi-Retriever Agent
Different retrievers for different content types:
from langchain_anthropic import ChatAnthropic
from langchain_mixpeek import MixpeekTool
from langgraph.prebuilt import create_react_agent

# The model that drives the agent
llm = ChatAnthropic(model="claude-sonnet-4-20250514")

video_search = MixpeekTool(
    api_key="mxp_...",
    retriever_id="ret_video_archive",
    namespace="archive",
    name="search_video_archive",
    description="Search video archive for specific scenes, faces, or moments.",
)
image_search = MixpeekTool(
    api_key="mxp_...",
    retriever_id="ret_product_images",
    namespace="catalog",
    name="search_product_images",
    description="Search product image catalog by visual similarity.",
)
agent = create_react_agent(llm, [video_search, image_search])
The VectorStore exposes the full Mixpeek platform:
Taxonomies (document classification)
# Create a taxonomy
store.create_taxonomy(
    name="product-categories",
    config={
        "taxonomy_type": "flat",
        "retriever_id": "ret_...",
        "collection_id": "col_...",
        "input_mappings": [...],
        "enrichment_fields": [...],
    },
)

# List and execute
taxonomies = store.list_taxonomies()
results = store.execute_taxonomy("tax_abc123")
Clusters (unsupervised grouping)
# Create and run clustering
cluster = store.create_cluster(
    cluster_type="vector",
    vector_config={
        "algorithm": "kmeans",  # or dbscan, hdbscan, spectral, etc.
        "algorithm_params": {"n_clusters": 10},
    },
)
store.execute_cluster(cluster["cluster_id"])
groups = store.get_cluster_groups(cluster["cluster_id"])
Alerts (match notifications)
# Create an alert with webhook + Slack
store.create_alert(
    name="counterfeit-detection",
    notification_config={
        "channels": [
            {"channel_type": "webhook", "config": {"url": "https://..."}},
            {"channel_type": "slack", "channel_id": "#alerts"},
        ],
        "include_matches": True,
        "include_scores": True,
    },
)

# Check results
results = store.get_alert_results("alt_abc123")
Custom Plugins
# List deployed plugins
plugins = store.list_plugins()

# Check deployment status
status = store.get_plugin_status("plg_abc123")

# Test a realtime plugin
result = store.test_plugin("plg_abc123", inputs={"text": "hello"})
Tips
Don’t give agents tools they don’t need. Use actions to scope:
# Search-only agent
toolkit.get_tools(actions=["search"])

# Ingest + search agent
toolkit.get_tools(actions=["search", "ingest", "process"])

# Full platform agent
toolkit.get_tools()  # all 6 tools
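One way to keep scoping consistent across several agents is a small role-to-actions map that is validated before it reaches get_tools. The sketch below is illustrative only: the role names and the actions_for helper are hypothetical, and the "classify" and "cluster" action names are inferred from the tool names (mixpeek_classify, mixpeek_cluster) rather than confirmed by the examples above.

```python
# Action names inferred from the six tool names (mixpeek_search -> "search", ...).
# "classify" and "cluster" are assumptions; the other four appear in the examples.
ALL_ACTIONS = ("search", "ingest", "process", "classify", "cluster", "alert")

# Hypothetical roles, each mapped to the smallest set of actions it needs.
ACTIONS_BY_ROLE = {
    "reader": ["search"],
    "curator": ["search", "ingest", "process"],
    "monitor": ["search", "alert"],
}


def actions_for(role: str) -> list[str]:
    """Return the allowed actions for a role, rejecting unknown names early."""
    actions = ACTIONS_BY_ROLE.get(role, ["search"])  # default to least privilege
    unknown = set(actions) - set(ALL_ACTIONS)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return list(actions)


print(actions_for("curator"))
# Then: toolkit.get_tools(actions=actions_for("curator"))
```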
Error Handling
All toolkit tools catch exceptions and return error strings instead of crashing the agent. The retriever raises exceptions normally.
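The pattern described above (turn exceptions into strings the model can read) can be sketched with a small decorator. This is a generic illustration, not the toolkit's own code: returns_error_string, flaky_search, and the "Error: ..." message format are assumptions.

```python
import functools


def returns_error_string(func):
    """Wrap a tool function so exceptions become error strings."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose: the agent should keep running
            return f"Error: {type(exc).__name__}: {exc}"
    return wrapper


@returns_error_string
def flaky_search(query: str) -> str:
    """Hypothetical tool body that fails on empty input."""
    if not query:
        raise ValueError("query must not be empty")
    return f"results for {query!r}"


print(flaky_search("red cup"))  # normal result string
print(flaky_search(""))         # error string instead of a raised exception
```

Because the retriever raises normally, code that calls retriever.invoke(...) outside an agent needs its own try/except if a failed search should not stop the program.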
Token Efficiency
Set top_k to limit results. Large result sets waste tokens without improving quality. Start with top_k=5.
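To get a feel for how top_k drives prompt size, the sketch below estimates the tokens consumed by the first top_k result snippets. The four-characters-per-token ratio is a rough rule of thumb for English text, not a real tokenizer, and the snippet data is made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def context_tokens(snippets: list[str], top_k: int) -> int:
    """Estimated tokens consumed by the first top_k snippets."""
    return sum(estimate_tokens(s) for s in snippets[:top_k])


# Hypothetical result snippets, 400 characters each.
snippets = ["x" * 400 for _ in range(20)]

for k in (5, 10, 20):
    print(f"top_k={k}: ~{context_tokens(snippets, k)} tokens")
```

Under these assumptions the context cost grows linearly with top_k, so doubling top_k doubles the tokens spent on retrieved context for every agent step that calls the tool.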
Next Steps
MCP Server: Connect Claude directly via the Model Context Protocol
OpenAI Function Calling: Wire Mixpeek into OpenAI assistants
Feature Extractors: 15+ extractors for text, image, video, audio, face, PDF, and web scraping
Python SDK: Full SDK reference