LangChain - Mixpeek

The langchain-mixpeek package lets LangChain agents search video, audio, images, and documents, and ingest, classify, cluster, and monitor unstructured content, all through Mixpeek’s multimodal infrastructure.

Installation

pip install langchain-mixpeek

Quick Start

1. Search (Retriever)

from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_abc123",
    namespace="my-namespace",
)
docs = retriever.invoke("find the red cup")

Each result is a LangChain Document with page_content and metadata (document_id, score, namespace).
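As a rough sketch of consuming these results, the snippet below keeps only documents above a score threshold and prints a compact summary. It uses a small stand-in `Doc` dataclass in place of LangChain's `Document` so it runs offline. The `filter_by_score` helper, the `min_score` threshold, and the sample values are illustrative assumptions, as is the idea that a higher score means a closer match.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs, min_score=0.5):
    """Keep documents whose metadata score is at least min_score (hypothetical helper)."""
    return [d for d in docs if d.metadata.get("score", 0.0) >= min_score]


# Illustrative results; real ones would come from retriever.invoke(...)
docs = [
    Doc("red cup on a desk", {"document_id": "doc_1", "score": 0.91, "namespace": "my-namespace"}),
    Doc("blue mug in a sink", {"document_id": "doc_2", "score": 0.32, "namespace": "my-namespace"}),
]

kept = filter_by_score(docs, min_score=0.5)
for d in kept:
    print(d.metadata["document_id"], d.metadata["score"], d.page_content)
```

The same loop should work on the list returned by `retriever.invoke(...)`, since it only reads `page_content` and `metadata`.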

2. Agent Tool

from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_abc123",
    namespace="my-namespace",
)

# One line — retriever becomes an agent tool
tool = retriever.as_tool()

3. Full Toolkit (search + ingest + classify + cluster + alert)

from langchain_mixpeek import MixpeekToolkit
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

toolkit = MixpeekToolkit(
    api_key="mxp_...",
    namespace="my-namespace",
    bucket_id="bkt_abc123",
    collection_id="col_def456",
    retriever_id="ret_ghi789",
)

agent = create_react_agent(
    ChatAnthropic(model="claude-sonnet-4-20250514"),
    toolkit.get_tools(),
)

result = agent.invoke({
    "messages": [("user", "Scan these product URLs and alert me about counterfeits")]
})

The toolkit gives your agent 6 capabilities:

| Tool | What it does |
| --- | --- |
| `mixpeek_search` | Search video, images, audio, documents by natural language |
| `mixpeek_ingest` | Upload text, images, video, audio, PDFs, spreadsheets |
| `mixpeek_process` | Trigger feature extraction (embedding, OCR, transcription, face detection) |
| `mixpeek_classify` | Run taxonomy classification on documents |
| `mixpeek_cluster` | Group similar documents (kmeans, dbscan, hdbscan, etc.) |
| `mixpeek_alert` | Set up monitoring with webhook, Slack, or email notifications |

4. VectorStore (full pipeline)

from langchain_mixpeek import MixpeekVectorStore

store = MixpeekVectorStore(
    api_key="mxp_...",
    namespace="my-namespace",
    bucket_id="bkt_abc123",
    collection_id="col_def456",
    retriever_id="ret_ghi789",
)

# Ingest any content type
store.add_texts(["product description..."])
store.add_images(["https://example.com/photo.jpg"])
store.add_videos(["https://example.com/clip.mp4"])
store.add_audio(["https://example.com/recording.mp3"])
store.add_pdfs(["https://example.com/doc.pdf"])
store.add_excel(["https://example.com/data.xlsx"])

# Trigger processing (embedding, OCR, face detection, etc.)
store.trigger_processing()

# Search
docs = store.similarity_search("red cup on the table")

# Convert to agent tools anytime
tool = store.as_tool()
toolkit = store.as_toolkit()
retriever = store.as_retriever()

5. Search-Only (minimal config)

If you only need search, skip the bucket/collection config:

store = MixpeekVectorStore.from_retriever(
    api_key="mxp_...",
    namespace="my-namespace",
    retriever_id="ret_abc123",
)
docs = store.similarity_search("red cup")

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | str | required | Mixpeek API key (`mxp_...`) |
| `retriever_id` | str | required | Retriever ID for search (`ret_...`) |
| `namespace` | str | required | Namespace to operate in |
| `bucket_id` | str | required* | Bucket for uploads (`bkt_...`) |
| `collection_id` | str | required* | Collection for processing (`col_...`) |
| `top_k` | int | `10` / `5` | Max results (retriever / tool) |
| `content_field` | str | `"text"` | Field to use as `page_content` |
| `filters` | dict | `None` | Attribute filters (retriever only) |

*Required for ingest/processing. Not needed for search-only via from_retriever().

The content_field can reference any field in your retriever results — including enrichment fields like trend_insight or brand_alignment. If the field contains a dict with a text key, the text is automatically extracted.
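The extraction rule described above can be sketched in plain Python. This is an illustration of the described behavior, not the package's actual implementation: the function name `extract_page_content`, the `confidence` key, and the choice to return an empty string for a missing field are all assumptions.

```python
def extract_page_content(result: dict, content_field: str = "text") -> str:
    """Return the text to use as page_content for one retriever result."""
    value = result.get(content_field)
    if isinstance(value, dict) and "text" in value:
        # Enrichment fields may wrap their text in a dict with a "text" key
        value = value["text"]
    # Assumption: a missing field maps to an empty string
    return "" if value is None else str(value)


plain = {"text": "red cup on the table"}
enriched = {"trend_insight": {"text": "tumblers are trending", "confidence": 0.8}}

print(extract_page_content(plain))
print(extract_page_content(enriched, content_field="trend_insight"))
```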

Examples

Brand Protection Agent

An agent that scans marketplace listings and alerts on counterfeits:

from langchain_mixpeek import MixpeekToolkit
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

toolkit = MixpeekToolkit(
    api_key="mxp_...",
    namespace="brand-protection",
    bucket_id="bkt_...",
    collection_id="col_...",
    retriever_id="ret_...",
)

# Only give the agent the tools it needs
agent = create_react_agent(
    ChatAnthropic(model="claude-sonnet-4-20250514"),
    toolkit.get_tools(actions=["search", "ingest", "process", "alert"]),
    prompt="You are a brand protection agent. Scan products and flag counterfeits.",
)

result = agent.invoke({
    "messages": [("user", "Check if these 5 Amazon listings are selling counterfeit Stanley cups")]
})

RAG Chain

Standard retrieval-augmented generation:

from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain_mixpeek import MixpeekRetriever

retriever = MixpeekRetriever(
    api_key="mxp_...",
    retriever_id="ret_...",
    namespace="my-namespace",
)
llm = ChatAnthropic(model="claude-sonnet-4-20250514")

prompt = ChatPromptTemplate.from_template(
    "Answer using this context:\n{context}\n\nQuestion: {question}"
)

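def format_docs(docs):
    # Join retrieved documents into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

chain = {"context": retriever | format_docs, "question": lambda x: x} | prompt | llm
response = chain.invoke("what happens at 2 minutes?")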

Multi-Retriever Agent

Different retrievers for different content types:

from langchain_mixpeek import MixpeekTool
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

video_search = MixpeekTool(
    api_key="mxp_...",
    retriever_id="ret_video_archive",
    namespace="archive",
    name="search_video_archive",
    description="Search video archive for specific scenes, faces, or moments.",
)

image_search = MixpeekTool(
    api_key="mxp_...",
    retriever_id="ret_product_images",
    namespace="catalog",
    name="search_product_images",
    description="Search product image catalog by visual similarity.",
)

# The agent picks a tool per query based on each tool's name and description
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
agent = create_react_agent(llm, [video_search, image_search])

Platform Features

The VectorStore exposes the full Mixpeek platform:

Taxonomies (document classification)

# Create a taxonomy
store.create_taxonomy(
    name="product-categories",
    config={
        "taxonomy_type": "flat",
        "retriever_id": "ret_...",
        "collection_id": "col_...",
        "input_mappings": [...],
        "enrichment_fields": [...],
    },
)

# List and execute
taxonomies = store.list_taxonomies()
results = store.execute_taxonomy("tax_abc123")

Clusters (unsupervised grouping)

# Create and run clustering
cluster = store.create_cluster(
    cluster_type="vector",
    vector_config={
        "algorithm": "kmeans",  # or dbscan, hdbscan, spectral, etc.
        "algorithm_params": {"n_clusters": 10},
    },
)
store.execute_cluster(cluster["cluster_id"])
groups = store.get_cluster_groups(cluster["cluster_id"])

Alerts (match notifications)

# Create an alert with webhook + Slack
store.create_alert(
    name="counterfeit-detection",
    notification_config={
        "channels": [
            {"channel_type": "webhook", "config": {"url": "https://..."}},
            {"channel_type": "slack", "channel_id": "#alerts"},
        ],
        "include_matches": True,
        "include_scores": True,
    },
)

# Check results
results = store.get_alert_results("alt_abc123")

Custom Plugins

# List deployed plugins
plugins = store.list_plugins()

# Check deployment status
status = store.get_plugin_status("plg_abc123")

# Test a realtime plugin
result = store.test_plugin("plg_abc123", inputs={"text": "hello"})

Tips

Selecting Toolkit Actions

Don’t give agents tools they don’t need. Use actions to scope:

# Search-only agent
toolkit.get_tools(actions=["search"])

# Ingest + search agent
toolkit.get_tools(actions=["search", "ingest", "process"])

# Full platform agent
toolkit.get_tools()  # all 6 tools
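A small guard can catch misspelled action names before the agent is built. The sketch below is a hypothetical helper, not part of the package: it assumes each action maps to a tool named `mixpeek_<action>`, with the action names inferred from the tool table and the examples on this page.

```python
# Action names inferred from the tool table: each tool is "mixpeek_<action>"
KNOWN_ACTIONS = ("search", "ingest", "process", "classify", "cluster", "alert")


def check_actions(actions):
    """Raise ValueError on unknown action names; return the expected tool names."""
    unknown = [a for a in actions if a not in KNOWN_ACTIONS]
    if unknown:
        raise ValueError(
            f"Unknown actions {unknown}; expected a subset of {list(KNOWN_ACTIONS)}"
        )
    return [f"mixpeek_{a}" for a in actions]


print(check_actions(["search", "ingest", "process"]))

try:
    check_actions(["search", "injest"])  # typo caught before the agent is built
except ValueError as exc:
    print(exc)
```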

Error Handling

All toolkit tools catch exceptions and return error strings instead of crashing the agent. The retriever does not: it raises exceptions as usual, so handle them in your own code.
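The difference between the two behaviors can be illustrated with a generic wrapper. This is a sketch of the pattern only. `safe_tool` and `flaky_search` are hypothetical names, and the exact error string the toolkit returns is not specified here, so the format below is an assumption.

```python
import functools


def safe_tool(func):
    """Wrap a tool function so failures come back as text, not exceptions."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose: the agent loop must keep running
            return f"Error: {type(exc).__name__}: {exc}"
    return wrapper


def flaky_search(query: str) -> str:
    """Hypothetical search call that always fails, standing in for an API error."""
    raise ConnectionError("upstream timeout")


# Tool-style: the agent receives a string it can reason about
tool_style = safe_tool(flaky_search)
print(tool_style("red cup"))

# Retriever-style: the exception propagates, so the caller decides how to recover
try:
    flaky_search("red cup")
except ConnectionError as exc:
    print(f"retriever failed: {exc}")
```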

Token Efficiency

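Set top_k to limit results. Every returned document is added to the model's context, so large result sets consume tokens without necessarily improving answer quality. Start with top_k=5 and raise it only if answers are missing relevant content.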
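To get a feel for the cost, the sketch below estimates context size for different `top_k` values. It assumes roughly four characters per token, a common rule of thumb for English text rather than a real tokenizer, and uses synthetic 400-character results. `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(texts, chars_per_token=4):
    """Very rough token estimate: total characters divided by chars_per_token."""
    return sum(len(t) for t in texts) // chars_per_token


# Synthetic results of 400 characters each, standing in for page_content values
results = ["x" * 400 for _ in range(20)]

for top_k in (5, 10, 20):
    print(f"top_k={top_k}: ~{estimate_tokens(results[:top_k])} tokens")
```

For exact counts, use the tokenizer of the model you are calling.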

Source Code

PyPI: langchain-mixpeek (Python)
npm: @mixpeek/langchain (JavaScript)
GitHub: mixpeek/langchain-mixpeek
LangChain Docs: Tools · Vector Store
Connector Page: mixpeek.com/connectors/langchain

Next Steps

MCP Server: Connect Claude directly via the Model Context Protocol
OpenAI Function Calling: Wire Mixpeek into OpenAI assistants
Feature Extractors: 15+ extractors (text, image, video, audio, face, PDF, web scraper)
Python SDK: Full SDK reference
