# Mixpeek Documentation

> Multimodal AI infrastructure that makes unstructured data searchable and AI-ready.

## Documentation Overview

Mixpeek is a multimodal data warehouse and retrieval platform. This documentation covers the complete platform for processing images, videos, audio, PDFs, and text into searchable, AI-ready assets.

## Quick Links

- Full Documentation: https://mixpeek.com/docs
- API Reference: https://mixpeek.com/docs/api-reference
- OpenAPI Spec: https://api.mixpeek.com/docs/openapi.json
- Quickstart: https://mixpeek.com/docs/overview/quickstart

## Core Concepts

### Terminology

- **Namespace** = Qdrant collection (tenant isolation boundary)
- **Bucket** = Raw file storage (S3/GCS/Azure/R2/Wasabi/Tigris)
- **Collection** = Processing pipeline with configured feature extractors
- **Document** = Qdrant point + payload; `document_id` for app logic
- **Retriever** = Multi-stage configurable search pipeline
- **Taxonomy** = Hierarchical classification system
- **Cluster** = Automatic semantic grouping of content
- **Plugin** = Custom Python-based feature extractor
- **Manifest** = Infrastructure-as-code resource definition

### Data Ingestion Flow

Never insert directly into MVS. Always: Bucket upload → Trigger collection → Wait for batch.

### Storage Tiering

Collections have lifecycle states: `active` (MVS hot + warm), `cold` (MVS warm only), `archived` (metadata only).
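The three lifecycle states can be modeled as a small lookup table. The following is a minimal sketch, not Mixpeek SDK code: the state names come from the list above, while `LifecycleState`, `TIERS`, `tiers_for`, and the tier labels (`mvs_hot`, `mvs_warm`, `metadata`) are hypothetical names chosen for illustration.

```python
from enum import Enum


class LifecycleState(str, Enum):
    """Lifecycle states named in the Storage Tiering section."""
    ACTIVE = "active"
    COLD = "cold"
    ARCHIVED = "archived"


# Where a collection's data lives in each state.
# Tier labels are illustrative, not API values.
TIERS = {
    LifecycleState.ACTIVE: ("mvs_hot", "mvs_warm"),
    LifecycleState.COLD: ("mvs_warm",),
    LifecycleState.ARCHIVED: ("metadata",),
}


def tiers_for(state: str) -> tuple[str, ...]:
    """Return the storage tiers for a lifecycle state string.

    Raises ValueError for a state that is not one of the three above.
    """
    return TIERS[LifecycleState(state)]
```

For example, `tiers_for("cold")` returns only the warm tier, and an unknown state string raises `ValueError` instead of silently mapping to a default.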
## Essential Documentation Pages

### Getting Started

- /overview/introduction - Platform overview
- /overview/quickstart - 5-minute getting started guide
- /overview/concepts - Core terminology
- /overview/architecture - System architecture
- /overview/data-model - Data model and relationships

### Data Ingestion

- /ingestion/objects - Working with raw objects
- /ingestion/uploads - File upload methods
- /ingestion/buckets - Bucket management and syncs
- /ingestion/collections - Collection configuration
- /ingestion/namespaces - Namespace management
- /ingestion/features - Feature overview

### Feature Extraction

- /processing/feature-extractors - Available extractors overview
- /processing/extractors/multimodal - Multimodal dense/sparse embeddings
- /processing/extractors/image - Image analysis (detection, OCR, segmentation)
- /processing/extractors/text - Text processing and embeddings
- /processing/extractors/document - PDF/document extraction
- /processing/extractors/face-identity - Face recognition and identity
- /processing/extractors/web-scraper - Web content extraction
- /processing/extractors/course-content - Educational content parsing
- /processing/extractors/passthrough - No-op extractor
- /processing/plugins - Custom Python plugin system
- /processing/model-registry - Custom model management
- /processing/batching - Batch processing and retry logic
- /processing/pipelines - Pipeline configuration

### Search & Retrieval

- /retrieval/retrievers - Retriever creation and management
- /retrieval/stages/overview - Pipeline stages overview
- /retrieval/filters - Query filtering syntax

#### Filter Stages

- /retrieval/stages/feature-search - Vector/semantic search
- /retrieval/stages/attribute-filter - Metadata filtering
- /retrieval/stages/llm-filter - LLM-based semantic filtering
- /retrieval/stages/agent-search - Autonomous agent search
- /retrieval/stages/query-expand - Query expansion

#### Sort Stages

- /retrieval/stages/sort-relevance - Score-based ordering
- /retrieval/stages/sort-attribute - Field-based ordering
- /retrieval/stages/mmr - Maximal Marginal Relevance
- /retrieval/stages/rerank - Cross-encoder reranking
- /retrieval/stages/score-normalize - Score normalization

#### Reduce Stages

- /retrieval/stages/aggregate - Group and reduce
- /retrieval/stages/sample - Random sampling
- /retrieval/stages/summarize - LLM summarization
- /retrieval/stages/limit - Top-K cutoff
- /retrieval/stages/deduplicate - Near-duplicate removal

#### Group Stages

- /retrieval/stages/group-by - Bucket by field value
- /retrieval/stages/cluster - Semantic clustering of results

#### Apply Stages

- /retrieval/stages/json-transform - Reshape/project fields
- /retrieval/stages/rag-prepare - Format for RAG injection
- /retrieval/stages/external-web-search - Augment with live web results
- /retrieval/stages/api-call - Call external HTTP endpoint
- /retrieval/stages/sql-lookup - Join with SQL data source
- /retrieval/stages/cross-compare - LLM-powered comparison
- /retrieval/stages/web-scrape - Fetch content from URLs in results
- /retrieval/stages/unwind - Flatten array fields
- /retrieval/stages/code-execution - Sandboxed Python on results

#### Enrich Stages

- /retrieval/stages/llm-enrich - Add LLM-generated fields
- /retrieval/stages/taxonomy-enrich - Apply taxonomy classification
- /retrieval/stages/document-enrich - Join with related documents
- /retrieval/stages/agentic-enrich - Autonomous enrichment

### Retriever Features

- /retrieval/interactions - Click/view/conversion tracking
- /retrieval/benchmarks - Head-to-head configuration comparison

### Relevance & Personalization

- /relevance/overview - Relevance system overview
- /relevance/interactions - Interaction signal collection
- /relevance/fusion-strategies - Weighted, RRF, learned fusion
- /relevance/learned-fusion - ML-trained fusion weights
- /relevance/evaluations - Offline evaluation datasets
- /relevance/analytics - Retrieval quality analytics

### Enrichment & Organization

- /enrichment/taxonomies - Taxonomy-based classification
- /enrichment/clusters - Automatic semantic clustering
- /enrichment/retriever-enrichments - Retriever-based enrichment

### Operations

- /operations/security - Auth, RBAC, secrets management
- /operations/webhooks - Async event delivery
- /operations/manifests - Infrastructure-as-code
- /operations/storage-tiering - Storage lifecycle management

### Best Practices

- /best-practices/schema-design - Collection and document schema
- /best-practices/feature-selection - Choosing the right extractors
- /best-practices/caching-strategies - Query and model caching
- /best-practices/cost-optimization - Reducing compute and storage costs

### Troubleshooting

- /troubleshoot/errors - Error reference
- /troubleshoot/limits - Rate limits and quotas
- /troubleshoot/common-issues - Common problems and fixes
- /troubleshoot/faq - Frequently asked questions

### Integrations

- /integrations/search-widget - Embeddable search UI
- /integrations/object-storage/s3 - AWS S3
- /integrations/object-storage/gcs - Google Cloud Storage
- /integrations/object-storage/azure-blob - Azure Blob Storage
- /integrations/object-storage/r2 - Cloudflare R2
- /integrations/object-storage/wasabi - Wasabi
- /integrations/object-storage/tigris - Tigris
- /integrations/social-media/instagram - Instagram connector
- /integrations/developer-tools/python-sdk - Python SDK
- /integrations/developer-tools/javascript-sdk - JavaScript SDK
- /integrations/developer-tools/mixpeek-cli - CLI
- /integrations/developer-tools/mcp-server - MCP server for agents

## API Authentication

All API requests require Bearer token authentication:

```
Authorization: Bearer YOUR_API_KEY
```

Namespace context is provided via header:

```
X-Namespace: ns_your_namespace_id
```

## Common API Patterns

### Create a Namespace

POST /v1/namespaces

### Create a Bucket and Collection

- POST /v1/buckets
- POST /v1/collections (with feature_extractors array)

### Upload and Process Data

1. POST /v1/buckets/{bucket_id}/uploads - upload file
2. Collection triggers batch processing automatically
3. GET /v1/buckets/{bucket_id}/batches/{batch_id} - poll status

### Execute Search

POST /v1/retrievers/{retriever_id}/execute

- Provide query in `inputs`
- Retriever runs configured stages pipeline

### Deploy a Custom Plugin

- POST /v1/namespaces/{ns_id}/plugins - upload plugin code
- POST /v1/namespaces/{ns_id}/plugins/{plugin_id}/deploy - deploy

### Apply Manifest (IaC)

- POST /v1/manifest/apply - declaratively create/update resources
- POST /v1/manifest/validate - validate without applying
- GET /v1/manifest/export - export current state as manifest

## SDKs

- Python: `pip install mixpeek`
- JavaScript: `npm install mixpeek-sdk`
- CLI: `pip install mixpeek-cli`
- MCP Server: expose retrievers as tools for Claude and agents

## Support

- Documentation: https://mixpeek.com/docs
- GitHub: https://github.com/mixpeek
- Discord: https://discord.gg/mixpeek
- Email: support@mixpeek.com
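
## Example: Authenticated Search Request and Batch Polling

The authentication headers, the "Execute Search" call, and the "poll status" step listed under Common API Patterns can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the official SDK. The base URL is inferred from the OpenAPI spec host. The inner shape of `inputs` (a `query` key) and the terminal status strings `completed` and `failed` are guesses to check against the API reference. The helper names `auth_headers`, `build_execute_request`, and `wait_for_batch` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable

# Assumption: host inferred from the OpenAPI spec URL listed under Quick Links.
BASE_URL = "https://api.mixpeek.com"


def auth_headers(api_key: str, namespace_id: str) -> dict[str, str]:
    """Build the two headers described in the API Authentication section."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace_id,
        "Content-Type": "application/json",
    }


def build_execute_request(
    retriever_id: str, inputs: dict, api_key: str, namespace_id: str
) -> urllib.request.Request:
    """Prepare (but do not send) POST /v1/retrievers/{retriever_id}/execute."""
    # The query goes under `inputs`; the keys inside it are an assumption.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/retrievers/{retriever_id}/execute",
        data=body,
        headers=auth_headers(api_key, namespace_id),
        method="POST",
    )


def wait_for_batch(
    fetch_status: Callable[[], str],
    terminal_states: Iterable[str] = ("completed", "failed"),  # hypothetical values
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal state or polls run out.

    fetch_status is supplied by the caller and would typically wrap
    GET /v1/buckets/{bucket_id}/batches/{batch_id}.
    """
    terminal = set(terminal_states)
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(interval_s)
    raise TimeoutError(f"batch not finished after {max_polls} polls")
```

Passing the prepared request to `urllib.request.urlopen` would send it. Taking `fetch_status` and `sleep` as parameters keeps the polling loop independent of any HTTP client and lets it be exercised without network access.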