Multimodal Enrichment Pipeline
Multi-tier collections that extract vision, audio, text, and metadata signals. This is the backbone: most enterprise pipelines start here.
Why This Matters
Enrichment pipelines are infrastructure. Raw → embeddings → searchable documents. Define once, query forever.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Create a multi-tier enrichment collection
collection = client.collections.create(
    collection_name="enriched_media",
    feature_extractor={
        "feature_extractor_name": "multimodal_extractor",
        "version": "v1",
        "parameters": {
            "enable_transcription": True,
            "enable_object_detection": True,
        },
    },
)

# Index objects (triggers the extraction pipeline)
client.buckets.objects.create(
    bucket_id="raw-media",
    blobs=[{"property": "video", "url": "s3://bucket/meeting-recording.mp4"}],
)

# Search enriched content
results = client.retrievers.execute(
    retriever_id="enriched-search",
    inputs={
        "query_text": "quarterly roadmap discussion",
        "filters": {"metadata.speaker": "CEO"},
    },
)
```
Retrieval Flow
1. Search enriched features
2. Filter by extracted metadata
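The two-step flow above (embedding search, then a metadata filter) can be sketched in plain Python. This is illustrative only; `docs`, `cosine`, and `retrieve` are hypothetical helpers, not part of the Mixpeek SDK:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "enriched" documents: an embedding plus extracted metadata.
docs = [
    {"id": "doc-1", "embedding": [0.9, 0.1], "metadata": {"speaker": "CEO"}},
    {"id": "doc-2", "embedding": [0.8, 0.2], "metadata": {"speaker": "CTO"}},
    {"id": "doc-3", "embedding": [0.1, 0.9], "metadata": {"speaker": "CEO"}},
]

def retrieve(query_emb, filters, k=2):
    # Stage 1: feature search, rank by embedding similarity.
    ranked = sorted(docs, key=lambda d: cosine(query_emb, d["embedding"]), reverse=True)
    # Stage 2: attribute filter, keep only documents matching the metadata.
    kept = [d for d in ranked if all(d["metadata"].get(f) == v for f, v in filters.items())]
    return [d["id"] for d in kept[:k]]

print(retrieve([1.0, 0.0], {"speaker": "CEO"}))  # → ['doc-1', 'doc-3']
```

In production the ranking stage runs against a vector index rather than a linear scan, but the two-stage shape is the same.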
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
Video Embedding
Generate vector embeddings for video content
Audio Transcription
Transcribe audio content to text
Text Embedding
Extract semantic embeddings from documents, transcripts, and text content
Object Detection
Identify and locate objects within images with bounding boxes
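Taken together, these extractors populate a single searchable document per object. A minimal sketch of what a merged result might look like; every field name here is hypothetical, not the actual Mixpeek document schema:

```python
# Hypothetical outputs from each extractor for one video object.
image_features = {"frame_embedding": [0.12, 0.88, 0.41]}
transcript = {"transcript": "Here is the quarterly roadmap...", "language": "en"}
text_features = {"text_embedding": [0.33, 0.91, 0.05]}
detections = {"objects": [{"label": "whiteboard", "bbox": [10, 20, 300, 200], "score": 0.97}]}

# The enrichment pipeline merges every extractor's output into one document,
# so a single query can hit visual, audio, and text signals at once.
enriched_doc = {
    "object_id": "meeting-recording.mp4",
    **image_features,
    **transcript,
    **text_features,
    **detections,
    "metadata": {"speaker": "CEO"},
}

print(sorted(enriched_doc))
```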
Retriever Stages
Feature Search
Search collections using multimodal embeddings
Attribute Filter
Filter documents by metadata attributes
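The attribute-filter stage matches documents against predicates like the `"metadata.speaker": "CEO"` filter in the quickstart above. A minimal dotted-path matcher, shown as a sketch rather than the SDK's actual implementation (`get_path` and `attribute_filter` are hypothetical names):

```python
def get_path(doc, dotted):
    # Walk a dotted path like "metadata.speaker" through nested dicts.
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict):
            return None
        cur = cur.get(part)
    return cur

def attribute_filter(docs, filters):
    # Keep documents whose fields match every filter predicate exactly.
    return [d for d in docs
            if all(get_path(d, path) == want for path, want in filters.items())]

docs = [
    {"id": 1, "metadata": {"speaker": "CEO", "quarter": "Q3"}},
    {"id": 2, "metadata": {"speaker": "CTO", "quarter": "Q3"}},
]
matched = attribute_filter(docs, {"metadata.speaker": "CEO"})
print([d["id"] for d in matched])  # → [1]
```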
