Semantic Multimodal Retrieval
The base layer for all retrieval workflows: unified semantic search across modalities, using vision and text embeddings with cross-modal fusion.
Why This Matters
This is the foundation every other recipe builds on: semantic understanding across any content type, with no keywords or manual tagging required.
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Create a collection with multimodal extractors
collection = client.collections.create(
    collection_name="media_library",
    feature_extractor={
        "feature_extractor_name": "multimodal_extractor",
        "version": "v1",
    },
)

# Index objects
client.buckets.objects.create(
    bucket_id="my-bucket",
    blobs=[{"property": "video", "url": "s3://bucket/video.mp4"}],
)

# Search semantically
results = client.retrievers.execute(
    retriever_id="semantic-retriever",
    inputs={"query_text": "product demo with testimonials"},
    limit=20,
)
```
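Each hit comes back with a relevance score alongside the indexed document. A minimal sketch of consuming the response, assuming it exposes a `results` list whose items carry `score` and `metadata` fields; these field names are assumptions, not confirmed SDK attributes:

```python
# Walk the top hits; `results.results`, `score`, and `metadata`
# are assumed field names, not confirmed SDK attributes.
for hit in results.results:
    print(f"{hit['score']:.3f}  {hit['metadata'].get('source_url', 'unknown')}")
```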
Retrieval Flow
1. Vector search across multimodal embeddings
2. Filter by metadata constraints
3. Return top-k results (the pipeline wiring these stages is sketched below)
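The `semantic-retriever` used in the quickstart encodes this flow as a stage pipeline. A hedged sketch of what its definition might look like; `client.retrievers.create`, the `input_schema` key, and the stage payload shape are assumptions inferred from the `execute` call above, not confirmed SDK calls:

```python
# Hypothetical retriever definition; create() and the stage schema
# are assumptions -- only execute() appears in the quickstart.
retriever = client.retrievers.create(
    retriever_name="semantic-retriever",
    input_schema={"query_text": {"type": "text"}},
    stages=[
        # 1. Vector search across multimodal embeddings
        {"stage_name": "feature_search",
         "parameters": {"collections": ["media_library"]}},
        # 2. Filter by metadata constraints
        {"stage_name": "attribute_filter",
         "parameters": {"filters": {"metadata.media_type": "video"}}},
        # 3. Return top-k results
        {"stage_name": "limit", "parameters": {"limit": 20}},
    ],
)
```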
Feature Extractors
Image Embedding
Generate visual embeddings for similarity search and clustering
Video Embedding
Generate vector embeddings for video content
Text Embedding
Extract semantic embeddings from documents, transcripts, and other text content
Audio Transcription
Transcribe audio content to text (a setup combining these extractors is sketched after this list)
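Each extractor can back its own collection when you need per-modality indexes. A minimal sketch reusing the `collections.create` call from the quickstart; the extractor names below are illustrative assumptions, not confirmed identifiers:

```python
# One collection per modality; extractor names are illustrative
# assumptions mirroring the list above.
for extractor in ["image_extractor", "video_extractor",
                  "text_extractor", "audio_transcription_extractor"]:
    client.collections.create(
        collection_name=f"media_library_{extractor}",
        feature_extractor={
            "feature_extractor_name": extractor,
            "version": "v1",
        },
    )
```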
Retriever Stages
feature search
Search collections using multimodal embeddings
attribute filter
Filter documents by metadata attributes
limit
Limit the number of documents returned (a combined query example follows)
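Putting the three stages together at query time: a hedged sketch that reuses the `execute` call from the quickstart and feeds a metadata constraint to the attribute filter stage; the `filters` input key and its dotted-path syntax are assumptions, not confirmed API fields:

```python
# Query with a metadata constraint; the `filters` input key and
# dotted-path syntax are assumptions, not confirmed API fields.
results = client.retrievers.execute(
    retriever_id="semantic-retriever",
    inputs={
        "query_text": "product demo with testimonials",
        "filters": {"metadata.language": "en"},
    },
    limit=20,
)
```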
