
    Semantic Multimodal Retrieval

    The base layer for all retrieval workflows. Unified semantic search across modalities using vision and text embeddings with cross-modal fusion.

    Modalities: video, image, audio, text
    Multi-Tier · 125.0K runs

    Why This Matters

    This is the foundation every other recipe builds on: semantic understanding across any content type, without keywords or manual tagging.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="your-api-key")

    # Create a collection with a multimodal feature extractor
    collection = client.collections.create(
        collection_name="media_library",
        feature_extractor={
            "feature_extractor_name": "multimodal_extractor",
            "version": "v1"
        }
    )

    # Index objects by registering their blobs in a bucket
    client.buckets.objects.create(
        bucket_id="my-bucket",
        blobs=[{
            "property": "video",
            "url": "s3://bucket/video.mp4"
        }]
    )

    # Search semantically with a natural-language query
    results = client.retrievers.execute(
        retriever_id="semantic-retriever",
        inputs={"query_text": "product demo with testimonials"},
        limit=20
    )
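
    For orientation, here is one way the response might be consumed. The shape of the return value (a dict-like payload with a "results" list of scored documents) is an assumption for illustration; this recipe does not pin down the response schema.

    # Hedged sketch: iterate the hits, assuming each carries a relevance
    # score and the source URL of the matched blob. Both field names are
    # guesses, not documented on this page.
    for doc in results["results"]:
        print(doc.get("score"), doc.get("source_url"))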

    Retrieval Flow

    1. Vector search across multimodal embeddings (feature search stage)
    2. Filter by metadata constraints (attribute filter stage)
    3. Return top-k results (limit stage, a reduce operation), as wired together in the sketch below
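
    A hedged sketch of that pipeline in code: the stage names mirror the Retriever Stages catalog below, but the retrievers.create() signature, the per-stage parameter payloads, and the filter syntax are all assumptions for illustration rather than a documented API.

    # Hypothetical pipeline definition: feature search, then attribute
    # filter, then limit. The stage names come from this page; every
    # field in the payloads is a guess.
    retriever = client.retrievers.create(
        retriever_name="semantic-retriever",
        collection_ids=["media_library"],
        stages=[
            {"stage_name": "feature_search",
             "parameters": {"input": "query_text", "limit": 100}},
            {"stage_name": "attribute_filter",
             "parameters": {"filters": {"metadata.content_type": "video"}}},
            {"stage_name": "limit",
             "parameters": {"limit": 20}},
        ],
    )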

    Feature Extractors

    Image Embedding: generate visual embeddings for similarity search and clustering (752K runs)
    Video Embedding: generate vector embeddings for video content (610K runs)
    Text Embedding: extract semantic embeddings from documents, transcripts, and text content (827K runs)
    Audio Transcription: transcribe audio content to text (450K runs)
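
    Any of these can back a collection in the same way the opening snippet attaches multimodal_extractor. The machine-readable name below ("video_embedding") is a guess at the identifier behind the "Video Embedding" card; the real name may differ.

    # Hedged sketch: a collection that only computes video embeddings.
    # "video_embedding" is an assumed extractor identifier.
    videos = client.collections.create(
        collection_name="video_library",
        feature_extractor={
            "feature_extractor_name": "video_embedding",
            "version": "v1"
        }
    )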

    Retriever Stages

    feature search (type: search): search collections using multimodal embeddings
    attribute filter (type: filter): filter documents by metadata attributes
    limit (type: reduce): limit the number of documents returned
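
    At query time, the attribute filter stage presumably takes its constraints from the execute() call. The filters argument and its dotted field syntax below are assumptions for illustration; only retriever_id, inputs, and limit appear in this recipe.

    # Hedged sketch: constraining a semantic query to English-language
    # assets. The `filters` keyword and field path are guesses.
    results = client.retrievers.execute(
        retriever_id="semantic-retriever",
        inputs={"query_text": "product demo with testimonials"},
        filters={"metadata.language": "en"},
        limit=20
    )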