
Mixed to Embeddings Converter

Generate unified vector embeddings from mixed-modality inputs: text, images, audio, and video combined. The result enables cross-modal search, where any modality can query any other modality in a single vector space.

    Max file size: 5 GB
    Estimated: 1-15 sec depending on modality
    8 input formats

    How It Works

1. Provide one or more inputs of any modality.
2. Each input is processed through its modality-specific encoder.
3. Modality embeddings are projected into a shared vector space.
4. A fused embedding is produced that represents the combined input.
5. The unified embedding enables cross-modal similarity search.
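The steps above can be sketched in a few lines of NumPy. Everything here is a stand-in for illustration only: the encoders are random functions (a real system would use models such as CLIP), the 768-dimension shared space and the projection matrices are arbitrary choices, and none of it reflects Mixpeek's actual internals.

```python
import numpy as np

DIM = 768  # illustrative shared-space dimensionality
rng = np.random.default_rng(0)

# Hypothetical modality-specific encoders (step 2). Real encoders map
# raw text/pixels to semantically meaningful vectors; these are random.
def encode_text(text: str) -> np.ndarray:
    return rng.standard_normal(512)

def encode_image(pixels: np.ndarray) -> np.ndarray:
    return rng.standard_normal(1024)

# Per-modality projections into the shared space (step 3).
proj_text = rng.standard_normal((512, DIM)) / np.sqrt(512)
proj_image = rng.standard_normal((1024, DIM)) / np.sqrt(1024)

def embed(inputs):
    """Encode, project, and fuse a list of (modality, data) pairs (steps 2-4)."""
    projected = []
    for modality, data in inputs:
        if modality == "text":
            v = encode_text(data) @ proj_text
        elif modality == "image":
            v = encode_image(data) @ proj_image
        projected.append(v / np.linalg.norm(v))  # unit-normalize per modality
    fused = np.mean(projected, axis=0)           # simple average fusion
    return fused / np.linalg.norm(fused)         # unit vector, ready for cosine search

emb = embed([("text", "red sports car"), ("image", np.zeros((8, 8, 3)))])
print(emb.shape)  # (768,)
```

Normalizing both before and after fusion keeps every embedding on the unit sphere, so step 5's similarity search reduces to a dot product.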

    Code Examples

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_API_KEY")

    result = client.convert(
        sources=[
            {"type": "text", "content": "A red sports car on a mountain road"},
            {"type": "image", "url": "https://example.com/car.jpg"}
        ],
        from_format="multimodal",
        to_format="embeddings",
        options={
            "model": "clip-vit-l-14",
            "fusion_strategy": "weighted_average",
            "weights": {"text": 0.4, "image": 0.6}
        }
    )

    print(f"Fused embedding dim: {len(result.embedding)}")
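The `weighted_average` strategy with `{"text": 0.4, "image": 0.6}` can be illustrated directly. The three-dimensional vectors below are toy values, not real model outputs, and the renormalization step is an assumption about how such a strategy would typically be implemented:

```python
import numpy as np

# Hypothetical per-modality embeddings, already projected into the shared space.
text_emb = np.array([1.0, 0.0, 0.0])
image_emb = np.array([0.0, 1.0, 0.0])

weights = {"text": 0.4, "image": 0.6}

# Weighted average, then renormalize so cosine similarity stays well-defined.
fused = weights["text"] * text_emb + weights["image"] * image_emb
fused = fused / np.linalg.norm(fused)
print(fused)
```

Raising the image weight pulls the fused vector toward the image embedding, biasing downstream search toward visual similarity.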

    Use Cases

    Search videos using text queries and vice versa
    Build unified search across documents, images, and audio
    Create recommendation systems that span content types
    Enable 'find similar' features across an entire media library
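Once every item in a library shares one vector space, a 'find similar' feature is just nearest-neighbor search over unit vectors. A minimal sketch, using random 64-dimensional placeholders for the real embeddings (the file names and dimensionality are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy media library: unified embeddings for items of different modalities.
library = {
    "beach_video.mp4": rng.standard_normal(64),
    "podcast_ep1.mp3": rng.standard_normal(64),
    "sunset.jpg": rng.standard_normal(64),
}
# Unit-normalize so a dot product equals cosine similarity.
library = {k: v / np.linalg.norm(v) for k, v in library.items()}

def search(query_emb, top_k=2):
    """Rank library items by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    scored = sorted(library.items(), key=lambda kv: -float(q @ kv[1]))
    return [name for name, _ in scored[:top_k]]

# A query embedded into the same space can rank images, audio, and video
# together; here the query is a lightly perturbed copy of one library item.
query = library["sunset.jpg"] + 0.1 * rng.standard_normal(64)
print(search(query))
```

At production scale the sorted scan would be replaced by an approximate nearest-neighbor index, but the similarity computation is the same.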

    Supported Input Formats

    JPEG
    PNG
    MP4
    MP3
    WAV
    TXT
    PDF
    JSON

    Quick Info

    Category: embedding
    Max File Size: 5 GB
    Est. Time: 1-15 sec depending on modality

    Try This Conversion

    Get started with the Mixpeek API and convert your first file in minutes.


    Ready to convert mixed inputs to embeddings?

    Start using the Mixpeek Multimodal to Embeddings converter in minutes. Sign up for a free API key and follow the documentation to get started.