    Objects

    Multimodal data units with blob storage, schema validation, and lineage tracking for downstream processing

    Why do anything?

    Raw files (videos, images, documents) need metadata and validation before ML processing. Without objects, data is unstructured and untraceable.

    Why now?

    AI applications ingest diverse formats. Manual file handling doesn't scale or maintain lineage.

    Why this feature?

    Objects combine blob storage with metadata, schema validation, and complete lineage tracking from source to processed documents.

    How It Works

    Objects are the fundamental data unit in Mixpeek. They contain blobs (actual content) plus metadata, with full lineage tracking.
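As a rough illustration of that shape, the sketch below models an object as a list of blobs plus a metadata dictionary and an identifier. The class names `Blob` and `ObjectRecord`, the `obj_` id prefix, and the `bucket_id` value are assumptions made for this example; they are not the Mixpeek SDK's own types.

```python
from dataclasses import dataclass, field
from typing import Any
import uuid


@dataclass
class Blob:
    """One piece of content attached to an object (hypothetical shape)."""
    property: str  # name of the schema property this blob fills
    type: str      # content kind, e.g. "text", "image", "video"
    data: Any      # the payload itself, or a reference to it


@dataclass
class ObjectRecord:
    """Illustrative object: blobs plus metadata plus an identifier."""
    bucket_id: str
    blobs: list[Blob]
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumed id format; the real service assigns its own object_id.
    object_id: str = field(
        default_factory=lambda: f"obj_{uuid.uuid4().hex[:12]}"
    )


obj = ObjectRecord(
    bucket_id="bkt_demo",
    blobs=[Blob(property="content", type="text", data="hello world")],
    metadata={"source": "manual-upload"},
)
```

The blob fields mirror the keys used in the integration call (`property`, `type`, `data`), so the same dictionary passed to the SDK maps directly onto this structure.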

    1. Upload: Blob content uploaded via API or SDK

    2. Validation: Content validated against parent bucket schema

    3. Storage: Blob stored in S3/MinIO/LocalStack, metadata in MongoDB

    4. Lineage: object_id assigned, root tracking established
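The four steps can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: the dictionaries `BLOB_STORE` and `METADATA_STORE` play the roles of object storage and MongoDB, `BUCKET_SCHEMAS` is a made-up schema format, and `root_object_id` is a hypothetical field name for the root tracking described above. It is not how the service is implemented.

```python
import uuid

# In-memory stand-ins for the real backends (assumptions for illustration).
BLOB_STORE = {}      # plays the role of S3/MinIO/LocalStack
METADATA_STORE = {}  # plays the role of MongoDB

# Hypothetical bucket schema: property name -> expected blob type.
BUCKET_SCHEMAS = {"bkt_demo": {"content": "text"}}


def validate_blobs(bucket_id, blobs):
    """Step 2: reject blobs that do not match the parent bucket's schema."""
    schema = BUCKET_SCHEMAS[bucket_id]
    for blob in blobs:
        expected = schema.get(blob["property"])
        if expected is None:
            raise ValueError(f"unknown property: {blob['property']!r}")
        if blob["type"] != expected:
            raise ValueError(
                f"{blob['property']!r} expects {expected!r}, got {blob['type']!r}"
            )


def create_object(bucket_id, blobs, metadata=None):
    """Steps 1-4: accept an upload, validate, store, and assign lineage."""
    validate_blobs(bucket_id, blobs)

    object_id = f"obj_{uuid.uuid4().hex[:12]}"  # assumed id format

    # Step 3a: blob payloads go to the blob store, keyed by object and property.
    blob_keys = []
    for blob in blobs:
        key = f"{bucket_id}/{object_id}/{blob['property']}"
        BLOB_STORE[key] = blob["data"]
        blob_keys.append(key)

    # Step 3b and 4: metadata holds references to the blobs plus lineage fields.
    METADATA_STORE[object_id] = {
        "object_id": object_id,
        "bucket_id": bucket_id,
        "blob_keys": blob_keys,
        "metadata": metadata or {},
        "root_object_id": object_id,  # a source object is its own root
    }
    return METADATA_STORE[object_id]


record = create_object(
    "bkt_demo",
    [{"property": "content", "type": "text", "data": "hello world"}],
)
```

Validation runs before anything is written, so a blob that fails the schema check leaves both stores untouched.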

    Why This Approach

    Separating blobs from metadata keeps large binary content in object storage while the metadata stays queryable in a database. Lineage tracking preserves provenance through the entire processing pipeline.
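A small sketch of what that provenance could look like in practice: each processed document carries a reference to the object it ultimately came from, so lineage questions are answered from metadata alone without reading any blob. The field names `root_object_id` and `parent_id`, and the sample records, are hypothetical and chosen only for this example.

```python
# Hypothetical records: one source object and two documents derived from it.
OBJECTS = {
    "obj_1": {"object_id": "obj_1", "blob_key": "bkt_demo/obj_1/content"},
}
DOCUMENTS = {
    "doc_a": {"document_id": "doc_a", "root_object_id": "obj_1", "parent_id": "obj_1"},
    "doc_b": {"document_id": "doc_b", "root_object_id": "obj_1", "parent_id": "doc_a"},
}


def trace_lineage(document_id):
    """Return the chain of ids from a document back to its source object."""
    chain = [document_id]
    current = DOCUMENTS[document_id]
    # Follow parent links while the parent is itself a derived document.
    while current["parent_id"] in DOCUMENTS:
        chain.append(current["parent_id"])
        current = DOCUMENTS[current["parent_id"]]
    chain.append(current["parent_id"])  # the source object ends the chain
    return chain


def documents_from(object_id):
    """Find every document derived from one object, using metadata only."""
    return sorted(
        doc_id
        for doc_id, doc in DOCUMENTS.items()
        if doc["root_object_id"] == object_id
    )


lineage = trace_lineage("doc_b")
derived = documents_from("obj_1")
```

Here `lineage` walks from `doc_b` through `doc_a` to `obj_1`, and `derived` lists both documents for `obj_1`; neither lookup touches the stored blob.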

    Integration

    client.buckets.objects.create(bucket_id=bucket_id, blobs=[{"property": "content", "type": "text", "data": "..."}])