Objects
Multimodal data units with blob storage, schema validation, and lineage tracking for downstream processing
Why do anything?
Raw files (videos, images, documents) need metadata and validation before ML processing. Without objects, data is unstructured and untraceable.
Why now?
AI applications ingest diverse formats. Manual file handling doesn't scale or maintain lineage.
Why this feature?
Objects combine blob storage with metadata, schema validation, and complete lineage tracking from source to processed documents.
How It Works
Objects are the fundamental data unit in Mixpeek. They contain blobs (actual content) plus metadata, with full lineage tracking.
Upload
Blob content uploaded via API or SDK
Validation
Content validated against parent bucket schema
Storage
Blob stored in S3/MinIO/LocalStack, metadata in MongoDB
Lineage
object_id assigned, root tracking established
Why This Approach
Separating blobs from metadata enables efficient storage while maintaining queryability. Lineage tracking ensures provenance through entire processing pipeline.
Where This Is Used
Integration
client.buckets.objects.create(bucket_id=bucket_id, blobs=[{"property": "content", "type": "text", "data": "..."}])
