Objects

Multimodal data units with blob storage, schema validation, and lineage tracking for downstream processing

Why do anything?

Raw files (videos, images, documents) need metadata and validation before ML processing. Without objects, data is unstructured and untraceable.

Why now?

AI applications ingest diverse formats. Manual file handling doesn't scale, and it leaves no lineage record.

Why this feature?

Objects combine blob storage with metadata, schema validation, and complete lineage tracking from source to processed documents.

How It Works

Objects are the fundamental data unit in Mixpeek. They contain blobs (actual content) plus metadata, with full lineage tracking.

Upload

Blob content uploaded via API or SDK

Validation

Content validated against parent bucket schema

Storage

Blob stored in S3/MinIO/LocalStack, metadata in MongoDB

Lineage

object_id assigned, root tracking established
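The four steps above can be sketched in a few lines of plain Python. This is an illustrative in-memory model, not Mixpeek's implementation: the Bucket and ObjectStore classes, the schema format (property name mapped to blob type), the root_object_id field, and the rule that a source object is its own root are all assumptions made for the example. Two dictionaries stand in for blob storage and the metadata database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Hypothetical stand-in: maps each blob property to its expected type.
    bucket_id: str
    schema: dict


@dataclass
class ObjectStore:
    # Two separate in-memory stores stand in for blob storage and the metadata database.
    blobs: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def create_object(self, bucket, blobs):
        # Validation: every blob must match a property declared in the bucket schema.
        for blob in blobs:
            expected = bucket.schema.get(blob["property"])
            if expected is None:
                raise ValueError(f"unknown property: {blob['property']}")
            if blob["type"] != expected:
                raise ValueError(f"{blob['property']} must be of type {expected}")

        # Lineage: assign an id. Assumption: a source object is its own root.
        object_id = f"obj_{uuid.uuid4().hex}"

        # Storage: content goes to blob storage, keyed by bucket, object, and property.
        blob_keys = []
        for blob in blobs:
            key = f"{bucket.bucket_id}/{object_id}/{blob['property']}"
            self.blobs[key] = blob["data"]
            blob_keys.append(key)

        # The metadata record holds references to the blobs, not the content itself.
        self.metadata[object_id] = {
            "object_id": object_id,
            "bucket_id": bucket.bucket_id,
            "root_object_id": object_id,
            "blob_keys": blob_keys,
        }
        return self.metadata[object_id]


bucket = Bucket(bucket_id="bkt_demo", schema={"content": "text"})
store = ObjectStore()
record = store.create_object(
    bucket, [{"property": "content", "type": "text", "data": "hello"}]
)
```

Validation runs before anything is written, so a blob that fails the schema check leaves both stores untouched.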

Why This Approach

Separating blobs from metadata keeps large content in object storage while the metadata stays queryable. Lineage tracking preserves provenance through the entire processing pipeline.
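To make the provenance idea concrete, here is a minimal sketch of walking a derived record back to its source object. The record layout and the parent_id field are hypothetical and chosen only for illustration. The source does not describe how lineage is stored internally.

```python
def trace_lineage(record_id, records):
    """Return the chain of ids from a record back to its root object."""
    chain = [record_id]
    seen = {record_id}
    parent = records[record_id].get("parent_id")
    while parent is not None:
        # Guard against malformed data that would loop forever.
        if parent in seen:
            raise ValueError("cycle in lineage")
        chain.append(parent)
        seen.add(parent)
        parent = records[parent].get("parent_id")
    return chain


# Hypothetical records: one source object and two documents derived from it.
records = {
    "obj_1": {"parent_id": None},
    "doc_1": {"parent_id": "obj_1"},
    "doc_2": {"parent_id": "doc_1"},
}
print(trace_lineage("doc_2", records))  # ['doc_2', 'doc_1', 'obj_1']
```

The last element of the chain is the root, which is the property that lets any processed document be tied back to the file it came from.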

Where This Is Used

E-commerce

Product Data Upload

Media

Video Content Ingestion

Integration

client.buckets.objects.create(bucket_id=bucket_id, blobs=[{"property": "content", "type": "text", "data": "..."}])
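Before making the call above, it can help to check the blobs list locally. The sketch below assumes each blob dict needs the three keys that appear in the snippet (property, type, data). Whether the API requires exactly these keys is an assumption, and check_blobs is a hypothetical helper, not part of the SDK.

```python
# Keys taken from the blob dict shown in the integration snippet.
REQUIRED_KEYS = ("property", "type", "data")


def check_blobs(blobs):
    """Raise ValueError if any blob dict lacks one of the expected keys."""
    for index, blob in enumerate(blobs):
        missing = [key for key in REQUIRED_KEYS if key not in blob]
        if missing:
            raise ValueError(f"blob {index} is missing: {', '.join(missing)}")
    return blobs


blobs = check_blobs(
    [{"property": "content", "type": "text", "data": "example text"}]
)
```

The checked list can then be passed as the blobs argument, so malformed payloads fail locally before a request is sent.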

Related Capabilities

prerequisite

Buckets

Objects belong to buckets

often combined

Batches

Batches process objects into documents