Mixpeek is flexible vision understanding infrastructure that's built to scale with you. Use our APIs to index, search, classify, generate and analyze videos and images for your most ambitious applications.
You can even bring your own database.
Scene embedding extracts key information from video frames, providing a rich understanding of the visual content.
{
"scene": {
"embedding": [0.1, 0.2, 0.3, 0.4],
"objects": ["car", "tree", "person"],
"actions": ["driving", "walking"],
"setting": "urban street",
"time_of_day": "daytime",
"weather": "sunny"
}
}
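Scene embeddings are vectors, so two scenes can be compared by cosine similarity. A minimal sketch using the embedding from the response above (the second vector and the helper function are illustrative, not part of the Mixpeek API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

scene_a = [0.1, 0.2, 0.3, 0.4]    # embedding from the response above
scene_b = [0.1, 0.2, 0.25, 0.45]  # hypothetical embedding of a similar scene

print(round(cosine_similarity(scene_a, scene_b), 3))
```

Scores close to 1.0 indicate visually similar scenes, which is the basis of embedding-powered search.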
Face detection identifies and analyzes human faces in images or video frames.
{
"faces": [
{
"bounding_box": [100, 50, 200, 150],
"confidence": 0.98,
"landmarks": {
"left_eye": [120, 80],
"right_eye": [180, 80],
"nose": [150, 100],
"mouth_left": [130, 130],
"mouth_right": [170, 130]
},
"emotions": {
"happy": 0.7,
"neutral": 0.3
}
}
]
}
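A response like the one above can be consumed with plain dictionary access. A sketch that picks each face's dominant emotion and measures its bounding box (assuming the box is `[x1, y1, x2, y2]` in pixels; the variable names are illustrative):

```python
response = {
    "faces": [
        {
            "bounding_box": [100, 50, 200, 150],
            "confidence": 0.98,
            "emotions": {"happy": 0.7, "neutral": 0.3},
        }
    ]
}

for face in response["faces"]:
    # Assumes [x1, y1, x2, y2] pixel coordinates
    x1, y1, x2, y2 = face["bounding_box"]
    area = (x2 - x1) * (y2 - y1)
    # Emotion with the highest score
    dominant, score = max(face["emotions"].items(), key=lambda kv: kv[1])
    print(f"{dominant} ({score:.0%}), box area {area}px")
```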
Audio transcription converts spoken words in audio files to written text.
{
"transcription": [
{
"start_time": "00:00:01",
"end_time": "00:00:05",
"speaker": "Speaker 1",
"text": "Welcome to our video on AI-powered video analysis."
},
{
"start_time": "00:00:06",
"end_time": "00:00:10",
"speaker": "Speaker 2",
"text": "Today, we'll explore how machine learning can extract insights from video content."
}
]
}
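The timestamped segments above are easy to post-process, for example to compute how long each speaker talks. A minimal sketch over the sample response (the `to_seconds` helper is mine, not part of the Mixpeek API):

```python
def to_seconds(ts):
    """Convert an HH:MM:SS timestamp to seconds."""
    h, m, s = (int(part) for part in ts.split(":"))
    return h * 3600 + m * 60 + s

segments = [
    {"start_time": "00:00:01", "end_time": "00:00:05", "speaker": "Speaker 1",
     "text": "Welcome to our video on AI-powered video analysis."},
    {"start_time": "00:00:06", "end_time": "00:00:10", "speaker": "Speaker 2",
     "text": "Today, we'll explore how machine learning can extract insights from video content."},
]

for seg in segments:
    duration = to_seconds(seg["end_time"]) - to_seconds(seg["start_time"])
    print(f'{seg["speaker"]} spoke for {duration}s')
```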
Text reading extracts and recognizes text present in images or video frames.
{
"text_regions": [
{
"bounding_box": [50, 100, 300, 150],
"text": "AI-Powered Video Analysis",
"confidence": 0.95
},
{
"bounding_box": [75, 200, 275, 250],
"text": "Extracting Insights",
"confidence": 0.92
}
]
}
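Each region carries a confidence score, so low-quality detections can be filtered before the text is indexed. A sketch over the sample response (the threshold value is illustrative):

```python
regions = [
    {"bounding_box": [50, 100, 300, 150], "text": "AI-Powered Video Analysis", "confidence": 0.95},
    {"bounding_box": [75, 200, 275, 250], "text": "Extracting Insights", "confidence": 0.92},
]

MIN_CONFIDENCE = 0.9  # illustrative cutoff; tune for your content

# Keep only confident detections and join them into one searchable string
caption = " | ".join(r["text"] for r in regions if r["confidence"] >= MIN_CONFIDENCE)
print(caption)
```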
Activity description provides a detailed analysis of actions and events occurring in the video.
{
"activities": [
{
"timestamp": "00:00:05",
"description": "A person is jogging in a park",
"confidence": 0.95,
"objects": ["person", "trees", "path"],
"actions": ["jogging", "moving"]
},
{
"timestamp": "00:00:15",
"description": "A dog is playing fetch with its owner",
"confidence": 0.92,
"objects": ["person", "dog", "ball"],
"actions": ["throwing", "running", "catching"]
}
]
}
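Activity records like these can feed a simple inverted index that maps detected objects to the timestamps where they appear. A minimal sketch (the `object_index` structure is illustrative, not part of the Mixpeek API):

```python
from collections import defaultdict

activities = [
    {"timestamp": "00:00:05", "objects": ["person", "trees", "path"],
     "actions": ["jogging", "moving"]},
    {"timestamp": "00:00:15", "objects": ["person", "dog", "ball"],
     "actions": ["throwing", "running", "catching"]},
]

# Map each detected object to every timestamp where it appears
object_index = defaultdict(list)
for act in activities:
    for obj in act["objects"]:
        object_index[obj].append(act["timestamp"])

print(object_index["person"])
```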
You can use each method individually, or simply index the entire video for end-to-end search. Videos can come from a live camera feed or from object storage like AWS S3.
Leverage your newly structured data to build apps powered by information that was previously inaccessible.
Use Case Docs
mixpeek.search("person jogging in park with dog")
{
"results": [
{
"start_time": 0,
"end_time": 5,
"embedding": [0.1, 0.2, 0.3, 0.4],
"faces": ["face.jpg"],
"transcription": {
"text": "It's a beautiful day for a jog in the park.",
"speaker": "Narrator"
},
"text": [
{
"text": "Park Entrance",
"bounding_box": [50, 100, 300, 150],
"confidence": 0.95
}
],
"descriptions": {
"description": "A person is jogging on a path in a sunny park",
"confidence": 0.92
}
}
]
}
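Each result bundles the time range with the extracted signals, so turning it into a clip reference is a one-liner. A sketch over a trimmed version of the response above (the formatting is illustrative):

```python
results = [
    {
        "start_time": 0,
        "end_time": 5,
        "transcription": {"text": "It's a beautiful day for a jog in the park.",
                          "speaker": "Narrator"},
        "descriptions": {"description": "A person is jogging on a path in a sunny park",
                         "confidence": 0.92},
    }
]

# Render each hit as "start-end: description" for display or deep-linking
clips = [f'{r["start_time"]}-{r["end_time"]}s: {r["descriptions"]["description"]}'
         for r in results]
print(clips[0])
```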
Every change, no matter where it originates or what form it takes, is sent to our processing pipeline in real time.
Pull out the important bits and convert them into embeddings and metadata that can be used for AI.
Every model can be fine-tuned to your specific use-case and scaled to handle any amount of data.
Initialize once, and continue building more advanced AI apps on top of fresh data. Treat your S3 & database as one entity.
Get started on the free plan with an easy-to-use API or the Python client.
Scale from zero to billions of items, with no downtime and minimal latency impact.
Start free, then pay only for what you use with usage-based pricing.
We will never charge you if you stay under the file quota.
Choose a cloud provider and region — we'll take care of uptime, consistency, and the rest.
Mixpeek is SOC 2 Type II compliant and GDPR-ready, built to keep your data secure. See our security stance.