Give your apps the gift of sight.

Mixpeek is flexible vision understanding infrastructure that's built to scale with you. Use our APIs to index, search, classify, generate and analyze videos and images for your most ambitious applications.

You can even bring your own database.

There's a method for everything

Embed scene extracts key information from video frames, providing a rich understanding of the visual content.


{
  "scene": {
    "embedding": [0.1, 0.2, 0.3, 0.4],
    "objects": ["car", "tree", "person"],
    "actions": ["driving", "walking"],
    "setting": "urban street",
    "time_of_day": "daytime",
    "weather": "sunny"
  }
}
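
One way to use the embedding field is to compare two scenes by cosine similarity. The sketch below is illustrative only: it assumes the response shape shown above, and both the second vector and the `cosine_similarity` helper are hypothetical, not part of the Mixpeek API.

```python
import json
import math

# Sample response trimmed from the example above.
response = json.loads("""
{
  "scene": {
    "embedding": [0.1, 0.2, 0.3, 0.4],
    "objects": ["car", "tree", "person"]
  }
}
""")


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


scene_vec = response["scene"]["embedding"]
other_vec = [0.4, 0.3, 0.2, 0.1]  # hypothetical embedding of another scene

same = cosine_similarity(scene_vec, scene_vec)
different = cosine_similarity(scene_vec, other_vec)
print(round(same, 3), round(different, 3))
```

Real embeddings are typically much longer than four values, but the comparison works the same way.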
                      

Face detection identifies and analyzes human faces in images or video frames.


{
  "faces": [
    {
      "bounding_box": [100, 50, 200, 150],
      "confidence": 0.98,
      "landmarks": {
        "left_eye": [120, 80],
        "right_eye": [180, 80],
        "nose": [150, 100],
        "mouth_left": [130, 130],
        "mouth_right": [170, 130]
      },
      "emotions": {
        "happy": 0.7,
        "neutral": 0.3
      }
    }
  ]
}
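
As a rough sketch of consuming this output, the snippet below keeps confident detections, picks the strongest emotion, and checks that the landmarks fall inside the box. It assumes `bounding_box` is `[x_min, y_min, x_max, y_max]` in pixels, which the sample does not confirm; the helper names and the 0.9 threshold are hypothetical.

```python
import json

# Sample response copied from the example above.
response = json.loads("""
{
  "faces": [
    {
      "bounding_box": [100, 50, 200, 150],
      "confidence": 0.98,
      "landmarks": {
        "left_eye": [120, 80],
        "right_eye": [180, 80],
        "nose": [150, 100],
        "mouth_left": [130, 130],
        "mouth_right": [170, 130]
      },
      "emotions": {"happy": 0.7, "neutral": 0.3}
    }
  ]
}
""")


def landmarks_inside(face):
    """True if every landmark lies within the face box.

    Assumption: bounding_box is [x_min, y_min, x_max, y_max].
    """
    x_min, y_min, x_max, y_max = face["bounding_box"]
    return all(
        x_min <= x <= x_max and y_min <= y <= y_max
        for x, y in face["landmarks"].values()
    )


def dominant_emotion(face):
    """Label of the highest-scoring emotion."""
    return max(face["emotions"], key=face["emotions"].get)


MIN_CONFIDENCE = 0.9  # hypothetical threshold; tune for your data
confident = [f for f in response["faces"] if f["confidence"] >= MIN_CONFIDENCE]

for face in confident:
    print(dominant_emotion(face), landmarks_inside(face))
```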
                      

Audio transcription converts spoken words in audio files to written text.


{
  "transcription": [
    {
      "start_time": "00:00:01",
      "end_time": "00:00:05",
      "speaker": "Speaker 1",
      "text": "Welcome to our video on AI-powered video analysis."
    },
    {
      "start_time": "00:00:06",
      "end_time": "00:00:10",
      "speaker": "Speaker 2",
      "text": "Today, we'll explore how machine learning can extract insights from video content."
    }
  ]
}
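
A transcript in this shape is easy to turn into readable caption lines. The following is a minimal sketch that assumes timestamps are always `HH:MM:SS` strings as in the sample; `to_seconds` and `format_segment` are hypothetical helpers.

```python
import json

# Sample response copied from the example above.
response = json.loads("""
{
  "transcription": [
    {"start_time": "00:00:01", "end_time": "00:00:05", "speaker": "Speaker 1",
     "text": "Welcome to our video on AI-powered video analysis."},
    {"start_time": "00:00:06", "end_time": "00:00:10", "speaker": "Speaker 2",
     "text": "Today, we'll explore how machine learning can extract insights from video content."}
  ]
}
""")


def to_seconds(hms):
    """Convert an HH:MM:SS string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_segment(segment):
    """Render one segment as '[start -> end] speaker: text'."""
    return (
        f'[{segment["start_time"]} -> {segment["end_time"]}] '
        f'{segment["speaker"]}: {segment["text"]}'
    )


lines = [format_segment(s) for s in response["transcription"]]

# Total seconds of speech across all segments.
total_speech = sum(
    to_seconds(s["end_time"]) - to_seconds(s["start_time"])
    for s in response["transcription"]
)

print("\n".join(lines))
print(f"{total_speech} seconds of speech")
```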
                      

Text reading extracts and recognizes text present in images or video frames.


{
  "text_regions": [
    {
      "bounding_box": [50, 100, 300, 150],
      "text": "AI-Powered Video Analysis",
      "confidence": 0.95
    },
    {
      "bounding_box": [75, 200, 275, 250],
      "text": "Extracting Insights",
      "confidence": 0.92
    }
  ]
}
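
To show how these regions might be consumed, the sketch below drops low-confidence detections and joins the rest in top-to-bottom order. It assumes the second value of `bounding_box` is the region's top y coordinate; the 0.9 threshold is a hypothetical choice.

```python
import json

# Sample response copied from the example above.
response = json.loads("""
{
  "text_regions": [
    {"bounding_box": [75, 200, 275, 250], "text": "Extracting Insights", "confidence": 0.92},
    {"bounding_box": [50, 100, 300, 150], "text": "AI-Powered Video Analysis", "confidence": 0.95}
  ]
}
""")

MIN_CONFIDENCE = 0.9  # hypothetical threshold


def read_text(regions, min_confidence=MIN_CONFIDENCE):
    """Return detected text in top-to-bottom order, skipping weak detections.

    Assumption: bounding_box[1] is the top y coordinate of the region.
    """
    kept = [r for r in regions if r["confidence"] >= min_confidence]
    kept.sort(key=lambda r: r["bounding_box"][1])
    return [r["text"] for r in kept]


ordered = read_text(response["text_regions"])
print("\n".join(ordered))
```

The regions are deliberately listed out of order here to show that the sort, not the input order, decides the result.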
                      

Activity description provides a detailed analysis of actions and events occurring in the video.


{
  "activities": [
    {
      "timestamp": "00:00:05",
      "description": "A person is jogging in a park",
      "confidence": 0.95,
      "objects": ["person", "trees", "path"],
      "actions": ["jogging", "moving"]
    },
    {
      "timestamp": "00:00:15",
      "description": "A dog is playing fetch with its owner",
      "confidence": 0.92,
      "objects": ["person", "dog", "ball"],
      "actions": ["throwing", "running", "catching"]
    }
  ]
}
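
Because each activity carries a timestamp and a list of actions, you can build a simple lookup from action to the moments it occurs. This is only a sketch over the sample payload above; `action_index` and `when` are hypothetical names.

```python
import json
from collections import defaultdict

# Sample response trimmed from the example above.
response = json.loads("""
{
  "activities": [
    {"timestamp": "00:00:05", "description": "A person is jogging in a park",
     "confidence": 0.95, "actions": ["jogging", "moving"]},
    {"timestamp": "00:00:15", "description": "A dog is playing fetch with its owner",
     "confidence": 0.92, "actions": ["throwing", "running", "catching"]}
  ]
}
""")

# Map each action label to the timestamps where it was detected.
action_index = defaultdict(list)
for activity in response["activities"]:
    for action in activity["actions"]:
        action_index[action].append(activity["timestamp"])


def when(action):
    """Timestamps at which the given action appears (empty list if never)."""
    return action_index.get(action, [])


print(when("jogging"))
print(when("swimming"))
```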
                      

...or use everything at once

You can use each method individually or index the entire video for end-to-end search. Videos can come from a live camera feed or from object storage such as AWS S3.

1. Integrate

Connect your data sources securely to Mixpeek's processing pipeline.

Integration Docs

2. Index

Use hundreds of models to extract data from your files or feeds.

Indexing Docs

3. Analyze

Use your newly structured data to build apps powered by previously inaccessible data.

Use Case Docs
              
mixpeek.search("person jogging in park with dog")

{
  "results": [
    {
      "start_time": 0,
      "end_time": 5,
      "embedding": [0.1, 0.2, 0.3, 0.4],
      "faces": ["face.jpg"],
      "transcription": {
        "text": "It's a beautiful day for a jog in the park.",
        "speaker": "Narrator"
      },
      "text": [
        {
          "text": "Park Entrance",
          "bounding_box": [50, 100, 300, 150],
          "confidence": 0.95
        }
      ],
      "descriptions": {
        "description": "A person is jogging on a path in a sunny park",
        "confidence": 0.92
      }
    }
  ]
}               
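
A search hit combines the outputs of the individual methods, so a short summary per hit is often enough for display. The sketch below works on the sample response above rather than a live call; it assumes `start_time` and `end_time` are seconds, and `summarize` is a hypothetical helper.

```python
import json

# Sample response trimmed from the example above.
response = json.loads("""
{
  "results": [
    {
      "start_time": 0,
      "end_time": 5,
      "transcription": {"text": "It's a beautiful day for a jog in the park.", "speaker": "Narrator"},
      "text": [{"text": "Park Entrance", "bounding_box": [50, 100, 300, 150], "confidence": 0.95}],
      "descriptions": {"description": "A person is jogging on a path in a sunny park", "confidence": 0.92}
    }
  ]
}
""")


def summarize(result):
    """One-line summary of a search hit.

    Assumption: start_time and end_time are offsets in seconds.
    """
    desc = result["descriptions"]
    on_screen = ", ".join(region["text"] for region in result.get("text", []))
    return (
        f'{result["start_time"]}s-{result["end_time"]}s: {desc["description"]} '
        f'(confidence {desc["confidence"]:.2f}; on-screen text: {on_screen or "none"})'
    )


summaries = [summarize(r) for r in response["results"]]
print("\n".join(summaries))
```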
              
            

Focus on your users, let us handle the...

Real-Time Synchronization

Every change, no matter where or in what form, is sent to our processing pipeline in real time.

Extraction and Embedding

Pull out the important bits and convert them into embeddings and metadata that can be used for AI.

Fine-Tuning and Scaling

Every model can be fine-tuned to your specific use case and scaled to handle any amount of data.

AWS
MongoDB
Azure
GCP

Zero Platform Risk

Fully managed or self-hosted

Easy to Use

Get started on the free plan with an easy-to-use API or the Python client.

Scalable

Scale from zero to billions of items, with no downtime and minimal latency impact.

Pay for What You Use

Start free, then pay only for what you use with usage-based pricing.

Free Forever Tier

We will never charge you as long as you stay under the file quota.

Reliable

Choose a cloud provider and region — we'll take care of uptime, consistency, and the rest.

Secure

Mixpeek is SOC 2 Type II and GDPR-ready. It's built to keep your data secure. See our security stance.

Become a multimodal maker.

Upgrade your application with video understanding in one line of code.