How many keyframes are extracted per video?

The number depends on the video content. A one-hour lecture with few scene changes may produce 10-20 keyframes, while a fast-cut music video could yield 200+. You can set a maximum count or a minimum interval between frames.
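To make the two limits concrete, here is a minimal sketch of how a maximum count and a minimum interval could be applied to a list of candidate timestamps. The `limit_keyframes` helper and the `min_interval` name are hypothetical and only illustrate the idea; the service may choose which frames to keep differently.

```python
from typing import List


def limit_keyframes(timestamps: List[float], max_frames: int, min_interval: float) -> List[float]:
    """Keep timestamps at least `min_interval` seconds apart, then cap the count.

    Hypothetical helper for illustration; not part of the Mixpeek SDK.
    """
    kept: List[float] = []
    for t in sorted(timestamps):
        # Skip candidates that fall too close to the previously kept frame.
        if not kept or t - kept[-1] >= min_interval:
            kept.append(t)
    # Truncate to the maximum count, keeping the earliest frames.
    return kept[:max_frames]


candidates = [0.0, 0.4, 2.1, 2.3, 7.5, 12.0, 12.2, 30.0]
print(limit_keyframes(candidates, max_frames=4, min_interval=1.0))
```

With these inputs, near-duplicate candidates are dropped first and the cap then trims the remainder to four frames.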

Can I control the sensitivity of scene detection?

Yes. The `sensitivity` parameter ranges from 0.0 to 1.0. Lower values capture only major scene changes; higher values capture subtler transitions like camera pans and lighting shifts.
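One way to picture the effect of `sensitivity` is as a threshold on frame-to-frame difference scores. The sketch below assumes a simple mapping, threshold = 1.0 - sensitivity, which is an illustration only; the actual detector and its mapping are not documented here, and `detect_cuts` is a hypothetical name.

```python
from typing import List


def detect_cuts(diff_scores: List[float], sensitivity: float) -> List[int]:
    """Return indices whose frame-to-frame difference exceeds a threshold.

    Assumption: threshold = 1.0 - sensitivity, so a higher sensitivity
    lowers the bar and lets subtler transitions through.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0.0 and 1.0")
    threshold = 1.0 - sensitivity
    return [i for i, score in enumerate(diff_scores) if score > threshold]


# Normalized differences between consecutive frames
# (0 = identical, 1 = completely different).
scores = [0.05, 0.9, 0.1, 0.35, 0.02, 0.6]
print(detect_cuts(scores, sensitivity=0.2))  # only the hard cut at index 1
print(detect_cuts(scores, sensitivity=0.8))  # also the subtler changes at indices 3 and 5
```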

What image format are keyframes returned in?

Keyframes are returned as JPEG by default. You can also request PNG or WebP via the `output_format` option.
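If you build the options dictionary in your own code, a small validation step can catch an unsupported format before the request is sent. This is a sketch under assumptions: the lowercase spellings `"jpeg"`, `"png"`, and `"webp"` are guesses at the accepted values, and `build_options` is a hypothetical helper rather than part of the SDK.

```python
# Assumed spellings of the three formats named above.
ALLOWED_FORMATS = {"jpeg", "png", "webp"}


def build_options(sensitivity: float = 0.5, max_frames: int = 50,
                  output_format: str = "jpeg", include_captions: bool = True) -> dict:
    """Build an options dict, rejecting values outside the documented ranges."""
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0.0 and 1.0")
    return {
        "sensitivity": sensitivity,
        "max_frames": max_frames,
        "output_format": fmt,
        "include_captions": include_captions,
    }


print(build_options(output_format="PNG"))
```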

media

Video
Images
Converter

Automatically detect scene changes and extract representative keyframes from a video. Each keyframe includes a timestamp, a scene label, and an optional caption generated by a vision model.

Max file size: 5 GB

Estimated: 1-5 min per hour of video

5 input formats
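For planning purposes, the time estimate above can be turned into a rough range for a given video length. The sketch assumes processing time scales linearly with duration, which the estimate implies but does not guarantee; `estimate_processing_minutes` is a hypothetical helper.

```python
from typing import Tuple


def estimate_processing_minutes(duration_seconds: float) -> Tuple[float, float]:
    """Return a (low, high) estimate in minutes, at 1-5 min per hour of video.

    Assumes linear scaling with duration.
    """
    hours = duration_seconds / 3600
    return (1 * hours, 5 * hours)


low, high = estimate_processing_minutes(90 * 60)  # a 90-minute video
print(f"Estimated processing time: {low:.1f}-{high:.1f} min")
```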

How It Works

1. Upload your video file or provide a URL.
2. Scene-change detection identifies visual transition points.
3. Representative frames are extracted at each transition.
4. A vision model captions each keyframe and assigns a scene label.
5. Keyframes are returned as images with metadata.
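The detection and extraction steps above can be sketched on synthetic data. This example swaps in a very simple technique, mean absolute pixel difference between consecutive frames, purely to show the shape of the pipeline; it is not the detector the service uses, and the captioning step is omitted. All names here are hypothetical.

```python
from typing import List, Tuple

Frame = List[int]  # flattened grayscale pixels, 0-255


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference, normalized to the range 0.0-1.0."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def extract_keyframes(frames: List[Frame], fps: float,
                      threshold: float = 0.3) -> List[Tuple[float, Frame]]:
    """Return (timestamp_seconds, frame) for the first frame after each transition."""
    keyframes = []
    for i in range(1, len(frames)):
        # A large jump between neighbors is treated as a scene change.
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            keyframes.append((i / fps, frames[i]))
    return keyframes


dark = [10, 12, 11, 9]
bright = [240, 235, 238, 242]
video = [dark, dark, dark, bright, bright, dark]

for timestamp, frame in extract_keyframes(video, fps=2.0):
    print(f"{timestamp:.1f}s", frame)
```

Here the two transitions (dark to bright, then bright to dark) each yield one keyframe, stamped with the time of the first frame in the new scene.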

Code Examples

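from mixpeek import Mixpeek

# Authenticate with your API key.
client = Mixpeek(api_key="YOUR_API_KEY")

# Convert a video (referenced by URL) into keyframes.
result = client.convert(
    source="https://example.com/promo.mp4",
    from_format="video",
    to_format="keyframes",
    options={
        "sensitivity": 0.5,        # 0.0-1.0; higher values catch subtler transitions
        "max_frames": 50,          # upper bound on the number of keyframes
        "include_captions": True,  # caption each keyframe with a vision model
    },
)

# Print the timestamp and caption of each returned keyframe.
for frame in result.keyframes:
    print(frame.timestamp, frame.caption)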

Use Cases

Build visual indexes for video libraries

Generate storyboards for film and advertising review

Create thumbnail galleries for e-learning platforms

Power visual search across video catalogs
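As a rough illustration of the indexing and storyboard use cases, the sketch below groups keyframe metadata by scene label and formats timestamps for display. The `Keyframe` class is a stand-in for whatever the API returns; the `scene_label` field name in particular is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Keyframe:
    # Stand-in for the API's keyframe object; field names are assumed.
    timestamp: float
    scene_label: str
    caption: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping any fractional part."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_index(keyframes: List[Keyframe]) -> Dict[str, List[str]]:
    """Group captions by scene label, ordered by time within each scene."""
    index: Dict[str, List[str]] = defaultdict(list)
    for kf in sorted(keyframes, key=lambda k: k.timestamp):
        index[kf.scene_label].append(f"{format_timestamp(kf.timestamp)} {kf.caption}")
    return dict(index)


frames = [
    Keyframe(75.2, "intro", "Title card"),
    Keyframe(3725.0, "demo", "Presenter at whiteboard"),
    Keyframe(12.0, "intro", "Logo animation"),
]
for label, entries in build_index(frames).items():
    print(label, entries)
```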

Supported Input Formats

MP4

MOV

AVI

MKV

WebM
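Before uploading, you may want to check a file against the format list above and the 5 GB size limit. The sketch below is one way to do that locally; it assumes 1 GB means 1024**3 bytes and that formats are identified by file extension, neither of which is confirmed by this page. `check_input` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
MAX_FILE_SIZE_BYTES = 5 * 1024**3  # assumes 1 GB = 1024**3 bytes


def check_input(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file looks unsupported or too large."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input format: {extension or 'no extension'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("file exceeds the 5 GB limit")


# Passes silently for a supported extension within the size limit.
check_input("promo.MP4", size_bytes=250_000_000)
print("promo.MP4 looks acceptable")
```

For a file on disk, `os.path.getsize(path)` from the standard library gives the size in bytes to pass in.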

Quick Info

Category: media

Max File Size: 5 GB

Est. Time: 1-5 min per hour of video

Extractor: video-descriptor

Try This Conversion

Get started with the Mixpeek API and convert your first file in minutes.

Frequently Asked Questions

Related Converters

Video to Text

Extract spoken dialogue, on-screen text, and scene descriptions from video files using multimodal AI. Produces time-stamped transcripts with speaker diarization and OCR-detected overlays.

Video to Embeddings

Generate dense vector embeddings for video content using multimodal models. Embeddings capture visual, audio, and temporal features, enabling semantic search and similarity matching across video collections.

Video to Thumbnails

Generate optimized thumbnail images from video files. Uses intelligent frame selection to pick the most visually appealing and representative frames, with optional face detection and composition scoring.

Image to Caption

Generate natural-language captions for images using a vision-language model. Produces concise, descriptive sentences suitable for alt text, content indexing, and accessibility compliance.

Ready to convert video to images?

Start using the Mixpeek Video to Keyframes converter in minutes. Sign up for a free API key and follow the documentation to get started.
