Video to Faces Converter
Detect, track, and extract all faces appearing in a video. Returns aligned face crops, bounding boxes, timestamps, and optional identity embeddings for each detected face. Supports face clustering to group appearances of the same person across the video.
How It Works
Upload a video file or provide a URL to the Mixpeek API.
Frames are sampled and processed through a face detection model (SCRFD) to locate faces.
Detected faces are tracked across frames to maintain identity continuity.
Each unique face is aligned and cropped using facial landmark detection.
Face identity embeddings (ArcFace 512D) are generated for each unique face, enabling clustering and matching.
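The tracking and clustering steps above can be illustrated with a minimal, self-contained sketch. It is not Mixpeek's implementation: the function names, the greedy matching strategy, and both thresholds are assumptions chosen for illustration, and the toy 3-D vectors stand in for the 512-D ArcFace embeddings. Tracking links boxes across consecutive frames by intersection-over-union (IoU), and clustering groups embeddings by cosine similarity.

```python
import math


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_faces(frames, iou_threshold=0.5):
    """Greedily link per-frame boxes into tracks (hypothetical helper).

    frames: list of frames, each a list of (x1, y1, x2, y2) boxes.
    Returns a list of tracks, each a list of (frame_index, box).
    A box extends the best-overlapping track seen in the previous frame,
    otherwise it starts a new track.
    """
    tracks = []
    active = []  # indices of tracks that were extended in the previous frame
    for frame_idx, boxes in enumerate(frames):
        candidates = list(active)
        next_active = []
        for box in boxes:
            best, best_score = None, iou_threshold
            for t in candidates:
                score = iou(tracks[t][-1][1], box)
                if score >= best_score:
                    best, best_score = t, score
            if best is None:
                tracks.append([(frame_idx, box)])
                next_active.append(len(tracks) - 1)
            else:
                tracks[best].append((frame_idx, box))
                candidates.remove(best)  # one box per track per frame
                next_active.append(best)
        active = next_active
    return tracks


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_embeddings(embeddings, threshold=0.5):
    """Greedy clustering (hypothetical helper).

    Each embedding joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of embedding indices.
    """
    clusters = []
    for i, emb in enumerate(embeddings):
        for members in clusters:
            if cosine(embeddings[members[0]], emb) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

A production system would typically use a more robust tracker and a clustering method that does not depend on input order; the greedy versions here only show the shape of the two steps.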
Code Examples
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

result = client.convert(
    source="https://example.com/panel-discussion.mp4",
    from_format="video",
    to_format="faces",
    options={
        "cluster_faces": True,
        "include_embeddings": True,
        "min_face_size": 80,
        "sample_fps": 2,
    },
)

print(f"Unique faces found: {len(result.face_clusters)}")
for cluster in result.face_clusters:
    print(f" Person {cluster.id}: {cluster.screen_time}s on screen")
    print(f" First seen: {cluster.first_appearance}s")
    print(f" Detections: {cluster.detection_count}")
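To get a feel for what a sample_fps value implies, the sketch below lists the timestamps that would be examined if frames were taken at a uniform rate. Uniform sampling is an assumption made for illustration, and sampled_timestamps is a hypothetical helper, not part of the Mixpeek client; the service may select frames differently.

```python
import math


def sampled_timestamps(duration_s, sample_fps):
    """Timestamps in seconds of frames taken at a uniform rate.

    Assumes sampling starts at 0 and includes every timestamp strictly
    before the end of the video.
    """
    if duration_s < 0 or sample_fps <= 0:
        raise ValueError("duration_s must be >= 0 and sample_fps must be > 0")
    count = math.ceil(duration_s * sample_fps)
    return [i / sample_fps for i in range(count)]


# Under this assumption, a one-hour video at 2 samples per second
# yields this many frames to run face detection on.
frames_to_process = len(sampled_timestamps(3600, 2))
```

A higher rate catches brief appearances at the cost of more frames to process, so the value is a trade-off between recall and processing time.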
Try This Conversion
Get started with the Mixpeek API and convert your first file in minutes.
Related Converters
Video to Keyframes
Automatically detect scene changes and extract representative keyframes from any video. Each keyframe includes a timestamp, scene label, and optional caption generated by a vision model.
Video to Thumbnails
Generate optimized thumbnail images from video files. Uses intelligent frame selection to pick the most visually appealing and representative frames, with optional face detection and composition scoring.
Video to Scenes
Automatically segment videos into individual scenes using visual and audio cue detection. Each scene includes a start and end timestamp, a representative keyframe, a descriptive label, and a confidence score for the detected boundary.
Video to Metadata
Extract comprehensive technical and semantic metadata from video files. Returns codec details, resolution, duration, frame rate, and AI-generated semantic tags including detected objects, scenes, dominant colors, and content categories.
Image to Metadata
Extract comprehensive technical and semantic metadata from images. Returns EXIF data, camera settings, GPS coordinates, and AI-generated semantic tags including detected objects, scene type, dominant colors, and content categories.
Ready to convert video to faces?
Start using the Mixpeek Video to Faces converter in minutes. Sign up for a free API key and follow the documentation to get started.
