The face identity extractor provides production-grade face recognition using state-of-the-art models: SCRFD for detection and ArcFace for embeddings. It detects faces, aligns each one to a canonical template, and generates 512-dimensional embeddings with 99.8%+ verification accuracy on the LFW benchmark.
Pipeline Steps
1. Filter Dataset (if collection_id provided)
   - Filter to the specified collection
2. Content Type Routing
   - Images: Direct to Step 3
   - Videos: Frame extraction (sampling at video_sampling_fps) → Step 3
   - PDFs: Page rendering → Step 3
   - Mixed: Branch by type, process separately, union results
3. Face Detection (SCRFD)
   - Detect all faces per image/frame/page
   - Extract 5-point facial landmarks (eyes, nose, mouth)
   - Filter by min_face_size and detection_threshold
4. 5-Point Affine Face Alignment
   - Warp face to canonical 112×112 template
   - Ensures consistent embeddings
5. ArcFace Embedding Generation
   - arcface_r100 model
   - 512D L2-normalized embeddings
   - Cosine similarity for matching
6. Quality Scoring (conditional: if enable_quality_scoring=true)
   - Assess blur, size, landmark confidence
   - Filter by quality_threshold if specified
7. Video Deduplication (conditional: if video_deduplication=true AND video content)
   - Remove duplicate faces across frames
   - Threshold-based similarity matching
   - Track face timelines in video
8. Output Validation
9. Output
   - Per-face documents with embeddings, bbox, landmarks, quality scores
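Because the embeddings are L2-normalized, cosine similarity between two of them reduces to a plain dot product. The following is a minimal Python sketch of that relationship, using toy 2-D vectors in place of 512-D embeddings. The helper names are illustrative and are not part of the extractor's API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """General cosine similarity; also valid for non-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 2-D vectors stand in for 512-D ArcFace embeddings.
u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])

# For unit vectors the dot product and the cosine similarity agree
# up to floating-point error.
assert abs(dot(u, v) - cosine_similarity([3.0, 4.0], [4.0, 3.0])) < 1e-12
```

This is why a vector index configured with a cosine metric can compare stored embeddings cheaply once they are unit length.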
When to Use
| Use Case | Description |
| --- | --- |
| Face verification | 1:1 matching to verify identity |
| Face identification | 1:N search to identify a person in a database |
| Face clustering | Group photos by person automatically |
| Employee verification | Workplace identity systems |
| Photo organization | Organize photo libraries by people |
| Surveillance | Security and monitoring applications |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| General image search | image_extractor |
| Object/scene detection | multimodal_extractor |
| Video content analysis | multimodal_extractor |
| Non-face biometrics | Specialized extractors |
| Input | Type | Description | Processing |
| --- | --- | --- | --- |
| image | string | URL or S3 path | Detect and embed all faces |
| video | string | URL or S3 path | Sample frames, detect faces, deduplicate |
| video_frame | string | URL or S3 path | Treated as image |
Supported formats:
Image: JPEG, PNG, WebP, BMP
Video: MP4, MOV, AVI, MKV, WebM
Recommended resolution: 640px+ for optimal face detection
Provide one of the following inputs:
{
  "image": "s3://photos/john-doe-portrait.jpg"
}
{
  "video": "s3://segments/interview-clip.mp4"
}
| Field | Type | Description |
| --- | --- | --- |
| image | string | Image URL or S3 path containing faces |
| video | string | Video URL or S3 path. Subject to max_video_length limit |
| video_frame | string | Single video frame URL or S3 path (treated as image) |
Output Schema
Each detected face produces one document with the following fields:
| Field | Type | Description |
| --- | --- | --- |
| face_identity_extractor_v1_embedding | float[512] | ArcFace embedding, L2 normalized |
| face_index | integer | Index of this face in source image (0-based) |
| bbox | object | Bounding box {x1, y1, x2, y2, width, height} |
| detection_score | number | SCRFD detection confidence (0.0-1.0) |
| landmarks | object | 5 facial landmarks for alignment |
| quality_score | number | Face quality score (0.0-1.0) |
| quality_components | object | Quality component scores (blur, size, etc.) |
| aligned_face_crop | string | Base64 aligned 112x112 face crop (optional) |
| frame_number | integer | Frame number in source video |
| timestamp | number | Timestamp in source video (seconds) |
| embedding_model | string | Embedding model used |
| detection_model | string | Detection model used |
| processing_time_ms | number | Processing time (milliseconds) |
{
  "face_identity_extractor_v1_embedding": [0.023, -0.041, 0.018, ...],
  "face_index": 0,
  "bbox": { "x1": 120, "y1": 80, "x2": 280, "y2": 300, "width": 160, "height": 220 },
  "detection_score": 0.98,
  "landmarks": { "left_eye": [150, 140], "right_eye": [230, 142], ... },
  "quality_score": 0.85,
  "embedding_model": "arcface_r100",
  "detection_model": "scrfd_2.5g",
  "processing_time_ms": 45.2
}
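A consumer of these documents may want to sanity-check them before indexing or matching. The sketch below is one hypothetical way to do that in Python. `check_face_document` is an illustrative helper, not part of the extractor. It also assumes that `width` and `height` equal the coordinate differences, which the example above is consistent with but the schema does not state.

```python
def check_face_document(doc):
    """Return a list of problems found in one per-face document (empty if none)."""
    problems = []

    bbox = doc.get("bbox", {})
    required = ("x1", "y1", "x2", "y2", "width", "height")
    missing = [key for key in required if key not in bbox]
    if missing:
        problems.append(f"bbox missing keys: {missing}")
    else:
        # Assumption: width/height are the coordinate differences.
        if bbox["x2"] - bbox["x1"] != bbox["width"]:
            problems.append("bbox width does not match x2 - x1")
        if bbox["y2"] - bbox["y1"] != bbox["height"]:
            problems.append("bbox height does not match y2 - y1")

    # Both scores are documented as lying in 0.0-1.0.
    for key in ("detection_score", "quality_score"):
        value = doc.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key} outside 0.0-1.0")

    embedding = doc.get("face_identity_extractor_v1_embedding")
    if embedding is not None and len(embedding) != 512:
        problems.append("embedding is not 512-dimensional")

    return problems


example = {
    "face_index": 0,
    "bbox": {"x1": 120, "y1": 80, "x2": 280, "y2": 300, "width": 160, "height": 220},
    "detection_score": 0.98,
    "quality_score": 0.85,
}
print(check_face_document(example))  # prints []
```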
Parameters
Detection Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| detection_model | string | "scrfd_2.5g" | SCRFD model variant |
| min_face_size | integer | 20 | Minimum face size in pixels to detect |
| detection_threshold | float | 0.5 | Confidence threshold (0.0-1.0) |
| max_faces_per_image | integer | null | Maximum faces to process per image |
Detection Models
| Model | Speed | Accuracy | Best For |
| --- | --- | --- | --- |
| scrfd_500m | 2-3ms | Good | Real-time applications |
| scrfd_2.5g | 5-7ms | Better | Recommended - balanced |
| scrfd_10g | 10-15ms | Best | Maximum accuracy |
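To make the interaction of `min_face_size`, `detection_threshold`, and `max_faces_per_image` concrete, here is a rough Python sketch of post-detection filtering. It is an illustration, not the extractor's implementation. It assumes that face size is measured on the shorter bounding-box side, that both comparisons are inclusive, and that the highest-scoring faces are kept when a cap applies. None of these details are confirmed by the parameter descriptions.

```python
def filter_detections(detections, min_face_size=20, detection_threshold=0.5,
                      max_faces_per_image=None):
    """Keep detections that pass the score and size filters, best first."""
    kept = [
        d for d in detections
        if d["detection_score"] >= detection_threshold
        # Assumption: "face size" means the shorter bbox side, in pixels.
        and min(d["bbox"]["width"], d["bbox"]["height"]) >= min_face_size
    ]
    # Assumption: when capped, the most confident detections are kept.
    kept.sort(key=lambda d: d["detection_score"], reverse=True)
    if max_faces_per_image is not None:
        kept = kept[:max_faces_per_image]
    return kept


detections = [
    {"detection_score": 0.98, "bbox": {"width": 160, "height": 220}},
    {"detection_score": 0.45, "bbox": {"width": 90, "height": 110}},  # low score
    {"detection_score": 0.80, "bbox": {"width": 15, "height": 18}},   # too small
    {"detection_score": 0.75, "bbox": {"width": 60, "height": 70}},
]

print(len(filter_detections(detections)))                         # prints 2
print(len(filter_detections(detections, max_faces_per_image=1)))  # prints 1
```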
Embedding Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| embedding_model | string | "arcface_r100" | Face embedding model |
| normalize_embeddings | boolean | true | L2-normalize to unit vectors |
Embedding Models
| Model | Accuracy (LFW) | Speed | Notes |
| --- | --- | --- | --- |
| arcface_r100 | 99.8%+ | Standard | Recommended - highest accuracy |
| arcface_r50 | 99.5%+ | Faster | Slightly lower accuracy |
| magface_r100 | 99.7%+ | Standard | Includes built-in quality score |
Quality Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enable_quality_scoring | boolean | true | Compute quality scores (adds ~5ms per face) |
| quality_threshold | float | null | Minimum quality to index (null = index all) |
Quality threshold guide:
null - Index all detected faces
0.5 - Moderate filtering (removes low quality)
0.7 - High quality only
Video Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_video_length | integer | 60 | Maximum video length in seconds |
| video_sampling_fps | float | 1.0 | Frames per second to sample |
| video_deduplication | boolean | true | Remove duplicate faces across frames |
| video_deduplication_threshold | float | 0.8 | Cosine similarity for deduplication |
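The video parameters determine how many frames are examined and how repeated faces are collapsed. The Python sketch below illustrates both ideas under stated assumptions: frames are sampled uniformly, content beyond `max_video_length` is simply not sampled (the service might reject such videos instead), and deduplication is a simple greedy pass over unit-normalized embeddings. The extractor's actual deduplication and timeline-tracking logic is not documented here, so treat this as a stand-in.

```python
import math


def sampled_timestamps(duration_s, video_sampling_fps=1.0, max_video_length=60):
    """Timestamps (seconds) of uniformly sampled frames."""
    if video_sampling_fps <= 0:
        raise ValueError("video_sampling_fps must be positive")
    # Assumption: anything past max_video_length is ignored.
    usable = min(duration_s, max_video_length)
    count = int(math.floor(usable * video_sampling_fps))
    step = 1.0 / video_sampling_fps
    return [i * step for i in range(count)]


def deduplicate(embeddings, threshold=0.8):
    """Greedy deduplication: return indices of embeddings to keep.

    An embedding is kept only if its cosine similarity to every
    previously kept embedding is below the threshold. Embeddings are
    assumed to be unit-normalized, so the dot product is the cosine.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        is_new = all(
            sum(x * y for x, y in zip(emb, embeddings[j])) < threshold
            for j in kept
        )
        if is_new:
            kept.append(i)
    return kept


print(len(sampled_timestamps(120, video_sampling_fps=0.5)))  # prints 30

faces = [[1.0, 0.0], [0.6, 0.8], [1.0, 0.0]]  # toy unit vectors
print(deduplicate(faces))  # prints [0, 1]
```

With the defaults (1 FPS, 60 seconds), at most 60 frames per video reach the detector under these assumptions.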
Output Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| output_mode | string | "per_face" | per_face or per_image |
| include_face_crops | boolean | false | Include aligned 112x112 face crops as base64 |
| store_detection_metadata | boolean | true | Store bbox, landmarks, detection scores |
Configuration Examples
Employee Verification (High Quality)
{
  "feature_extractor": {
    "feature_extractor_name": "face_identity_extractor",
    "version": "v1",
    "input_mappings": {
      "image": "payload.photo_url"
    },
    "field_passthrough": [
      { "source_path": "metadata.employee_id" }
    ],
    "parameters": {
      "detection_model": "scrfd_2.5g",
      "detection_threshold": 0.7,
      "embedding_model": "arcface_r100",
      "enable_quality_scoring": true,
      "quality_threshold": 0.5,
      "max_faces_per_image": 1,
      "min_face_size": 40
    }
  }
}
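If configurations like this are generated programmatically, it can help to validate parameters against the documented model names and ranges before sending them anywhere. The Python sketch below shows one way to do so. `build_face_extractor_config` is a hypothetical local helper, not a Mixpeek SDK function, and it only assembles the JSON structure shown above.

```python
import json

# Values taken from the parameter tables in this page.
DETECTION_MODELS = {"scrfd_500m", "scrfd_2.5g", "scrfd_10g"}
EMBEDDING_MODELS = {"arcface_r100", "arcface_r50", "magface_r100"}


def build_face_extractor_config(image_field, **parameters):
    """Assemble a feature_extractor block, rejecting out-of-range values."""
    params = {"detection_model": "scrfd_2.5g", "embedding_model": "arcface_r100"}
    params.update(parameters)

    if params["detection_model"] not in DETECTION_MODELS:
        raise ValueError(f"unknown detection_model: {params['detection_model']}")
    if params["embedding_model"] not in EMBEDDING_MODELS:
        raise ValueError(f"unknown embedding_model: {params['embedding_model']}")

    # detection_threshold and quality scores are documented as 0.0-1.0.
    for name in ("detection_threshold", "quality_threshold"):
        value = params.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    return {
        "feature_extractor": {
            "feature_extractor_name": "face_identity_extractor",
            "version": "v1",
            "input_mappings": {"image": image_field},
            "parameters": params,
        }
    }


config = build_face_extractor_config(
    "payload.photo_url",
    detection_threshold=0.7,
    quality_threshold=0.5,
    max_faces_per_image=1,
    min_face_size=40,
)
print(json.dumps(config, indent=2))
```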
Face Matching
Use cosine similarity to match faces:
| Similarity Score | Interpretation |
| --- | --- |
| > 0.30 | Very likely same person |
| 0.25 - 0.30 | Likely same person |
| 0.20 - 0.25 | Possibly same person |
| < 0.20 | Different people |

Recommended threshold: 0.25-0.30 for same-person verification
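The interpretation bands can be expressed as a small lookup, sketched below in Python. The band edges come from the table, but the table does not say which band a boundary value such as exactly 0.25 belongs to. This sketch assumes each lower bound is inclusive. The function names and the default threshold of 0.30 (the stricter end of the recommended range) are illustrative choices.

```python
def interpret_similarity(score):
    """Map a cosine similarity between two face embeddings to a label."""
    if score > 0.30:
        return "very likely same person"
    if score >= 0.25:  # assumption: lower bounds are inclusive
        return "likely same person"
    if score >= 0.20:
        return "possibly same person"
    return "different people"


def is_same_person(score, threshold=0.30):
    """1:1 verification decision at a chosen threshold."""
    return score >= threshold


print(interpret_similarity(0.27))            # prints likely same person
print(is_same_person(0.27))                  # prints False
print(is_same_person(0.27, threshold=0.25))  # prints True
```

A lower threshold accepts more true matches at the cost of more false accepts, so the stricter end of the range suits security-sensitive verification.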
| Metric | Value |
| --- | --- |
| Detection accuracy | 99%+ (WIDER FACE benchmark) |
| Verification accuracy | 99.8%+ (LFW benchmark) |
| Processing speed | Detection: 5-7ms, Embedding: 10-15ms per face |
| Cost per image | 5 credits base |
| Cost per face | 5 credits additional per detected face |
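The credit figures above combine as a base charge plus a per-face charge. A short Python sketch of that arithmetic follows. It assumes an image with no detected faces still incurs the base charge, and it does not cover how sampled video frames are billed, since neither is stated here.

```python
def credits_for_image(num_faces, base_credits=5, credits_per_face=5):
    """Credits for one image: base charge plus a charge per detected face."""
    if num_faces < 0:
        raise ValueError("num_faces cannot be negative")
    return base_credits + credits_per_face * num_faces


print(credits_for_image(1))  # prints 10 (single portrait)
print(credits_for_image(4))  # prints 25 (group photo with four faces)
```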
Video Processing
Deduplication: Reduces redundancy in video by 90-95%
Sampling: 1 FPS recommended for most use cases
Max length: 300 seconds (extraction only)
Vector Index
| Property | Value |
| --- | --- |
| Index name | face_identity_extractor_v1_embedding |
| Dimensions | 512 |
| Type | Dense |
| Distance metric | Cosine |
| Datatype | float32 |
| Inference model | face_identity_arcface_r100_v1 |
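For capacity planning, the raw size of the stored vectors follows directly from the dimensions and datatype: 512 float32 values at 4 bytes each. The Python sketch below computes that lower bound. It deliberately ignores index structures, metadata, and any compression, none of which are described here.

```python
DIMENSIONS = 512
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_faces):
    """Raw embedding storage in bytes, excluding index and metadata overhead."""
    return num_faces * DIMENSIONS * BYTES_PER_FLOAT32


print(raw_vector_bytes(1))          # prints 2048
print(raw_vector_bytes(1_000_000))  # prints 2048000000
```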
Pipeline Overview
SCRFD Detection - Bounding boxes + 5 landmarks
5-Point Affine Alignment - 112x112 canonical face
ArcFace Embedding - 512-d L2-normalized vector
Quality Scoring (optional) - Filter low-quality faces
Limitations
Face only: Does not identify age, gender, or expressions
Pose sensitivity: Extreme angles may reduce accuracy
Occlusion: Masks, glasses, or hair may affect detection
Resolution: Minimum 20px face size, 40px+ recommended
Lighting: Poor lighting reduces quality scores
Video length: Maximum 300 seconds per video