A specialized object detection task focused on identifying and localizing human faces in images and video. Face detection is a critical first step in downstream face-processing tasks, including recognition, expression analysis, and privacy filtering.
Face detection models search an image at multiple scales and positions for regions that contain faces. Many modern detectors use single-stage architectures that predict face bounding boxes and facial landmarks (eyes, nose, mouth) in a single forward pass. They cope with variations in pose, illumination, occlusion, and scale largely through multi-scale feature extraction, so that small and large faces are each detected from feature maps of suitable resolution.
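As a rough sketch of how a single-stage detector covers "multiple scales and positions", the code below tiles square anchors over three feature-map strides and decodes per-anchor offsets into a box and landmarks. The strides, anchor sizes, and variance values are assumptions modeled on a common open-source RetinaFace configuration, not values given in this text, and the network that would predict the offsets is omitted.

```python
from itertools import product
from math import ceil, exp

# Assumed anchor layout (modeled on a common open-source RetinaFace config):
# one feature map per stride, two square anchor sizes per map.
STEPS = [8, 16, 32]
MIN_SIZES = [[16, 32], [64, 128], [256, 512]]
VARIANCES = (0.1, 0.2)  # assumed SSD-style box-encoding variances


def generate_anchors(img_h, img_w):
    """Return anchors as (cx, cy, w, h), normalized to [0, 1]."""
    anchors = []
    for step, sizes in zip(STEPS, MIN_SIZES):
        fh, fw = ceil(img_h / step), ceil(img_w / step)
        # Every cell of every feature map is a candidate face position.
        for i, j in product(range(fh), range(fw)):
            for size in sizes:
                cx = (j + 0.5) * step / img_w
                cy = (i + 0.5) * step / img_h
                anchors.append((cx, cy, size / img_w, size / img_h))
    return anchors


def decode_box(anchor, offsets):
    """Turn predicted (dx, dy, dw, dh) offsets into an (x1, y1, x2, y2) box."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * VARIANCES[0] * aw
    cy = acy + dy * VARIANCES[0] * ah
    w = aw * exp(dw * VARIANCES[1])
    h = ah * exp(dh * VARIANCES[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def decode_landmarks(anchor, offsets):
    """Landmark offsets (x0, y0, x1, y1, ...) are relative to the anchor center."""
    acx, acy, aw, ah = anchor
    return [
        (acx + ox * VARIANCES[0] * aw, acy + oy * VARIANCES[0] * ah)
        for ox, oy in zip(offsets[0::2], offsets[1::2])
    ]
```

For a 640×640 input this layout yields 80·80·2 + 40·40·2 + 20·20·2 = 16,800 anchors: small anchors on the fine stride-8 map target small faces, while the coarser maps target large ones.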
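Leading models include RetinaFace, MTCNN, and BlazeFace. RetinaFace combines a feature pyramid network with context modules and reported state-of-the-art accuracy on the WIDER FACE benchmark when it was published; MTCNN instead refines candidate windows through a three-stage cascade of small networks. Outputs typically include bounding boxes, confidence scores, and 5-point (eye centers, nose tip, mouth corners) or 68-point facial landmarks. Models range from lightweight mobile detectors such as BlazeFace, designed for sub-millisecond inference on mobile GPUs, to larger, high-accuracy server models.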
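Detectors that emit many overlapping candidates usually post-process the outputs listed above before returning them. The following is a minimal sketch, assuming detections are already decoded into pixel coordinates: a hypothetical `FaceDetection` record (illustrative field names, not any library's API) plus confidence thresholding and greedy non-maximum suppression (NMS), a standard step not described in this text. The 0.5 and 0.4 thresholds are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class FaceDetection:
    # Hypothetical output record; field names are illustrative only.
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float                             # face confidence in [0, 1]
    landmarks: List[Point] = field(default_factory=list)  # e.g. 5 points


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets, score_thresh=0.5, iou_thresh=0.4):
    """Drop low-confidence detections, then apply greedy NMS."""
    candidates = sorted(
        (d for d in dets if d.score >= score_thresh),
        key=lambda d: d.score,
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a detection only if it does not heavily overlap a higher-scoring one.
        if all(iou(det.box, k.box) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

For example, two boxes `(10, 10, 50, 50)` and `(12, 12, 52, 52)` overlap with an IoU of about 0.82, so only the higher-scoring one survives; a separate box elsewhere in the image is kept, and any detection scoring below 0.5 is discarded before NMS runs.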