Introduction

Peripheral facial palsy (PFP) substantially affects quality of life due to facial asymmetry, functional impairment, and psychosocial distress [1,2,3,4], and is often complicated by facial synkinesis in the chronic stage of the disease [5]. Accurate assessment of facial muscle activity, which reflects the functional output of the facial nerve, is essential for monitoring recovery and guiding rehabilitation [6]. Improvement of facial symmetry is the primary goal of facial rehabilitation [7,8]. Although various clinical grading scales and questionnaires are available, they often rely on subjective facial grading, resulting in limited objectivity and interrater variability. Technological advances have enabled the development of computerized assessment methods that provide objective and reproducible measurements. However, the clinical availability and validation of such automated tools remain limited, and no universally accepted gold standard has been established [9]. Many objective approaches require complex setups, high-cost equipment, or extensive manual input, restricting their feasibility for routine clinical use. This underlines the need for practical, accessible, and reliable methods to objectively evaluate facial symmetry and dynamic movements in patients with PFP.

This study aims to evaluate a new automated method that uses standardized photographic recordings of patients with PFP to visualize dynamic changes between neutral and expressive facial states through heatmaps and to derive an objective symmetry score quantifying the uniformity of facial movements between the two sides of the face.

Materials and methods

The study was approved by the local institutional review board (IRB) at Jena University Hospital (registration number 2019-1539-BO). Due to the retrospective nature of the investigation, written informed consent was waived by the IRB at Jena University Hospital.
All methods were performed in accordance with the relevant guidelines and regulations, and the study was conducted in accordance with the Declaration of Helsinki.

Study population and image acquisition

A retrospective analysis was performed on 518 datasets. Although no formal standardization criteria such as fixation devices or anatomical reference lines were applied, image acquisition followed a consistent protocol: all photographs were taken by a dedicated clinical photographer, with the patient seated in front of a uniform, neutral-colored paper backdrop. The photographer was seated directly opposite the patient, allowing a reproducible frontal perspective. The camera was handheld to permit flexible adjustment of angle and height, while an approximately constant distance was maintained across sessions. Two softboxes supplemented the camera flash to ensure even facial illumination. This setup provided sufficient consistency for comparative analysis under real-world clinical conditions.

The inclusion criteria required the availability of nine standardized facial photographs per patient. After applying these criteria, 405 datasets from 198 patients with unilateral PFP (94 female, 104 male) were included, with examinations conducted between March 26, 2008, and December 20, 2011. The age at examination ranged from 4 to 90 years (mean 53 ± 19 years). Grouped in 20-year intervals, the age distribution of the 405 datasets was as follows: 10 datasets aged 4–19 years, 118 aged 20–39 years, 98 aged 40–59 years, 159 aged 60–79 years, and 20 aged 80–90 years. Among these patients, 98 had at least two datasets acquired on different days during therapy.
Details on the underlying etiologies and treatment can be found elsewhere.

For each patient, nine standardized facial photographs were taken, capturing the following expressions:

1. Neutral facial expression (reference)
2. Eyes gently closed
3. Eyes tightly closed
4. Frowning/forehead wrinkling
5. Nose wrinkling
6. Closed-mouth stretch
7. Mouth stretch with teeth showing
8. Lip pursing
9. Mouth corners down

Images were acquired primarily with Nikon DSLR cameras (mainly D90 and D100 models), with a smaller number from Canon and Sony devices. Flash was used in 309 cases, not used in 82, and its status was unknown in 14. Auto white balance was applied in 294 cases, manual in 97, and unknown in 14. Image resolutions varied widely, ranging from 591 × 882 to 2632 × 3963 pixels; the most common sizes were 1772 × 2362 pixels (189 cases), 709 × 1063 pixels (125 cases), and 712 × 1063 pixels (28 cases).

Image preprocessing and analysis

The neutral facial expression image from the standardized photograph series served as the reference for automated comparison with the expression images. In other words, the absolute difference is calculated between each expressive image and the individual's neutral facial expression, which serves as the baseline. The symmetry score is then computed from this difference image by comparing the left and right facial halves. Each symmetry score therefore reflects deviations from the patient's own neutral state, allowing assessment of dynamic changes rather than static asymmetries. Figure 1 illustrates the method's processing workflow, with each step explained in more detail in this section.

Fig. 1 Processing steps using a neutral facial expression reference (REF) and an expression image.

Each image was processed in Python version 3.10.9 (Python Software Foundation [10]) using a deep learning-based facial landmark detection model implemented in MediaPipe (v0.10.21, Google Research [11]) with refined landmark prediction enabled, estimating 478 facial landmarks per image (see Fig. 2a). Subsequently, the Euclidean distance between the outer eye corners was used to uniformly scale all faces so that the interocular distance equaled 200 pixels. Expression images were then centered on a canvas matching the pixel dimensions of the reference image; any areas extending beyond the canvas boundaries were cropped, and the corresponding facial landmarks were adjusted accordingly. Each expression image was aligned to the neutral facial expression by applying an affine transformation (incorporating translation, rotation, and scaling) estimated from all facial landmarks. This transformation served to normalize facial orientation and geometry.

Fig. 2 The image processing steps include (a) facial landmark detection of 478 points and uniform scaling so that the interocular distance equals 200 pixels; (b) alignment of the expression image to the neutral reference and application of a face mask isolating key facial regions; (c) calculation of the absolute difference between the neutral reference and the aligned expression image; (d) Gaussian smoothing of the difference image; (e) scaling by a factor of 5 to normalize pixel intensities; (f) computation of the symmetry score based on the pixel differences between the left and the mirrored right side of the mask in (e); and (g) visualization of facial movements using a heatmap. Written informed consent for publication of these identifiable photographs was obtained from the patient.

A face mask was generated using nine selected facial landmarks that define key regions such as the eyes, nose, mouth, and chin.
Specifically, the landmarks corresponded to points 33 and 133 (left eye corners), 362 and 263 (right eye corners), 1 and 0 (nose bridge and tip), 13 and 14 (central points of the upper and lower lip), and 152 (chin), following the MediaPipe face mesh indexing scheme. The mask consisted of a rectangular upper half and an elliptical lower half, scaled with padding to cover the face region fully. To exclude hair and other non-facial areas, the region above the eyebrows was removed using eyebrow landmarks. The mask was then applied to isolate the facial region for further analysis (see Fig. 2b).

To reduce high-frequency differences caused by facial hair or skin texture, a slight Gaussian blur with a kernel size of 5 × 5 was first applied to both the reference face and the aligned expression face. The absolute difference between the two images was then computed and converted to grayscale (see Fig. 2c). The resulting difference image was smoothed using a Gaussian blur with a kernel size of 111 × 111 and scaled by a factor of 5 to enhance pixel intensities (see Fig. 2d,e). This processed difference image served as the basis for visualizing dynamic facial changes using a heatmap and for computing a facial symmetry score (see Fig. 2f,g). The heatmap was generated by applying a rainbow-like color map that enhances visual interpretation of facial movement by mapping low values to blue and high values to red.

A symmetry score was calculated by comparing the left and right facial halves within the masked area of the difference image. The right half was mirrored to align with the left side, and only overlapping, valid pixels were included in the comparison. The basic symmetry measure is the inverse of the mean absolute difference between the two halves:

$${S}_{mean}=1-\frac{1}{N}\sum_{i=1}^{N}\left|{L}_{i}-{R}_{i}\right| \quad (1)$$

where \({L}_{i}\) and \({R}_{i}\) are pixel intensities in the left and flipped right halves, and \(N\) is the number of pixels.
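As a minimal illustration, Eq. (1) can be sketched in NumPy. This is not the study's actual code: the function name is hypothetical, and it assumes the masked difference image is a 2-D float array with intensities normalized to [0, 1], with an optional boolean mask marking valid (in-mask) pixels.

```python
import numpy as np

def mean_symmetry(diff, valid=None):
    """Eq. (1): 1 minus the mean absolute difference between the left
    half and the mirrored right half of a difference image.

    diff  : 2-D float array, intensities assumed normalized to [0, 1]
    valid : optional boolean mask of pixels inside the face mask
    """
    w = diff.shape[1]
    half = w // 2
    left = diff[:, :half]
    right = np.fliplr(diff[:, w - half:])  # mirror the right half
    if valid is None:
        ok = np.ones(left.shape, dtype=bool)
    else:
        # only overlapping, valid pixels enter the comparison
        ok = valid[:, :half] & np.fliplr(valid[:, w - half:])
    return 1.0 - np.abs(left[ok] - right[ok]).mean()

# A mirror-symmetric difference image yields the maximum score:
sym = np.tile(np.abs(np.linspace(-1, 1, 8)), (4, 1))
print(round(mean_symmetry(sym), 6))  # → 1.0
```

With normalized intensities the per-pixel differences lie in [0, 1], so this basic measure is bounded between 0 (maximal left-right disagreement) and 1 (identical halves).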
To penalize irregular asymmetries, the score is weighted by the variance of these differences:

$$S={S}_{mean}\times \left(1-\min\left(\frac{{\sigma }^{2}}{{\sigma }_{max}^{2}},1\right)\right) \quad (2)$$

where \({\sigma }^{2}\) is the variance of the pixel differences, and \({\sigma }_{max}^{2}\) is a predefined maximum variance for normalization (set to 5000 in this study). The smaller \({\sigma }_{max}^{2}\), the faster the score decreases with increasing variance. The final symmetry score ranges from 0 (low symmetry) to 1 (high symmetry).

Evaluation

All 405 facial image datasets from 198 patients were processed using the described image preprocessing and analysis pipeline. Symmetry scores were then computed and analyzed across different time points within each patient to assess changes throughout facial nerve recovery. To quantify progression, a robust trend analysis was applied to the individual symmetry scores of each patient. For patients with at least three measurements, a rolling median filter with a window size of three was applied to reduce the influence of outliers while preserving the overall trend. For patients with fewer measurements, the original values were used without filtering. The slope was then calculated using the Theil-Sen estimator, which is robust to outliers and small sample sizes. Based on the slope, trends were classified as improvement, deterioration, or no significant change; a conservative threshold of ±0.001 was used to exclude minor fluctuations and identify only meaningful trends.

The Stennert Index values during movement were retrospectively extracted from patients' clinical records at the time points of image acquisition when available. This index is a facial grading tool widely used in clinical routine in Germany [12].
It is assessed separately at rest and during voluntary facial movement, quantifying the severity of facial nerve palsy at rest on a scale from 0 (no asymmetry) to 4 (severe side differences), and during movement on a scale from 0 (all facial movements normal) to 6 (total or near-total facial movement restriction). In this study, only the values estimated during voluntary facial movement were considered. These values were documented as part of routine clinical care by multiple experienced clinicians, with each patient evaluated by a single examiner per time point. As no patient was assessed by more than one examiner at the same time point, and repeated ratings by the same examiner were not performed, intra- and inter-examiner reliability could not be formally assessed. The Stennert Index is a regional facial grading system comparable in structure and scoring logic to the Sunnybrook Facial Grading System, for which high intra- and interrater reliability has been demonstrated in previous studies [13,14,15]. While no dedicated reliability studies exist for the Stennert Index, it is reasonable to assume that, when applied by experienced raters, its reliability is comparable to that of the Sunnybrook system.

These extracted Stennert scores were then correlated with the image-derived symmetry scores to investigate the relationship between clinical assessments and objective measures of facial symmetry. Importantly, the clinical evaluations (Stennert Index) and the computational image analyses were conducted independently by different evaluators. The individuals responsible for the image processing and symmetry scoring were not involved in the clinical assessment of patients, ensuring methodological independence and reducing potential confirmation bias. Spearman's rank correlation coefficient was used to analyze this association, given the ordinal nature of the Stennert Index and the continuous nature of the symmetry scores.
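Such an association can be sketched with `scipy.stats.spearmanr`; the paired values below are hypothetical illustrations, not study data.

```python
from scipy.stats import spearmanr

# Hypothetical paired observations (not study data): Stennert Index during
# movement (ordinal, 0 = all movements normal to 6 = total/near-total
# restriction) and the image-derived symmetry score (continuous, 0-1).
# Since higher Stennert values indicate more severe palsy, a negative
# correlation would be expected.
stennert = [0, 1, 1, 2, 3, 4, 5, 6]
symmetry = [0.95, 0.90, 0.88, 0.80, 0.65, 0.55, 0.40, 0.30]

rho, p = spearmanr(stennert, symmetry)
print(f"rho = {rho:.3f}, p = {p:.4f}")
```

Spearman's coefficient is appropriate here because it operates on ranks, so it handles the ordinal Stennert scale and ties without assuming linearity.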
Statistical significance was determined based on corresponding p values (
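For reference, the robust per-patient trend analysis described in the Evaluation section can be sketched as follows. This is a sketch under the stated parameters (window size 3, slope threshold ±0.001); the helper name and the day-based time axis are assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import theilslopes

def classify_trend(scores, days, threshold=1e-3):
    """Classify a patient's symmetry-score series as 'improvement',
    'deterioration', or 'no significant change'.

    scores : symmetry scores (0-1) over time
    days   : acquisition time points (assumed here in days)
    """
    scores = np.asarray(scores, dtype=float)
    days = np.asarray(days, dtype=float)
    if len(scores) >= 3:
        # Rolling median (window 3) damps outliers; endpoints kept as-is.
        smoothed = scores.copy()
        for i in range(1, len(scores) - 1):
            smoothed[i] = np.median(scores[i - 1:i + 2])
    else:
        smoothed = scores  # too few measurements: no filtering
    slope = theilslopes(smoothed, days)[0]  # robust Theil-Sen slope
    if slope > threshold:
        return "improvement"
    if slope < -threshold:
        return "deterioration"
    return "no significant change"

print(classify_trend([0.50, 0.48, 0.62, 0.70], [0, 14, 28, 42]))  # → improvement
```

The Theil-Sen estimator takes the median of all pairwise slopes, so a single outlying measurement cannot dominate the fitted trend, which matches the stated goal of robustness to outliers and small sample sizes.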