Multi-modal sleep staging in the clinic for REM sleep behaviour disorder


Background: Accurate REM identification is critical for diagnosing REM sleep behaviour disorder (RBD), yet many automated sleep staging systems, especially single-channel EEG models trained on healthy cohorts, do not generalise well to real-life polysomnography (PSG) performed in patients. Objective: To compare a feature-based Random Forest (RF) model tuned for RBD with a state-of-the-art single-EEG deep architecture (AttnSleep), and to assess the impact of cohort adaptation and multimodal inputs (EEG, EOG, EMG, ECG). Methods: Experiments used 89 multi-site in-clinic PSGs (SleepWearables Phase-1) plus 53 MASS healthy controls (mean age 63, std 5 years), with 10-fold cross-validation and out-of-fold evaluation. Model performance was assessed using Cohen's kappa, and attention-based modality analysis was performed to quantify signal contributions. Results: When applied out-of-the-box after training on open-source healthy datasets, both models achieved moderate agreement overall (Cohen's kappa = 0.46), but performance declined in RBD, particularly for REM sleep (AttnSleep Cohen's kappa = 0.19 vs RF Cohen's kappa = 0.44), highlighting limited cross-cohort generalisation. The multimodal model improved overall agreement (Cohen's kappa 0.59–0.60) and performance in RBD (Cohen's kappa 0.45–0.46), with gains most pronounced in REM (Cohen's kappa 0.45–0.49). Attention-based modality analysis identified EEG as the dominant signal, increased EOG contribution during REM, and elevated ECG importance during N3. In RBD subjects, EOG weighting increased relative to non-RBD controls (Delta = +0.081). Guided by these weights, a reduced four-channel EEG model matched full multimodal performance in non-RBD subjects, and adding EOG achieved the best overall configuration (Cohen's kappa = 0.61 overall; Cohen's kappa = 0.48 in RBD) with improved REM classification (53% vs 45% recall). Inclusion of EOG also reduced inter-dataset variability in REM staging. Nonetheless, staging performance in RBD remained lower than in controls, particularly for REM. Conclusions: These results highlight the limited generalisability of minimal-sensor models trained on healthy cohorts, the value of mixed cohort-specific training, and the benefit of multimodal integration and attention-guided channel selection, rather than minimal-sensor approaches alone, for robust clinical sleep staging in pathological populations such as RBD.
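
For readers unfamiliar with the agreement metric reported throughout, the following is a minimal sketch of how Cohen's kappa can be computed between reference and predicted hypnograms. It is not the authors' evaluation code: the function names `cohen_kappa` and `per_stage_kappa` are hypothetical, and the one-vs-rest reduction used for a per-stage value (for example REM) is an assumption, since the abstract does not state how stage-specific kappa was derived.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of epoch labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (both sequences use one identical label); returning
        # 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def per_stage_kappa(reference, predicted, stage):
    """One-vs-rest kappa for a single stage (assumed reduction, see lead-in)."""
    ref_bin = [r == stage for r in reference]
    pred_bin = [p == stage for p in predicted]
    return cohen_kappa(ref_bin, pred_bin)
```

Under this sketch, `cohen_kappa(ref, pred)` would give an overall agreement value and `per_stage_kappa(ref, pred, "REM")` a REM-specific one, where `ref` and `pred` are lists of per-epoch stage labels.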