Introduction

Noonan syndrome (NS) is an autosomal dominant disorder characterized by dysregulation of the RAS/MAPK pathway, affecting approximately 1 in 1000 to 2500 live births1. This multisystem condition presents with distinctive facial features, short stature, cardiac abnormalities (particularly pulmonary valve stenosis), and neurodevelopmental differences2. Despite being one of the more prevalent genetic disorders, NS frequently remains underdiagnosed due to its variable expressivity and phenotypic heterogeneity. While a definitive diagnosis relies on genetic evaluation initiated by clinical suspicion, the condition’s complex and diverse presentation often results in a lack of recognition and substantial diagnostic delays3,4.

Recent advances in artificial intelligence (AI) have created promising opportunities for enhancing rare disease detection5. For NS specifically, various computational approaches have shown promise, ranging from facial feature analysis6,7 to electronic health record (EHR)-based methods8. Our previous work demonstrated that deep learning models trained on structured EHR diagnosis texts could effectively identify potential NS cases8. However, translating this in silico success into clinical practice remains a critical challenge.

To address this gap, we conducted a comprehensive validation study of our deep learning approach. By deploying our model across a large patient cohort with available biological samples, we evaluated its real-world effectiveness through genetic sequencing and expert clinical assessment. This study provides essential insights into the clinical utility of computational screening tools for rare disease detection.

Results

Cohort characteristics and model risk score distribution

The study cohort comprised 92,493 patients enrolled in the Discover Together (DT) Biobank as of May 1st, 2021 (demographics detailed in Table 1).
After excluding 65 patients with pre-existing NS diagnoses, the remaining 92,428 patients were analyzed using their complete de-identified diagnosis description text. The dataset contained 14,969,183 diagnostic entries, including documented symptoms, clinical findings, phenotypic features, and disease diagnoses, with a mean of 162 entries per patient. These comprehensive diagnostic records served as input for the predictive deep convolutional neural network (DCNN) model.

Table 1 Demographics of the study cohort

The model generated NS risk scores for all patients, with a mean score of 0.004 (distribution shown in Fig. 1A). The majority of patients received very low risk scores; only 171 patients (0.19%) exceeded our predetermined high-risk threshold of 0.8 (indicated by the red dashed line in Fig. 1A). This distribution aligns with the expected rarity of NS in this population and suggests that the model maintains appropriate specificity in real-world applications.

Fig. 1: Distribution and precision analysis of predicted NS risk scores. A Distribution of predicted NS risk scores across 92,428 patients plotted on a log10 scale. The red dashed line indicates the high-risk threshold (0.8). The strongly right-skewed distribution reflects the model’s high specificity, with 171 patients (0.19%) classified as high-risk. B Relationship between model precision and disease prevalence. The blue line shows the theoretical relationship at fixed sensitivity (40%) and specificity (99.82%). Red dots with error bars indicate observed precision in this real-world study (lower) and previous validation set (upper). Gray dashed lines mark the 95% confidence interval of the estimated disease prevalence in the current study cohort.
Figure generated with R ggplot2.

To examine associations between demographic factors and risk prediction, we performed a linear regression analysis using log10-transformed risk scores as the dependent variable, with demographic factors plus diagnostic entry count as covariates. The regression model explained 7.6% of total score variance and revealed several significant associations, including lower scores in females compared to males, lower scores in Black patients compared to White patients, and negative correlation between score and age (Supplementary Table 2). The relationship between the score and the number of diagnostic entries was significant but non-linear: the score decreased with entry count up to 200 before showing an increasing trend. These relationships are visualized in Supplementary Fig. 1. Notably, none of these associations were observed within the high-risk group (score >0.8).

Chart review and genetic testing cohort selection

Manual chart review of the 171 high-risk patients was conducted in February 2022 by a clinical geneticist (KNW) and genetic counselor (AS) to identify and exclude those with existing genetic diagnoses prior to subsequent genetic sequencing.

The review identified 86 patients with prior genetic diagnoses, with Alagille syndrome (n = 15) and Williams syndrome (n = 13) being the most prevalent among 10 recurrent conditions (Supplementary Table 3). Notably, three patients received NS diagnoses after the initial EHR data extraction (May 2021), demonstrating the model’s predictive capability:

Patient A: Diagnosed May 2021, pathogenic heterozygous PTPN11 variant NM_002834.5(PTPN11):c.922A>G (p.Asn308Asp). This variant is classified as Pathogenic by FDA expert panel in ClinVar.

Patient B: Diagnosed September 2021, pathogenic heterozygous PTPN11 variant NM_002834.5(PTPN11):c.854T>C (p.Phe285Ser).
This variant has been reported in multiple patients with NS and is classified as Pathogenic in ClinVar.

Patient C: Presented with characteristic NS features including short stature, intellectual disability, suggestive facial dysmorphology, mitral valve dysplasia/regurgitation, and impaired ventricular relaxation. Comprehensive genetic analysis revealed compound heterozygous pathogenic variants in LZTR1 confirmed to be in trans by phasing analysis:

NM_006767.4(LZTR1):c.263G>T (p.Gly88Val). Classified as VUS in ClinVar and reclassified as Pathogenic based on ACMG criteria PVS1 (splice disruption), PM3 (in trans with pathogenic variant) and PM2 (population frequency)9.

NM_006767.4(LZTR1):c.1943-256C>T. Classified as Pathogenic by ClinGen RASopathy VCEP. This intronic variant disrupts normal splicing, causing frameshift and premature termination.

The remaining 85 patients without identified prior genetic diagnoses proceeded to genetic evaluation through three pathways: two patients had previous clinical exome sequencing data available (with negative clinical genetic reports), 25 patients with isolated pulmonary stenosis qualified for whole genome sequencing through the GMKF Pulmonary Stenosis study, and 58 patients underwent whole exome sequencing.

Genetic sequencing results and case confirmation

Of the 85 patients selected for genetic testing, sequencing was successfully completed for 83 samples (2 clinical exome sequencing, 24 WGS, and 57 WES). Two samples failed sequencing due to inadequate DNA quality or quantity. All successfully sequenced samples met quality control criteria for DNA contamination, sequencing coverage, and sex concordance.

Filtered variants in candidate genes across all 83 samples were reviewed by the clinical geneticist, including 53 rare missense variants and 14 rare nonsense, frameshift, or deletion variants.
This analysis confirmed two additional NS cases:

Patient D: Presented with short stature, VACTERL association, and Tetralogy of Fallot, with documented genetics consultations including a newborn evaluation in 2012. Sequencing revealed a heterozygous PTPN11 variant NM_002834.5(PTPN11):c.1529A>G (p.Gln510Arg), classified as Pathogenic/Likely Pathogenic in ClinVar and through ACMG criteria (PM5, PP3, PM1, PM2, PP5).

Patient E: Presented with short stature, delayed bone age, and café au lait spots. NF1 was initially suspected at 4 months of age (without genetic confirmation); our sequencing identified a heterozygous in-frame deletion in NF1 (NC_000017.11(NM_000267.3):c.3285_3294del). This variant, classified as Likely Pathogenic (PM2, PP1, PP4), was independently confirmed by clinical testing in April 2022. This case was included as a true positive due to NF1’s inclusion in RASopathy panels and significant phenotypic overlap with NS10,11.

Additionally, phenotype-guided variant analysis identified three non-NS genetic diagnoses explaining clinical features. These findings included pathogenic variants in genes associated with other developmental disorders and congenital heart defects (detailed in Supplementary Table 4). Notably, we identified one patient with a pathogenic variant in ADNP, the gene associated with Helsmoortel-van der Aa syndrome (OMIM:615873), which was recently reported to exhibit phenotypic overlap with RASopathies12. The identification of these alternative diagnoses underscores the phenotypic overlap between NS and other genetic conditions, emphasizing the importance of comprehensive genetic evaluation in patients with complex clinical features.

In total, this validation study identified five previously undiagnosed NS cases: three confirmed through chart review and two by genetic sequencing.
Detailed clinical and molecular characteristics of all five cases are summarized in Table 2.

Table 2 Summary of confirmed NS cases

Model performance and cohort prevalence analysis

Genetic sequencing validation identified 2 NS cases among 83 patients, yielding a precision of 2.41% (95% CI: 0.66%–8.37%). Including the three NS cases identified during the study period, a total of 5 NS cases were confirmed among 171 high-risk patients, yielding an overall precision of 2.92% (95% CI: 1.26%–6.66%). This real-world precision was notably lower than the 33.3% (95% CI: 13.8%–60.9%) achieved during the previous pseudo-prospective evaluation.

Despite the lower precision, the model maintained consistent specificity. With 166 false positives and n undiagnosed NS cases, specificity is (92,428 − 166 − n) / (92,428 − n), which approximates to 99.82%, given that n is very small relative to the cohort size. This result aligns closely with the 99.92% specificity (95% CI: 99.84%–99.96%) observed in our previous evaluation, conducted at a slightly higher risk score threshold (0.84 versus 0.8 in this study). Given that data sources and processing methods remained identical between studies, the marked decrease in precision can be primarily attributed to differences in disease prevalence, following the established relationship13:

$$\text{Precision} = \frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1 - \text{Specificity}) \times (1 - \text{Prevalence})}$$

Figure 1B illustrates this relationship between precision and disease prevalence, assuming fixed sensitivity (40%) and specificity (99.82%). Based on our observed precision, the estimated disease prevalence in this real-world cohort is one case per 7379 individuals, equating to 12.5 undiagnosed NS cases in this study cohort of 92,428. This prevalence is substantially lower than the one case per 1000 individuals used in our previous evaluation set. The calculated prevalence aligns with expectations, as the DT Biobank likely contains fewer undiagnosed NS cases compared to diagnosed cases.
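The precision and prevalence figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the reported confidence intervals are Wilson score intervals (the text does not name the method, though the reported bounds are consistent with it), and it inverts the standard relationship between precision, sensitivity, specificity, and prevalence. All function names are hypothetical and not taken from the study's code.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def precision_from_prevalence(prevalence, sensitivity, specificity):
    """Precision (PPV) at a given prevalence for fixed sensitivity/specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_from_precision(precision, sensitivity, specificity):
    """Algebraic inverse of precision_from_prevalence, solved for prevalence."""
    fpr = 1 - specificity
    return precision * fpr / (sensitivity * (1 - precision) + precision * fpr)


def specificity_given_undiagnosed(cohort_size, false_positives, n_undiagnosed):
    """Specificity when n_undiagnosed true cases are hidden in the cohort."""
    negatives = cohort_size - n_undiagnosed
    return (negatives - false_positives) / negatives


# Values taken from the text: 2/83 and 5/171 confirmed cases,
# sensitivity 40%, specificity 99.82%, cohort of 92,428 patients.
for hits, total in [(2, 83), (5, 171)]:
    low, high = wilson_interval(hits, total)
    print(f"{hits}/{total}: {hits / total:.2%} (95% CI {low:.2%} to {high:.2%})")

prevalence = prevalence_from_precision(5 / 171, sensitivity=0.40, specificity=0.9982)
print(f"Estimated prevalence: one case per {1 / prevalence:.0f} individuals")
print(f"Estimated undiagnosed cases: {92428 * prevalence:.1f}")
print(f"Specificity with 12.5 hidden cases: "
      f"{specificity_given_undiagnosed(92428, 166, 12.5):.2%}")
```

Under these assumptions the output can be compared directly with the reported intervals, the one-in-7379 prevalence estimate, and the approximate 99.82% specificity. Because the inversion depends on the assumed 40% sensitivity, a different sensitivity would shift the prevalence estimate proportionally.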
When combining the estimated undiagnosed cases with the 65 known NS cases, the overall NS prevalence across all 92,493 patients in the DT Biobank is one case per 1193 individuals.

Phenotypic analysis of high-risk patients

We conducted phenotype enrichment analysis using Human Phenotype Ontology (HPO) terms to characterize the clinical phenotypes of model-identified high-risk patients. Comparing the 171 high-risk patients against the background population (n = 92,428), we identified significantly enriched phenotypes primarily associated with cardiac abnormalities, including pulmonic stenosis, cardiomegaly, atrial septal defect, as well as systemic manifestations such as failure to thrive, short stature, and feeding difficulties (detailed in Supplementary Table 5). When compared with the enriched phenotypes observed in the 65 previously diagnosed NS patients (Supplementary Table 6), the enriched phenotypes in the high-risk patients largely mirrored those observed in the NS patients (Fig. 2A). The majority of these NS phenotypes demonstrated stronger or similar enrichment in these high-risk patients (except ptosis), supporting the model’s ability to identify clinically relevant features of NS.

Fig. 2: Phenotypic comparison and longitudinal trajectories of NS risk scores. A Comparison of HPO term enrichment between model-identified high-risk patients (n = 171) and known NS patients in DT (n = 65). The x-axis shows log2 fold enrichment in model-identified patients, while the y-axis shows that in known NS patients. For simplicity, only enriched HPO terms in NS patients (adjusted p 0.8). While some patients (C, E) demonstrated rapid transitions, others (A, B, D) showed more gradual progression with fluctuations. Figure generated with R ggplot2.

Comparative analyses within the high-risk group revealed subtle phenotypic patterns.
The five confirmed NS cases showed potential enrichment of connective tissue phenotypes, particularly flexion contracture, though these associations did not remain significant after correction for multiple hypothesis testing (Supplementary Table 7). Demographic analysis found no significant differences in race, gender, or age between these five NS cases and other high-risk patients (Supplementary Table 8). While the 166 false positive cases showed no significantly enriched phenotypes compared to the overall high-risk group, analysis of specific genetic diagnoses revealed characteristic signatures: Alagille syndrome patients (n = 15) showed significant enrichment of splenic and hepatic abnormalities (Supplementary Table 9), while Williams syndrome patients (n = 13) demonstrated distinctive features including supravalvular aortic stenosis and oral cavity abnormalities (Supplementary Table 10).

Risk score trajectories of the confirmed NS patients

We conducted a longitudinal analysis of risk scores for the five confirmed NS patients by applying the predictive model to cumulative diagnosis texts for successive age points until the final EHR data extraction. While all patients converged to high-risk scores at their final assessment point, their trajectories varied considerably (Fig. 2B). Some patients (Patients C and E) exhibited rapid transitions from low to high scores, while others (Patients A, B, and D) demonstrated more gradual progression with notable fluctuations. Importantly, these trajectories revealed potential opportunities for earlier identification, as the majority of patients showed elevated risk scores years before their final model assessment.

Score increases typically corresponded to the documentation of NS-related phenotypes in the EHR; however, some fluctuations were less intuitive.
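The cumulative scoring procedure described in this section can be sketched as follows. This is a minimal illustration, not the study's implementation: `risk_trajectory`, `first_crossing`, and `toy_score` are hypothetical names, the toy keyword scorer merely stands in for the trained DCNN, and the diagnosis entries are invented for demonstration.

```python
from typing import Callable, List, Optional, Tuple

Entry = Tuple[float, str]  # (age in years at documentation, diagnosis text)


def risk_trajectory(entries: List[Entry],
                    score_fn: Callable[[str], float],
                    sep: str = " ") -> List[Tuple[float, float]]:
    """Score the cumulative diagnosis text available at each successive age."""
    trajectory = []
    cumulative: List[str] = []
    for age, text in sorted(entries, key=lambda e: e[0]):
        cumulative.append(text)
        trajectory.append((age, score_fn(sep.join(cumulative))))
    return trajectory


def first_crossing(trajectory: List[Tuple[float, float]],
                   threshold: float = 0.8) -> Optional[float]:
    """Earliest age at which the score exceeds the high-risk threshold."""
    for age, score in trajectory:
        if score > threshold:
            return age
    return None


def toy_score(text: str) -> float:
    """Hypothetical stand-in scorer: NOT the DCNN, just keyword counting."""
    keywords = ("pulmonary valve stenosis", "short stature")
    hits = sum(keyword in text.lower() for keyword in keywords)
    return min(1.0, 0.45 * hits)


# Invented example entries, deliberately out of chronological order.
example_entries = [
    (6.1, "Short stature"),
    (2.0, "Feeding difficulties"),
    (4.4, "Congenital pulmonary valve stenosis"),
]
trajectory = risk_trajectory(example_entries, toy_score)
for age, score in trajectory:
    print(f"age {age:4.1f}: score {score:.2f}")
print("first age above 0.8:", first_crossing(trajectory))
```

Because each score is computed on all text documented up to that age, adding a new entry can lower as well as raise the score when a real model is substituted for the toy scorer.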
For example, for Patient A, the documentation of “Congenital pulmonary valve stenosis” at age 4.39 years led to a marked increase in risk score, while the subsequent addition of “Personal history of surgery to heart and great vessels, presenting hazards to health” at age 4.58 years unexpectedly led to a decrease in the score.

Discussion

Our study provides real-world validation of an EHR-based deep learning approach for identifying undiagnosed NS cases, demonstrating both its capabilities and limitations in clinical practice. Through comprehensive genetic and clinical evaluation of model predictions, we identified two previously undiagnosed NS cases and validated three additional cases diagnosed during the study period. These findings confirm the model’s ability to detect previously undiagnosed NS patients in a clinical setting and offer valuable insights into the application of AI-based screening tools for rare genetic conditions.

The translation of computational performance to clinical practice revealed key considerations regarding model evaluation methodology. While the model achieved 33.3% precision in a pseudo-prospective in-silico evaluation, its real-world precision of 2.92% primarily reflects differences in disease prevalence rather than model degradation. This observation emphasizes the critical role of disease prevalence in designing and evaluating rare disease screening tools. Despite the drop in precision, the model maintained high specificity in this large cohort, demonstrating consistent behavior between the pseudo-prospective evaluation and real-world implementation. This consistency validates our previous evaluation approach and supports the use of pseudo-prospective testing as a reliable method for assessing rare disease screening tools.

Phenotypic analysis of high-risk patients revealed both the strengths and limitations of our approach.
The significant enrichment of NS-associated features among high-risk patients, particularly cardiac abnormalities and growth parameters, validates the model’s ability to recognize clinically relevant patterns. However, the model’s reliance on EHR diagnosis text creates inherent limitations. For example, characteristic facial features such as ptosis and hypertelorism, although often present in more than 50% of NS patients clinically14,15, appeared in less than 20% of NS patients’ EHR documentation. Atypical NS presentations, especially those involving less common genes or subtle phenotypic features, may be underrecognized due to their limited representation in the training data. This discrepancy between clinical presentation and EHR documentation highlights the need for more comprehensive phenotype capture and suggests opportunities for complementary approaches, such as facial recognition-based screening tools, to enhance the detection of NS cases. Age-related fading of NS phenotypes, particularly facial features, may also limit recognition in older patients. Future models incorporating facial analysis could also include age to improve detection across age groups.

The substantial proportion of alternative genetic diagnoses among high-risk cases highlights the importance of considering rare genetic diseases holistically in computational screening approaches. Our chart review revealed that approximately half of the high-risk patients had other prior genetic diagnoses, including conditions like Alagille and Williams syndromes, underscoring significant phenotypic overlap across various genetic disorders. This finding points to a pathway to improve model performance through multi-disease classification approaches. Training models with multi-class labels or incorporating multiple genetic disease diagnoses from EHR could significantly enhance screening efficiency by reducing the burden of manual chart review and improving prediction precision.
Expanding the model’s classification scope would also provide a more comprehensive approach to rare disease detection, potentially increasing its clinical utility.

In this study, risk scores demonstrated strong associations with diagnostic record count and demographic factors. The relationship between risk score and diagnostic entry count suggests potential bias in the training data and model development, which could be mitigated by normalizing features by patient encounter frequency or utilizing stratified, more balanced training data. The observed demographic associations align with prior findings that female and Black patients tend to be underdiagnosed for rare genetic diseases in pediatric populations16, possibly reflecting differences in phenotype onset or recognition patterns.

Our study also provides insights into the current challenges of NS diagnosis. The identification of previously undiagnosed cases, including patients with documented NS-associated features who had not undergone genetic testing, suggests ongoing obstacles in recognizing and diagnosing NS in clinical practice. Furthermore, the estimated number of undiagnosed NS cases within the study cohort suggests additional undiagnosed cases may exist, emphasizing the need for improved diagnostic approaches.

Finally, several limitations of our study warrant discussion. First, while we assumed sensitivity remained consistent with our previous evaluation (40%), our study design cannot directly validate this assumption as we only evaluated patients above the 0.8 score threshold. Second, the reliance on available biobank samples may introduce selection bias. Third, reliance on genetic sequencing may affect clinical diagnosis of NS, given that underlying pathogenic variants remain unidentified in up to 20% of NS cases17.
Finally, our findings from a single pediatric center may not fully generalize to other clinical settings or adult populations.

Despite these limitations, our results support the potential utility of EHR and AI-based screening tools in aiding rare disease detection. The model’s ability to identify both new and subsequently diagnosed NS cases demonstrates its promise in screening candidates for genetic evaluation. Future efforts should prioritize further improving prediction precision and specificity for NS relative to other genetic conditions, leveraging additional EHR data sources, integrating other genetic disease diagnoses, and developing multi-class classification models to enhance overall screening efficiency.

Methods

Study design and overview

Our validation study of the EHR-based NS prediction model comprised three primary steps: (1) applying our previously validated model to generate patient-specific risk scores from EHR data, (2) conducting systematic chart reviews of high-risk patients (NS risk score >0.8) to exclude those with documented genetic diagnoses, and (3) performing genetic sequencing and variant analysis for diagnostic confirmation in patients without prior diagnosis. The complete workflow is illustrated in Fig. 3. This study was conducted under approval from the Cincinnati Children’s Hospital Institutional Review Board (protocol number 2020-0685). All participants were previously enrolled in the Discover Together Biobank with broad informed consent for research use of their biological samples and health data. The IRB waived additional informed consent requirements for this minimal-risk study given the existing research consent and adequate confidentiality protections. This study was conducted in full compliance with the Declaration of Helsinki and all relevant ethical regulations for human subjects research.

Fig. 3: Workflow of the NS prediction model validation study. Starting with 92,493 Discover Together Biobank patients in May 2021, high-risk patients (score >0.8) underwent chart review in February 2022 and genetic sequencing from August 2022. Yellow boxes indicate patient subgroups identified through chart review; green boxes indicate confirmed NS cases. Figure generated using diagrams.net.

Data description

The study cohort comprised patients enrolled in Cincinnati Children’s Hospital’s Discover Together (DT) Biobank with linked EHR-biological sample data. The phenotypic information was extracted from diagnosis description text in Cincinnati Children’s de-identified structured EHR database (i2b2), encompassing patient encounters, problem lists, and billing data. Of 92,493 enrolled patients at data extraction (May 2021), 65 had prior NS diagnoses documented in the EHR, leaving 92,428 patients for model evaluation. The DT Biobank’s diverse biological sample repository enabled comprehensive genetic validation of model predictions across this large and diverse cohort.

Deep learning model for NS risk scoring

The predictive model, as detailed previously8, processes de-identified EHR text to identify potential NS cases. The model architecture integrates deep convolutional neural network (DCNN) layers, dense feed-forward layers, and pooling layers, using tokenized and vectorized concatenated diagnosis texts as input. For this validation study, we employed the model version that achieved the highest area under the precision-recall curve (PR-AUC) in the previous held-out and pseudo-prospective validation sets.

The top diagnosis terms most predictive of NS risk, as learned by the model, are detailed in Table 1 of our prior publication8, and were not re-derived in this study.
A risk score threshold of >0.8 was implemented to identify high-risk patients, based on previous performance metrics in the pseudo-prospective validation set (sensitivity 40%, specificity 99.92% at a risk score threshold of 0.84). Importantly, this study’s cohort (n = 92,428) was entirely independent from the original model training dataset to ensure that validation results reflected the model’s ability to generalize to new, unseen data.

Chart review and selection of patients for genetic sequencing

For patients with NS risk scores >0.8, a clinical geneticist (KNW) and genetic counselor (AS) with expertise in cardiovascular genetics performed comprehensive chart reviews to identify and exclude those with existing genetic diagnoses. The review process included examination of clinical notes, genetic testing reports, and documented diagnoses across the complete medical record in the Epic® EHR system. Cases were classified as having prior genetic diagnosis if they had (1) documented pathogenic or likely pathogenic variants in known disease-causing genes, (2) confirmed clinically relevant chromosomal abnormalities, or (3) established clinical genetic diagnoses documented by clinicians.

Patients without prior genetic testing or with inconclusive genetic findings were retained for further evaluation. Disagreements in classification were resolved through consensus discussion. From the initial set of high-risk patients, those without prior genetic diagnoses and with available biobank specimens were selected for genetic sequencing validation.

Genetic sequencing and clinical evaluation

DNA samples were obtained from DT Biobank for patients with NS risk scores >0.8 who lacked prior genetic diagnoses.
Genetic sequencing was performed through two pathways: patients with isolated pulmonary stenosis (PS) were sequenced via whole-genome sequencing (WGS) at the Broad Institute through an NIH Gabriella Miller Kids First (GMKF)-funded study, while remaining patients underwent whole-exome sequencing (WES) at the Yale Center for Genome Analysis (YCGA).

For WES data, variant calling and joint genotyping were performed using the Sentieon® DNAseq pipeline18, generating a cohort VCF file from FASTQ data. WGS data processing utilized GATK 3.519 for variant calling and joint genotyping at the Broad Institute. Both WES and WGS data were aligned to the hg38 human reference genome. WGS VCF files were filtered for coding regions plus 50 bp flanking sequences based on GENCODE v4320 for subsequent analysis. Quality control (QC) measures included assessment of cross-sample DNA contamination (VerifyBamID21), sequencing coverage (mosdepth22), and verification of reported versus genotype-based sex. QC failure criteria were defined as contamination FREEMIX score >0.05, mean coverage