Integrative analysis of 115 transcriptomic studies decodes the molecular landscape of neurodevelopmental disorders

Introduction

Neurodevelopmental disorders (NDDs) are characterized by disrupted brain development, leading to a wide range of psychiatric and neurological conditions that affect more than 4.7% of children worldwide1,2. These conditions typically emerge in childhood2 and include rare genetic disorders, such as Rett syndrome (RTT), Fragile X syndrome (FXS), and Duchenne muscular dystrophy (DMD), as well as multifactorial conditions, such as attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD)3. Despite the wide range of clinical manifestations, many neurological phenotypes are shared across distinct NDDs, indicating the involvement of common molecular pathways. Examples of such shared phenotypes include seizures, intellectual disability, microcephaly, and hypotonia.

High-throughput techniques, such as RNA sequencing, have significantly advanced our understanding of the molecular pathways involved in NDDs. By identifying genes and pathways with altered expression levels, transcriptomic profiling provides insights into the pathological mechanisms and enables the discovery of potential therapeutic targets. The importance of expression profiling for therapeutic target discovery is exemplified by the observation of brain-derived neurotrophic factor (BDNF) downregulation in RTT in the early 2000s4. Because BDNF cannot cross the blood-brain barrier, this discovery led to the prioritization of insulin-like growth factor 1 (IGF1), a factor with biological properties similar to those of BDNF, as a promising therapeutic candidate5.
More recently, IGF1 treatment (Trofinetide) has been approved by the FDA for the treatment of RTT6, highlighting the potential of expression profiling for the identification of therapeutic targets.

Nevertheless, current transcriptomic studies of NDDs are often constrained by small sample sizes, particularly for rare genetic disorders, or by high biological variability, particularly for conditions with a complex disease etiology. Because of these limitations, individual studies often lack the statistical power needed to fully characterize the disease transcriptome and to uncover novel molecular pathways. Integrating multiple datasets can increase statistical power, allowing for deeper insights into the molecular pathophysiology of NDDs. In our study, we therefore integrated 151 human RNA sequencing datasets from 115 independent studies to characterize common and distinct molecular pathways of NDDs and their neurological phenotypes.

Results

The NDD transcriptomic profile consists of 151 datasets from 115 independent studies

The Gene Expression Omnibus (GEO) was queried for RNA sequencing data of NDDs (Supplementary Text S1), identifying 188 studies with NCBI-generated raw counts available for at least six samples. Datasets without a case-control design, with fewer than three cases and/or controls, or without NDD cases were excluded. The 115 studies that remained after filtering were included in our analysis (Supplementary Data S1). Where possible, individual studies were stratified based on mutation type and/or cell type/tissue before performing differential expression analysis (i.e., case versus control), yielding a total of 151 distinct datasets/statistical comparisons. The differential expression estimates of the datasets were used to identify general transcriptomic changes that occur across the different NDDs and to find alterations that are associated with a specific disorder or neurological phenotype (Fig. 1A).
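
The inclusion criteria described above can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline: the `Study` and `Stratum` records, their field names, and the example accessions are hypothetical, and an unstratified study is assumed to be represented as a single stratum. Only the thresholds (at least six samples with NCBI-generated counts, at least three cases and three controls per comparison) come from the text.

```python
from dataclasses import dataclass, field
from typing import List

MIN_SAMPLES = 6    # at least six samples with NCBI-generated raw counts
MIN_PER_GROUP = 3  # at least three cases and at least three controls


@dataclass
class Stratum:
    label: str       # hypothetical label, e.g. a mutation type or cell type/tissue
    n_cases: int     # number of NDD cases in this stratum
    n_controls: int  # number of controls in this stratum


@dataclass
class Study:
    accession: str         # hypothetical identifier
    n_samples: int         # samples with count data
    has_ncbi_counts: bool  # NCBI-generated raw counts available
    has_ndd_cases: bool    # study contains NDD cases
    strata: List[Stratum] = field(default_factory=list)


def has_enough(stratum: Stratum) -> bool:
    """A comparison needs enough cases and enough controls."""
    return stratum.n_cases >= MIN_PER_GROUP and stratum.n_controls >= MIN_PER_GROUP


def comparisons(study: Study) -> List[Stratum]:
    """Return the strata of a study that qualify for case-versus-control analysis."""
    if not (study.has_ncbi_counts and study.n_samples >= MIN_SAMPLES):
        return []
    if not study.has_ndd_cases:
        return []
    # A stratum without controls (no case-control design) fails has_enough.
    return [s for s in study.strata if has_enough(s)]


# Made-up example studies, for illustration only.
studies = [
    Study("S1", 12, True, True, [Stratum("neurons", 3, 3), Stratum("astrocytes", 4, 2)]),
    Study("S2", 4, True, True, [Stratum("all", 2, 2)]),
    Study("S3", 10, True, True, [Stratum("all", 10, 0)]),
]
kept = {s.accession: [st.label for st in comparisons(s)] for s in studies}
```

In this sketch a study counts as included when at least one of its strata qualifies, and each qualifying stratum corresponds to one dataset/statistical comparison.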
The summary statistics of the datasets can be downloaded and interactively explored on our website (https://SyNUM.shinyapps.io/NDD-transcriptomic-atlas/).

Fig. 1: Overview of the study. A Workflow of data collection and analysis. Of the 238 datasets identified by the search query, 188 datasets included at least six samples and had NCBI-generated RNA sequencing count data available. Of these datasets, those without a case-control design, with fewer than three cases and/or controls, or without NDD cases were excluded, resulting in 115 transcriptomic datasets that were included in the analysis. Datasets were stratified based on mutation type and/or cell type/tissue before performing differential expression analysis, provided that there were at least three cases and controls within each stratum. This resulted in a total of 151 distinct datasets that were used to identify common, disorder-associated, and phenotype-specific changes. B Distribution of the number of cases and controls among the 151 datasets. The x-axis shows the number of cases and controls, and the y-axis shows the number of datasets with that number. Most datasets include only three or four cases and controls, while only a few datasets include more than ten. C Donut chart of the number of datasets for Rett syndrome, Duchenne muscular dystrophy, Fragile X syndrome, Down syndrome, and others. D Principal coordinate analysis (PCoA) of the 151 datasets. The distance between datasets was calculated using the Spearman correlation of the gene-level P values, which were obtained for each dataset through the differential expression analysis of NDD cases versus controls. The first component is associated with the cell type/tissue.
In particular, T cell receptor gamma variable 4 (TRGV4) reaches higher levels of significance (i.e., P value rank) in immune cells, while cholinergic receptor nicotinic alpha 4 subunit (CHRNA4) has higher P value ranks in neural cells.

Most of the 151 datasets encompassed only three or four cases and/or controls (Fig. 1B). The most common NDDs in our meta-analysis are RTT, FXS, DMD, and Down syndrome (DS), which together account for 43% of all datasets (Fig. 1C). The causative genes of these four NDDs (i.e., MECP2, FMR1, DMD, and chromosome 21 genes, respectively) exhibited the anticipated transcriptomic alterations (Supplementary Fig. S1). Moreover, datasets clustered mostly by cell type/tissue rather than by disease, highlighting the importance of tissue choice in transcriptomic experiments (Fig. 1D, Supplementary Fig. S2). This is exemplified by the cholinergic receptor CHRNA4 and the T-cell receptor TRGV4, which only reach high levels of significance in neural and immune cell types/tissues, respectively.

NDDs are characterized by inflammatory, translational, mitochondrial, and synaptic alterations

Our first aim was to identify transcriptomic alterations that are common across NDDs. To this end, gene set enrichment analysis (GSEA) was performed on the 151 datasets to find Gene Ontology–Biological Process (GO-BP) terms with a differential expression profile across the NDDs. The 30 GO-BP terms that reached statistical significance (i.e., false discovery rate-adjusted (FDR-adj) P value