Investigation of key ferroptosis-associated genes and potential therapeutic drugs for asthma based on machine learning and regression models

Introduction

Bronchial asthma is a heterogeneous disease influenced by a combination of genetic and environmental factors. Clinically, it is characterized by recurrent episodes of wheezing, coughing, chest tightness, and shortness of breath due to airway obstruction and hyperreactivity, with chronic airway inflammation, excessive mucus production, and airway remodeling as pathological hallmarks [1]. Global data from 2019 report an age-standardized prevalence of approximately 3415.5 per 100,000 population and an all-cause mortality rate of about 5.8 per 100,000 [2]. Asthma often coexists with conditions such as allergic rhinitis, obesity, and attention-deficit hyperactivity disorder in children [3,4,5,6]. Patients with asthma also face a higher risk of all-cause mortality [7,8]; together, these factors significantly reduce quality of life and impose substantial economic burdens. The complex mechanisms underlying asthma, coupled with its diverse phenotypes and endotypes, pose significant challenges for precise diagnosis and effective treatment [9,10]. Identifying diagnostic biomarkers for asthma could improve early detection and facilitate better disease management.

Recent studies have demonstrated that various forms of cell death, including autophagy, necroptosis, pyroptosis, and ferroptosis, play roles in asthma pathogenesis [11]. Among these, ferroptosis, a non-apoptotic form of cell death driven by imbalances in iron metabolism and lipid peroxidation, has been shown to play a critical role in lung diseases [12,13]. Ferroptosis exacerbates airway inflammation, induces epithelial damage, and promotes airway remodeling in asthma. These processes involve the regulation of proteins such as PEBP1, 15LO-1, GPX-4, and SLC7A11. Notably, ferroptosis inhibitors and interventions targeting these proteins have been found to alleviate airway inflammation and epithelial damage [14,15,16].
These findings highlight the importance of ferroptosis in asthma pathogenesis and suggest its potential as a novel therapeutic target.

KU-55933, a well-known ATP-competitive inhibitor of ataxia telangiectasia mutated (ATM) kinase, has been widely used to counteract DNA repair mechanisms in tumor cells, thereby enhancing chemotherapy sensitivity [17]. Recent evidence suggests that ATM kinase promotes ferroptosis by activating NCOA4, while KU-55933 inhibits ATM activation and suppresses ferroptosis mediated by the 15LO2-PEBP1 pathway [18,19]. These findings indicate that KU-55933 may have therapeutic potential beyond its conventional applications, making it a promising candidate for further exploration.

This study aims to investigate the role of ferroptosis in asthma pathogenesis and to identify potential diagnostic biomarkers. It also offers a preliminary exploration of the therapeutic potential of the small-molecule compound KU-55933 as a new approach to asthma treatment.

Methods

Data collection and processing

Asthma-related transcriptome datasets were screened from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Inclusion criteria were as follows: datasets based on array expression, samples from lower airway tissues, studies involving humans, datasets including both asthma and control groups with at least five samples per group, and availability of transcriptome data. Following these criteria, GSE179156 [20] was included as the training dataset, consisting of airway epithelial cell transcriptomes from 57 asthma patients and 29 healthy controls. Additionally, GSE41861 and GSE104468 [21] were included for analysis. GSE41861 contained nasal and airway epithelial cell transcriptomes from 51 asthma patients and 30 healthy controls; however, only the lower airway data were included, because the lower airway is the primary site of asthma pathology.
Similarly, from GSE104468, which included nasal, airway epithelial, and peripheral blood mononuclear cell (PBMC) transcriptomes of 12 asthma patients and 12 healthy controls, only the lower airway data were used. The original studies for all three datasets were conducted independently by research institutions in the United States and enrolled adult participants. In our study, data from different sources were analyzed separately as training or validation sets, which helps mitigate potential heterogeneity between datasets of different origins. Furthermore, all selected data were derived from lower airway tissue transcriptomes, thereby avoiding variation in gene expression across tissue types. Ferroptosis-related genes (FRGs) were downloaded from the FerrDb V2 database (http://www.zhounan.org/ferrdb/current/) [22] on October 13, 2024. FerrDb V2 contains 1,001 regulators and 143 diseases related to ferroptosis, from which 525 FRGs were identified (Supplementary file 1). Detailed information on the datasets is presented in Table 1. The flowchart of this study is shown in Fig. 1.

Table 1. Detailed information on datasets used in the study.

Fig. 1. The flowchart of this study. Sequence of all workflows in this study.

Identification of differentially expressed genes (DEGs)

Data analysis was performed using R 4.2.1, and differentially expressed genes were identified with the “limma” package. Screening criteria were set to adj.P < … and |log2FC| > 0.58. Volcano plots and heatmaps were generated using the “ggplot2” and “pheatmap” packages, respectively. Genes shared between the DEGs and FRGs were identified using the “VennDiagram” package, defined as Ferr-DEGs, and used for subsequent analyses.

Correlation analysis and PPI network construction

The “RCircos” package was used to generate landscape maps of Ferr-DEGs.
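The DEG screening and Ferr-DEG intersection described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes Benjamini-Hochberg adjustment (the default in limma's `topTable`) and an adjusted P cutoff of 0.05, which is a placeholder because the cutoff is not legible in the text. The gene names, fold changes, P values, and FRG list are invented toy values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(table, p_cutoff=0.05, lfc_cutoff=0.58):
    """Return genes passing both thresholds.

    table maps gene -> (log2 fold change, raw p-value).
    p_cutoff=0.05 is an assumed placeholder; 0.58 is the value in the text.
    """
    genes = list(table)
    adj = bh_adjust([table[g][1] for g in genes])
    return {
        g for g, q in zip(genes, adj)
        if q < p_cutoff and abs(table[g][0]) > lfc_cutoff
    }


# Hypothetical toy input: gene -> (log2FC, raw p-value).
toy_table = {
    "GENE_A": (1.20, 0.001),
    "GENE_B": (-0.90, 0.004),
    "GENE_C": (0.30, 0.002),   # significant but below the fold-change cutoff
    "GENE_D": (0.75, 0.200),   # large change but not significant
    "GENE_E": (-1.10, 0.030),
}
toy_frgs = {"GENE_B", "GENE_E", "GENE_X"}  # hypothetical ferroptosis gene list

degs = screen_degs(toy_table)
ferr_degs = degs & toy_frgs  # intersection, as in a two-set Venn diagram
```

A fold-change cutoff of 0.58 on the log2 scale corresponds to roughly a 1.5-fold change in expression.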
A protein–protein interaction (PPI) network was constructed using the STRING database (https://cn.string-db.org/) [23].

GO and KEGG pathway enrichment analyses

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of Ferr-DEGs were performed using the enrichGO and enrichKEGG functions in the “clusterProfiler” package, with the screening criteria set to pvalueCutoff = 0.05 and qvalueCutoff = 0.05 for both analyses. Additionally, single-gene Gene Set Enrichment Analysis (GSEA) was conducted to assess the correlations between diagnostic genes and other genes, followed by further enrichment analysis using the gseGO and gseKEGG functions.

Machine learning

Three machine learning algorithms, namely Random Forest (RF), Lasso regression, and Boruta, were applied to identify core genes. Lasso regression selects key predictors by shrinking the coefficients of less relevant variables to zero, which mitigates the risk of overfitting. In this study, a binomial family was specified for Lasso regression, with the alpha penalty parameter set to 1 and nlambda set to 1000. The RF algorithm combines predictions from multiple decision trees trained on random data subsets to enhance accuracy and robustness. During RF-based feature selection, the response variable was the group label and the predictor variables were the gene expression data. The number of trees (ntree) was set to 500, after which the model with the lowest prediction error was used to construct the optimal random forest model for gene selection. Boruta, an extension of random forest, iteratively compares original features with shadow features to identify significant variables.
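To illustrate how the Lasso penalty drives the coefficients of weak predictors to exactly zero, the sketch below fits an L1-penalized logistic (binomial) model in plain Python. It is only an illustration under stated assumptions. It uses proximal gradient descent rather than the coordinate descent used by “glmnet”, a single hand-picked penalty `lam = 0.2` instead of a path of 1000 lambda values, and an invented eight-sample dataset with two pre-standardized features.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression by proximal gradient descent.

    X: list of rows of standardized features (mean 0, mean square 1).
    y: list of 0/1 labels. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    step = 4.0 / (p + 1)  # conservative step size, valid for standardized columns
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Predicted probability minus label for every sample.
        resid = []
        for row, label in zip(X, y):
            eta = intercept + sum(b * x for b, x in zip(coefs, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - label)
        # The intercept is not penalized: plain gradient step.
        intercept -= step * sum(resid) / n
        # Penalized coefficients: gradient step, then soft-thresholding.
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            coefs[j] = soft_threshold(coefs[j] - step * grad_j, step * lam)
    return intercept, coefs


# Hypothetical toy data: feature 0 tracks the label closely, feature 1 only weakly.
X_toy = [[1, 1], [1, -1], [1, 1], [1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 1]

intercept, coefs = lasso_logistic(X_toy, y_toy, lam=0.2)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
```

On this toy input the weakly associated feature ends with a coefficient of exactly zero, while the strongly associated one keeps a positive coefficient of about 0.79, so only feature 0 is selected. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.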
The “glmnet”, “randomForest”, and “Boruta” packages were used for the respective analyses, and the intersection of the results from the three algorithms was taken as the final core gene set for further analysis.

Diagnostic model construction

The “rms” package was used to construct a nomogram to evaluate the diagnostic performance of core genes. Calibration curves assessed the accuracy of the nomogram, which represented both individual and combined predictive capabilities of the genes. Decision curve analysis (DCA) evaluated the diagnostic utility of the model. External validation was performed using the GSE41861 and GSE104468 datasets.

Immune infiltration analysis

The “CIBERSORT” package [24] was used to calculate the proportions of various immune cells in each sample. Results were visualized using the “corrplot” package. Spearman correlation analysis was conducted to examine associations between core genes and immune scores.

Compound screening using the CMap database

The Connectivity Map (CMap, http://clue.io/) [25] database was used to identify small-molecule compounds potentially related to asthma treatment. Upregulated Ferr-DEGs from asthma datasets were uploaded to the CMap database, and compounds were ranked by enrichment scores in ascending order. The top 10 compounds were considered potential therapeutic candidates.

Molecular docking and molecular dynamics simulations

AGPS sequence information was obtained from the UniProt database (https://www.uniprot.org/), and its 3D structure was predicted using the AlphaFold3 (AF3) AI program (https://alphafoldserver.com/) [26]. Molecular structures of small compounds were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/). AutoDock 1.5.7 software was used for molecular docking, and results were visualized using PyMOL 2.6.0 software. Lower binding energies indicated stronger protein–ligand interactions.
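The link between a lower binding energy and a stronger interaction can be made concrete with the standard thermodynamic relation ΔG = RT ln(Kd). The Python sketch below converts a binding free energy into the dissociation constant it implies. It is an illustration only and not part of the authors' workflow. The energies are hypothetical, 298.15 K is an assumed temperature, and docking scores are empirical estimates, so the resulting Kd values are at best order-of-magnitude guides.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Kd in mol/L implied by a binding free energy, from dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Hypothetical docking energies in kcal/mol, not values from the study.
example_energies = [-5.0, -7.0, -9.0]
example_kd = {dg: dissociation_constant(dg) for dg in example_energies}
```

Under these assumptions, an energy of -7.0 kcal/mol corresponds to a Kd of roughly 7.4 micromolar, and each additional 1.36 kcal/mol of binding energy lowers Kd about tenfold.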
The docked complexes were imported into Gromacs 2.0 software for molecular dynamics simulations, and protein–ligand binding free energies were calculated using the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) method.

Results

Identification of DEGs in asthma

Principal component analysis (PCA) revealed distinct differences in gene expression between asthma and control groups in the GSE179156 dataset (Fig. 2a). A total of 758 DEGs were identified using the “limma” package, including 361 upregulated and 397 downregulated genes in asthma samples (P