Identification of clinical diagnostic and immune cell infiltration characteristics of acute myocardial infarction with machine learning approach


Introduction

Acute myocardial infarction (AMI) is a severe form of coronary heart disease in which blood flow through a coronary artery is reduced or interrupted, depriving the myocardium of its blood supply1. AMI is a leading cause of death and disability, particularly in middle-aged individuals2. Inflammatory and immune regulators cause irreparable damage to myocardial cells, leading to serious complications such as malignant arrhythmia and even heart failure3. Early diagnosis and timely treatment of AMI are therefore essential to prevent disease progression.

Cardiac troponins (cTns) are key diagnostic markers for cardiovascular diseases, including AMI4. However, cTns are released only after myocardial injury has occurred, so they do not allow AMI to be diagnosed early5. Moreover, in clinical practice, false-positive elevations of cTns can lead to misdiagnosis and even inappropriate treatment, with serious consequences for the patient6. Identifying promising biomarkers to predict and treat AMI is therefore urgent. Transcriptome and expression-profile sequencing have been widely used in AMI studies7. However, data from a single center or from a single analytical approach may not be adequate for identifying biological markers of AMI.

Most available target-gene screening approaches rely on transcriptomic analysis, principal component analysis, or association analysis8. Evidence suggests that significance-based analysis cannot explain a critical fraction of phenotypic variation, which limits its capacity to identify high-risk populations at rigorous significance levels. Weighted gene co-expression network analysis (WGCNA) is a classic bioinformatics approach that can be used to analyze gene expression profiles and investigate their association with clinical characteristics9.
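WGCNA builds its network from a weighted adjacency matrix. As a minimal sketch of the standard unsigned WGCNA definition (the soft-thresholding power and other settings used in this study are not given in this section), the adjacency of genes i and j is the absolute Pearson correlation of their expression profiles raised to a power β, and a gene's connectivity is the sum of its adjacencies to all other genes; highly connected genes are candidate hubs. The function names, the illustrative β, and the toy expression values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither vector is constant


def unsigned_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr maps gene -> expression values over the same ordered samples.
    beta=6 is only an illustrative default, not a value from the study.
    """
    genes = list(expr)
    adj = {}
    for g in genes:
        for h in genes:
            if g == h:
                adj[(g, h)] = 1.0  # self-adjacency is 1 by definition
            else:
                adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


def connectivity(adj, genes):
    """Whole-network connectivity k_i = sum of a_ij over all j != i."""
    return {g: sum(adj[(g, h)] for h in genes if h != g) for g in genes}


# Hypothetical toy data: four genes measured in four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with GENE_A
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
adj = unsigned_adjacency(expr, beta=2)
k = connectivity(adj, list(expr))
print(k)
```

Because the adjacency is unsigned, the anti-correlated GENE_C is as strongly linked to GENE_A as GENE_B is, while GENE_D, which is only moderately correlated with the others, ends up with the lowest connectivity.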
Compared with traditional approaches, machine learning, which systematically searches a high-dimensional collection of candidate predictors, may be a beneficial tool for improving disease detection and prevention in clinical practice10.

In this study, we performed differential expression analysis and enrichment analysis, and investigated candidate hub genes of AMI by integrating the Gene Expression Omnibus (GEO) database with WGCNA and three machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), and Least Absolute Shrinkage and Selection Operator (LASSO). Immune infiltration analysis and in vivo experiments showed that FOS and IL18RAP were associated with the infiltration of naive CD4 T cells and neutrophils in AMI, which may provide clinicians with new insights for diagnosis and treatment.

Materials and methods

Data acquisition and processing

RNA expression profiles of AMI (GSE61145, GSE34198, and GSE66360) were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The clinical details of the GEO datasets are listed in Table S1. During data preprocessing, probes were mapped to genes and empty probes were removed. When multiple probes corresponded to the same gene, their median value was taken as the expression of that gene. The batch effect in GSE61145 was removed with the ComBat function in the sva package of R software11.

Identification of differentially expressed genes (DEGs)

Background correction, normalization, and gene symbol conversion were performed on the integrated AMI datasets (GSE61145 (n = 57; GPL6106, GPL6884), GSE34198 (n = 97; GPL6102), and GSE66360 (n = 99; GPL570)). Differential expression analysis was performed with the R package “limma”12. DEGs in the AMI datasets were screened using thresholds of adjusted p ≤ 0.05 and |log2(fold change)| ≥ 0.585.
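The probe-collapsing and DEG-screening steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: it assumes per-probe expression values and a probe-to-gene annotation are already loaded, and it assumes the Benjamini-Hochberg (BH) procedure for the adjusted p-values, since the text specifies only "adjusted p". The function names (collapse_probes, benjamini_hochberg, screen_degs), the tuple layout, and the toy values are hypothetical. The ComBat batch correction step is not reproduced here.

```python
import statistics


def collapse_probes(probe_expr, probe_to_gene):
    """Map probes to genes, drop empty probes, and take the per-sample
    median when several probes map to the same gene.

    probe_expr: probe ID -> expression values over the same ordered samples.
    probe_to_gene: probe ID -> gene symbol ("" or None marks an empty probe).
    """
    grouped = {}
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:
            continue  # empty probe: no gene annotation, removed
        grouped.setdefault(gene, []).append(values)
    # zip(*rows) walks the samples; each column holds one value per probe.
    return {
        gene: [statistics.median(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(results, padj_cutoff=0.05, lfc_cutoff=0.585):
    """Split genes into up- and down-regulated DEGs.

    results: list of (gene, log2_fold_change, raw_p_value) tuples; this
    layout is hypothetical, e.g. columns exported from a limma fit.
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (gene, lfc, _), q in zip(results, padj):
        if q <= padj_cutoff and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical toy inputs, not data from the study.
genes = collapse_probes(
    {"p1": [1.0, 5.0], "p2": [3.0, 7.0], "p3": [2.0, 2.0], "p4": [9.0, 9.0]},
    {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B", "p4": ""},
)
up, down = screen_degs([
    ("G1", 1.2, 0.001),
    ("G2", -1.0, 0.004),
    ("G3", 0.3, 0.002),  # significant, but fold change too small
    ("G4", -2.0, 0.5),   # large fold change, but not significant
])
print(genes, up, down)
```

The fold-change cutoff of 0.585 corresponds to roughly a 1.5-fold change, because log2(1.5) ≈ 0.585. In limma itself, topTable reports BH-adjusted p-values by default (adjust.method = "BH"), which is why BH was chosen for this sketch.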
Subsequently, the expression patterns of DEGs were visualized as volcano plots and heatmaps with the “ggplot2” and “pheatmap” packages in R software, respectively.

Function enrichment analysis

As in our previous study13, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the clusterProfiler package of R software. GO terms were analyzed in three categories: biological process (BP), cellular component (CC), and molecular function (MF). The top 10 GO terms in each category were visualized using the R package “ggplot2”. False discovery rate