Introduction

Glioblastoma multiforme (GBM) is the most prevalent malignant primary brain tumour in adults [1], with a 1-year survival rate of 40.9% and a 5-year survival rate of 6.6% [2,3]. Clinical research results in China are consistent with observations in other countries [4,5]. The poor prognosis of GBM patients is in part due to the high heterogeneity of tumour cells and their primary or acquired resistance to chemotherapy, immunotherapy, and radiotherapy [6,7]. The production of distinct mRNA isoforms by alternative splicing is an important mechanism of resistance to these therapies [8].

Alternative splicing is an important post-transcriptional regulatory process in which different combinations of exons are joined together, producing multiple forms of mRNA from a single pre-mRNA [9]. This in turn may affect gene expression levels and the translation of mRNA into proteins, conferring different functional properties [10,11,12,13]. It has been shown that abnormal splicing regulation is highly correlated with tumourigenesis and progression [14,15,16], and that dysregulation of alternative splicing exhibits distinct patterns in different tumours and cells [17,18,19,20].

In glioma, increased splicing burden leads to an increase in abnormal alternative splicing events (ASEs). For example, upregulation of REST mediates the splicing of NF1 into a less active isoform and leads to activation of the RAS/MAPK pathway, which reduces the survival of glioblastoma patients [21]. Splicing regulation may provide targets for cancer therapy, such as targeting splicing factors, using antisense oligonucleotides to target splicing factor-RNA interactions, and activating the nonsense-mediated mRNA decay (NMD) mechanism [22,23,24,25,26].
Therefore, investigating genes subject to aberrant alternative splicing as GBM prognostic markers is a promising research direction.

In this study, we integrated alternative splicing data and gene expression data to identify genes with prognostic value that are regulated by abnormal splicing during GBM progression, and constructed a prognostic risk model. The model was used to predict the prognosis of GBM patients, differences in the tumour microenvironment, and potentially effective anti-cancer drugs. FN1, which was used in constructing the risk model, contained 4 abnormal ASEs in the comparison between GBM and lower-grade glioma (LGG), one of which is a known cancer-associated splicing event reported in other tumours [27,28,29,30]. Finally, we focused on FN1, and a comprehensive analysis showed that this gene may be a potential splicing biomarker for GBM. The flow chart of this study is shown in Fig. 1.

Fig. 1 Workflow of this study.

Methods

Data collection and pre-processing

RNA-seq data collection

We used the "TCGAbiolinks" package to obtain RNA-seq data and corresponding clinical information for TCGA-LGG and TCGA-GBM samples from The Cancer Genome Atlas database (TCGA, https://portal.gdc.cancer.gov/) [31]. A total of 546 LGG samples and 168 GBM samples were obtained. The original CGGA-693 cohort (249 GBM samples) and CGGA-325 cohort (139 GBM samples), downloaded from the Chinese Glioma Genome Atlas (CGGA, http://www.cgga.org), were used to validate model performance [32].

Microarray data collection

To further validate the performance and generalizability of the ASRS model, the microarray datasets GSE4412 and GSE43378 were downloaded from the GEO database, comprising 84 and 49 GBM samples, respectively.

Protein array data collection

The GBM and LGG reverse phase protein array (RPPA) datasets were downloaded from TCGA.
RPPA data were available for 232 GBM samples and 429 LGG samples.

Alternative splicing events data acquisition

The ASE data were obtained from ASCancer Atlas (https://ngdc.cncb.ac.cn/ascancer/home). The GBM dataset comprised 69,324 ASEs from 164 samples, while the LGG dataset included 60,213 ASEs from 515 samples [33]. The datasets contained information on each ASE, such as the type of ASE, exon loci, gene symbol, and the Percent Spliced In (PSI, Ψ) value. These two datasets contained five major types of AS: exon skipping (ES), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), intron retention (IR), and mutually exclusive exons (MEX, MXE, or ME).

Splicing factor data collection

We integrated splicing factors from the literature, the SpliceAid2 database (www.introni.it/spliceaid.html), and a Gene Ontology (http://www.geneontology.org) search with the keyword "mRNA splicing". This yielded 529 genes with mRNA splicing function or associated with splicing regulation, which served as the splicing factor gene list in this study (Table S1) [34,35,36,37,38].

Data pre-processing

TCGA and CGGA cases that met the following criteria were included: 1. an available histological diagnosis of GBM; 2. available ASCancer Atlas data; 3. follow-up information and survival status more than 30 days after the initial diagnosis. A total of 156 CGGA-693 samples and 127 CGGA-325 samples were subsequently used to validate model performance. The TCGA-GBM gene expression data, clinical information, and ASE data from ASCancer Atlas shared 150 common samples; the corresponding number for TCGA-LGG was 515. To ensure data quality, GBM ASEs with "null" values in over 80% of samples and a PSI standard deviation less than 0.01 were removed. A total of 52,370 GBM ASEs were retained as clean data for the analysis of prognostically critical genes.
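The ASE quality filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `keep_event` and `filter_events` and the toy PSI table are hypothetical, missing PSI values are represented as `None`, the sample standard deviation is used, and an ASE is assumed to be removed if it meets either criterion (more than 80% nulls, or PSI standard deviation below 0.01).

```python
from statistics import stdev
from typing import Dict, List, Optional

# One ASE = one list of PSI values across samples; None marks a "null" value.
PsiRow = List[Optional[float]]


def keep_event(psi: PsiRow,
               max_null_fraction: float = 0.80,
               min_sd: float = 0.01) -> bool:
    """Return True if an ASE passes both quality criteria (assumed reading)."""
    if not psi:
        return False
    observed = [value for value in psi if value is not None]
    null_count = len(psi) - len(observed)
    # Criterion 1: drop events with nulls in over 80% of samples.
    if null_count > max_null_fraction * len(psi):
        return False
    # A standard deviation needs at least two observed values.
    if len(observed) < 2:
        return False
    # Criterion 2: drop events whose PSI barely varies (SD < 0.01).
    return stdev(observed) >= min_sd


def filter_events(table: Dict[str, PsiRow]) -> Dict[str, PsiRow]:
    """Keep only the ASEs that pass keep_event."""
    return {event_id: psi for event_id, psi in table.items() if keep_event(psi)}


# Toy data with hypothetical event IDs (not taken from ASCancer Atlas).
toy = {
    "ES_1": [0.10, 0.55, 0.90, None, 0.35],          # 20% null, variable PSI
    "IR_2": [None] * 9 + [0.50],                     # 90% null
    "A3SS_3": [0.500, 0.501, 0.499, 0.500, 0.500],   # SD well below 0.01
}

print(sorted(filter_events(toy)))  # prints ['ES_1']
```

Under these assumptions, only the event with sufficient coverage and variable PSI is retained. If both criteria had to hold simultaneously for an ASE to be removed, the two early returns would need to be combined into a single condition.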
The original GBM and LGG ASE data were intersected to obtain 36,923 common ASEs for the analysis of differential alternative splicing events. RPPA data and mRNA expression data from GBM were matched by sample ID, and a total of 72 samples were used to analyse the correlation between mRNA and protein expression. All primary tumour samples from the GBM and LGG RPPA data were used for a comparative analysis of fibronectin expression.

Analysis of differentially expressed splicing factors

The "TCGAbiolinks" package was used to perform differential expression analysis on the GBM and LGG datasets. Differentially expressed genes (DEGs) were filtered by |log2FC| > 1 (FC, fold change) and adjusted P-value