Construction and validation of gene signature for prognosis and drug sensitivity in cholangiocarcinoma based on cellular senescence related genes

Wait 5 sec.

IntroductionCholangiocarcinoma, a very deadly epithelial cell cancer, can develop anywhere along the biliary tree or inside the hepatic parenchyma1,2. About 15% of all primary liver tumors and 3% of gastrointestinal cancers are cholangiocarcinoma, which are becoming more common worldwide3. Because of its aggressive nature, refractoriness to treatment, and quiet presentation, cholangiocarcinoma has an alarming mortality rate, accounting for 2% of all cancer-related fatalities worldwide3,4,5. Surgery alone has been giving way to a multimodal approach that combines surgery, chemotherapy, and immunotherapy in the treatment of cholangiocarcinoma in recent years. However, the prognosis for cholangiocarcinoma has not significantly improved, with a 5-year survival rate of 7–20%6,7. Further research is necessary because there aren’t many trustworthy biomarkers that can be utilized to predict the prognosis and therapeutic benefits of cholangiocarcinoma.Cellular senescence (CS) is caused by oxidative stress, DNA damage and telomere damage-related factors, which causes the cell cycle to enter a “irreversible” growth-proliferation arrest characterized by abnormal cellular morphology and structural and functional alterations8. By encouraging tissue remodeling and repair under healthy settings, CS can stop aberrant cell proliferation and carcinogenesis. In oncogenesis of cholangiocarcinoma, CS was a major contributor to genetic and epigenetic heterogeneity9,10. Senescence-associated secretory phenotype, TP53 mutation, widespread oncogene activation, and immunosurveillance failure all contributed to the development and exacerbation of CS in cholangiocarcinoma. In the meantime, the oncogenesis and the aggressiveness of cholangiocarcinoma cells were further enhanced by these aberrant CS. Destructive CS induction or removal is thought to be a promising anti-cancer treatment11,12. By causing tumor cells to senesce in a variety of solid tumors, several widely used chemotherapeutic medications (doxorubicin and cisplatin) as well as additional pro-senescence therapeutic approaches (telomerase inhibition, therapeutic regulation of the cell cycle, and SASP reprogramming) may have anti-tumor effects13. However, more pre-clinical research was required to fully understand the negative effects of senescence induction in cancer therapy as well as the reciprocal effects of senescent cells and SASP in various phases of carcinogenesis12,13. In order to maximize the benefits of chemotherapy and immunotherapy interventions, it is imperative to comprehend the main CS-related indicators in the evolution of cholangiocarcinoma patients, appropriately classify the tumor CS-risk, and minimize side effects of therapy-induced senesce.Using the datasets from The Cancer Genome Atlas (TCGA), GSE89748, and GSE107943, we used machine learning to create a cellular senescence related signature (CSS) for cholangiocarcinoma. Additionally, we assessed the function of CSS in assessing the prognosis and therapeutic advantages of patients with cholangiocarcinoma.Materials and methodsDatasets sourcesA total of 37 cholangiocarcinoma cases of bulk RNA-seq data were extracted from the TCGA database (https://portal.gdc.cancer.gov/repository). The analysis included patients with cholangiocarcinoma who had all available follow-up data. Two Gene Expression Omnibus (GEO) datasets, namely GSE89748 (n = 71) and GSE107943 (n = 30), were utilized to confirm the prognostic significance of CSS in cholangiocarcinoma. We also investigated the predictive power of CSS in predicting the immunotherapy benefits using data from two immunotherapy datasets: the IMvigor210 dataset (n = 298), the GSE91061 dataset (n = 98) and single cell sequencing dataset GSE145281(n = 10). ComBat algorithm was used for batch effect correction using the “sva” package, followed by probe-to-gene mapping and z-score scaling to ensure comparability across datasets. CS-related genes were obtained from MSigDB (https://www.gsea-1msigdb.org/gsea/msigdb/index.jsp) and these genes involved in “REACTOME_CELLULAR_SENESCENCE” and “GOBP_CELLULAR_SENESCENCE” (Supplementary Table 1). These genes were submitted for protein-protein interaction (PPI) network analysis using STRING database v11.5 (http://string-db.org/). Differentially expressed genes (DEGs) between cancer and normal tissue samples were analyzed using the R package “limma”. The senescence related phenotype score was extracted from the VarElect database (https://varelect.genecards.org/about/).Development and evaluation of a prognostic CSSTo find possible predictive indicators among these DEGs, we then performed univariate cox analysis. To develop a stable prognostic CSS, these potential prognostic biomarkers were subjected to an integrative machine learning analysis process, including random survival forest (RSF), elastic network (Enet), Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM). We built the CSS using the following procedures, which were based on the methodology of the R scripts (https://github.com/Zaoqu-Liu/IRLS) of earlier research14,15: (1) To find prognostic biomarkers in the TCGA dataset, univariate Cox regression was used; (2) a total of 90 algorithm combinations were then fitted to the prediction model of the TCGA data set; (3) all algorithm combinations were run in GEO cohorts; and (4) the C-index was computed for each cohort. Key parameters of each method for model construction were shown in Supplementary methods. The “surv_cutpoint” function in the R package “survminer” was then used to calculate the optimal cutoff for cholangiocarcinoma patients, which were then separated into high and low CSS scores (risk scores). Next, we compiled all of the prognostic indicators for cholangiocarcinoma that have emerged (Supplementary Table 2). We then computed the C-index curves of prognostic markers and clinical features using the “rms” program. Both univariate and multivariate cox analyses were used to determine the risk factor for cholangiocarcinoma prognosis.Immune infiltration, drug sensitivity and gene set enrichment analysesEach cholangiocarcinoma case’s immunological abundance was assessed using immunedeconv, a R package that integrates seven cutting-edge algorithms (CIBERSORT, MCPcounter, QUANTISEQ, XCELL, CIBERSORT-ABS, TIMER, and EPIC)16. For immune cell infiltration analysis, we used the CIBERSORT algorithm with a significance threshold of p < 0.05 for identifying differences in immune cell abundances between high and low CSS score groups. Similarly, other immune infiltration algorithms (MCPcounter, QUANTISEQ, etc.) were applied, with p < 0.05 used as the cutoff for statistical significance. The TCGA cohort’s cholangiocarcinoma cases’ immunological and ESTIMATE scores were determined using the R package “estimate”17. To determine the gene set score associated with immune-related functions, cancer-related hallmarks, and tumor escape and surveillance score of cholangiocarcinoma cases, we first obtained the cancer-related hallmark gene sets from the MSigDB and then conducted single sample Gene Set Enrichment Analysis (ssGESA). We then used a variety of prediction scores to assess the usefulness of CSS in forecasting the advantages of immunotherapy. The microsatellite instability (MSI) score and tumor immune dysfunction and exclusion (TIDE) score of cholangiocarcinoma cases were derived from TIDE (http://tide.dfci.harvard.edu). We examined the tumor mutation burden (TMB) score of cholangiocarcinoma cases from the TCGA database. Mutant-allele tumor heterogeneity (MATH) was used to compute the intratumor heterogeneity score of cholangiocarcinoma cases18. We next used the “oncoPredict” R package to determine the IC50 of medications for cholangiocarcinoma cases based on information from Genomics of Drug Sensitivity in Cancer (https://www.cancerrxgene.org/).Single-cell RNA-seq analysisThe Seurat R package (version 4.0) was utilized for the analysis of single-cell RNA sequencing data19. Genes identified in fewer than three cells, as well as cells with less than 200 detected genes or a mitochondrial content exceeding 5%, were excluded from our analysis. To normalize the data, we applied the Scale Data function to focus on the top 2000 highly variable genes within each sample. Dimensionality reduction of the gene expression data was achieved through principal component analysis. For visualizing cell distribution in a two-dimensional space, T-SNE analysis was performed using the Seurat package. After data normalization and scaling, clustering was performed using the FindClusters function with a resolution parameter of 0.8. This resolution provided an optimal balance between cluster separation and biological interpretation. The IRS score for each cell was computed utilizing the AUCell function.Cell linesThe human intrahepatic biliary epithelial cells (HIBEpiC) obtained from the Type Culture Collection of the Chinese Academy of Sciences (China), and the CCA cell lines (RBE, HCCC9810, HUCCT1) obtained from the American Type Culture Collection (ATCC).Western blotTotal protein was extracted from the cells using RIPA lysis buffer and protease inhibitors. Equal amounts of total protein were separated using SDS-PAGE gels, transferred to PVDF membranes and then incubated with appropriate antibodies. The visualized bands were developed using the ultra-sensitive ECL chemiluminescence kit (NCM, Soochow, China). The following antibodies were used: anti-EZH2 (1:1000, ZRB1095; Sigma-Aldrich), anti-GAPDH (1:2000, #5174; Cell Signaling Technology) as a loading control.TransfectionThe EZH2 knockdown lentivirus (GeneChem, Shanghai, China) were transfected into RBE cells according to the manufacturer’s protocol. After transfection for 48 h, stable clones were screened using 2 µg/mL puromycin. The shRNA sequences: 5’- CCCAACATAGATGGACCAAAT − 3’.Detection of cytokines in cell supernatantSASP affects the release of various bioactive cytokines, including chemokines, interleukins, etc. The levels of IL-6, IL-8, CXCL1, CCL5 and VEGF in the cell supernatant were detected by ABplex Human 5-Plex Custom Panel (RK04334, Abclonal, Wuhan, China). The detection was achieved using ABplex-100 (Abclonal).Cell proliferation assaysEdu assay and colony formation assay were performed to test the role of EZH2 on RBE cell proliferation. Edu assay was performed using the Cy3 Edu imaging kit (APExBIO, USA) according to the manufacturer’s protocol. The percentage of Edu positive cells was calculated and compared. For colony formation assay, about 500 RBE cells were added to a 6-well plate and cultured for 2 weeks. Colonies were then stained with crystal violet and those larger than 40 μm in diameter were counted.Flow cytometryThe cell cycle and cell apoptosis were assessed by flow cytometry. For cell cycle analysis, RBE cells were collected, fixed with 70% pre-cooled ethanol and stored overnight at -20 °C. The DNA content of RBE cells was detected by flow cytometry after stained with propyridine iodide (PI).For apoptosis analysis, RBE cells were stained with Annexin V-FITC and PI solution, then incubated in a dark room at room temperature for 20 min. The percentage of apoptosis was detected by flow cytometry.Statistical analysisR software (version 4.2.1) was used to perform all statistical analyses. The difference between continuous variables was assessed by Wilcoxon rank-sum test or student T test. The correlations between two continuous variables were evaluated by Pearson’s correlation analysis. The two-sided log-rank test was used to test the difference in different Kaplan-Meier survival curve.ResultsThe different expressed cellular senescence related genes in cholangiocarcinomaA total of 9732 DEGs were identified by preprocessing and normalization in cholangiocarcinoma cases (Supplementary Fig. 1A). The PPI network analysis was performed to among 283 CS-related genes obtained from MSigDB, revealing that 112 CS-related genes showed very tight interactions among each other (Supplementary Fig. 1B). A total of 89 intersected genes in above two gene set were extracted as shown in Venn plots (Supplementary Fig. 1C).Machine learning developed a CSSTo develop a CSS, we applied our machine learning-based integrative process to these intersected genes. Using the LOOCV framework, we fitted 90 different types of prediction models to the TCGA cohort. We then computed each model’s C-index for all GEO cohorts (Fig. 1A). Figure 1A displayed the C-index for each of the 90 types of prediction models across all cohorts. The optimal CSS was thus defined as the Lasso method’s CSS with the highest average C-index of 0.89 (Fig. 1A). The CSS was created utilizing 6 genes based on Lasso regression analysis, and the algorithm for calculating the CSS score (risk score) of cholangiocarcinoma cases was used: risk score = 0.9136× EZH2exp + 0.1365 × RNF2 exp + 0.0698 × CCNE1exp + (-0.2412) × E2F1exp + (-0.1026) × MDM2exp + 0.0965 × CDK4exp. Based on the optimal cut-off value, we subsequently divided cholangiocarcinoma cases into groups with high and low CSS scores. Higher CSS score was associated with a poor overall survival rate in the TCGA, GSE89748 and GSE107943 datasets (Fig. 1B and D, all p