EHBMT, a method for visualizing tumor evolution, identifies a surge in gastric cancer with hybrid epithelial–mesenchymal phenotypes due to an inflammatory microenvironment

Wait 5 sec.

IntroductionMalignant tumorigenesis, also known as carcinogenesis, is a complex multistep biological process with high heterogeneity1. On the one hand, there are significant histological, transcriptomic and genomic differences among patients with cancer, known as interpatient heterogeneity. On the other hand, with the rise of single-cell sequencing technology, more studies have noticed that there is also a high degree of heterogeneity in the cell population within tumors, known as intrapatient heterogeneity2,3,4. These high interpatient and intratumor heterogeneities ultimately determine the clinical presentation, response to treatment and prognosis differences5. Therefore, personalized medicine is crucial in tumor treatment.Cancer origin and progression are spatial processes that involve the breakdown of normal tissue organization, invasion and metastasis6. Cancer evolution and development (Cancer Evo–Dev) is the process of observing the occurrence, progression and metastasis of cancer from a temporal perspective, essentially explaining the spatial and temporal evolution of heterogeneous cancer cell populations7. Tumors are essentially a complex dynamic ecological system where heterogeneity is not only a substrate for evolution, but can also promote, or even be a requirement for, continued tumor development and progression8. According to recent advances regarding Cancer Evo–Dev, gene mutations, chronic inflammation, uncontrolled proliferation, immune escape and the tumor microenvironment have been identified as driving factors for cancer. The updating and development of the Cancer Evo–Dev theory is crucial for providing new ideas, strategies and technologies for cancer prevention and treatment.Precision medicine requires a better understanding of Cancer Evo–Dev, as well as the heterogeneous population of tumor cells that spans space and time. Molecular typing has been widely used in personalized medicine for patients with cancer-related disease. The epithelial–mesenchymal transition (EMT) is essentially an evolutionarily conserved program of cellular plasticity that controls the state of cells along the epithelial–mesenchymal axis, conferring EMT plasticity to epithelial cells9,10,11. The EMT process endows tightly anchored epithelial cells with invasive ability by converting them into motile mesenchymal cells, and is therefore considered a crucial step in the early stage of cancer metastasis12. Accordingly, increasing studies have confirmed that the EMT program is important for tumorigenesis and cancer metastasis.EMT is not a binary process but contains a heterogeneous and dynamic disposition with different intermediaries (also known as hybrid epithelial–mesenchymal (E/M) state) or partial EMT meta-states13. The p-EMT state, also known as the hybrid epithelial–mesenchymal state (hybrid E/M state or hybrid state) or intermediary state, is located in the intermediate stage between epithelial and mesenchymal states14. In other words, EMT is not a simple binary process but actually contains different transition states, including the epithelial state, the mesenchymal state and diverse hybrid E/M states15,16,17.The EMT process has high heterogeneity and plasticity, and tumor heterogeneity should be partly attributed to the EMT plasticity of tumor cells18. In addition, EMT is an evolutionarily conserved cellular program that exists in both normal tissues and tumor tissues. In this case, we can distinguish the high heterogeneity of cancer by molecular typing according to their differences in EMT heterogeneity and plasticity. For example, Cristescu and colleagues have divided patients with gastric cancer (GC) into four subtypes, including microsatellitein stability (MSS)/TP53−, MSS/TP53+, microsatellite instability (MSI) and MSS/EMT19. Likewise, emerging single-cell RNA-sequencing (RNA-seq) studies have confirmed the dynamically configurable heterogeneity and plasticity of EMT20,21,22. In this study, we proposed an EMT heterogeneity-based molecular typing method (EHBMT), which can be applied to both normal and cancerous tissues to guide personalized medicine. We have demonstrated the feasibility of this method using stomach cancer as an example, and validated its reliability through single-cell analysis and multiplex immunohistochemistry (mIHC) experiments.Materials and methodsData acquisitionThe RNA-seq data and clinical information of the Cancer Genome Atlas (TCGA) stomach cancer cohort (TCGA_STAD) was downloaded from the cBiopPortal web server. The transcriptome data of the GSE62254 dataset was downloaded from the Gene Expression Omnibus (GEO) in the NCBI web server. The clinical information of the Asian Cancer Research Group (ACRG) cohort (GSE62254) was downloaded from the supplementary data of ref. 19. The transcriptome data of the normal stomach tissues from the GTEx cohort was obtained as we previously described23. The transcriptome data of the GSE122401 and the GSE179252 cohorts was obtained as we previously described24,25. For the GSE122401 cohort, the normalized gene expression data using the relative standard error of the mean value in the GC cohort (GSE122401) was directly downloaded from the GEO dataset26. For the GSE179252 cohort, the gene expression data (count value) were normalized to the fragments per kilobase of exon model per million mapped fragments value.DEG analysisThe significant differentially expressed genes (DEGs) of each EMT subtype were identified using the DEseq2 R package. The differentially expressed protein-coding genes with P values less than 0.0001 are displayed in the volcano plot. Among them, the significantly upregulated protein-coding genes were collected for Gene Ontology (GO)/Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. The top 100 significantly upregulated genes are displayed in the heat map (Fig. 6a–c).Molecular typing analysisOur previous review provided a comprehensive overview of the plasticity and heterogeneity of EMT signaling and proposed that EMT plasticity can lead to heterogeneity in GC18. On the basis of this, we propose a EMT heterogeneity-based molecular typing method. The unsupervised clustering analysis was performed based on the expression pattern of classic EMT-related biomarkers. The representative EMT signaling-related biomarkers, including epithelial signature genes (KRT18, KRT19, CDH1, EPCAM, F11R, EARP1, ESRP2, CLDN1, CLDN4, CLDN7, S100A14, PRSS8, PRSS22, ST14, ZNF165, C1ORF116, KDF1, CRB3, ELF3, HNF4A, OVOL1 and OVOL2) and mesenchymal signature genes (ZEB1, ZEB2, SNAI1, SNAI2, TWIST1, TWIST2, VIM, TGFB1. TGFB2. TGFB3, FN1, SPARC, COL1A1, COL1A2, MMP2 and TCF4) were based on our work and previous reviews. Clustering was performed on individual datasets and the samples were further classified into three subgroups according to the unsupervised cluster analysis using the ComplexHeatmap package. The clustering method was complete linkage, and the distance measurement method was calculated using the Pearson test. Samples were categorized into three subgroups (epithelial, hybrid epithelial–mesenchymal and mesenchymal) according to their distinct EMT signatures.The epithelial subgroup samples were labeled with ‘epi-high’ (or mesenchymal-low), the hybrid epithelial–mesenchymal subgroup samples were labeled with ‘epi-medium’ (or mesenchymal-medium) and the mesenchymal subgroup samples were labeled with ‘epi-low’ (or mesenchymal-high). The xCell deconvolution analysis can evaluate the epithelial score of per sample based on the bulk transcriptome data. The Estimation deconvolution analysis can evaluate the stromal score in a sample based on the bulk transcriptome data. To verify the classification accuracy of the EHBMT method, we performed a deconvolution analysis based on xCell and Estimate algorithms on the transcriptome of each cancer cohort to evaluate the degree of epithelial and stromal scores of all samples in each cohort. For example, we defined samples that were classified into the epithelial group by epi-high labeled by the xCell algorithm and mesenchymal-low labeled by the Estimate algorithm.Clinical GC samplesTissue samples (GC tissues and adjacent mucosa) were obtained from postoperative resection samples of patients with GC (n = 35) admitted to Taihe Hospital affiliated with Hubei Medical University. The study protocol was approved by the Human Research Ethics Committee of Hubei University of Medicine (no. 2024-TH-021). The procedures were in accordance with the Helsinki Declaration of 1975. Written informed consent was obtained from all patients. Tissue samples were immediately frozen in liquid nitrogen after resection and stored at −80 °C until use. All samples were pathologically confirmed.Single-cell analysisThe single-cell RNA-seq (GSE183904) dataset of GC samples containing 31 primary gastric tumor samples representing clinical stages I–IV and histological subtypes was available in the GEO repository27. We downloaded the original single-cell RNA-seq data and conducted re-analysis as previously described28. Detailed information of the single-cell analysis is given below. The R package Seurat (version 4.1.0) was selected for single-cell analysis. Briefly, once the quality control steps were completed, data normalization and scaling were conducted using the Seurat R package. The 10× data matrices were imported into Seurat V5.1 R package to perform data filtration, sample integration, gene normalization, dimension reduction and data visualization. Cells with feature counts (8,000) and a high percent of mitochondrial genes (>20%) were removed. Dimension reduction was done by the Seurat ‘RunPCA’ function. Then, Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) was used to visualize single-cell clusters by graph-based clustering the cells, employing the top 20 principle components with the largest variance (at a resolution of 0.2 for all the merged samples). The clustering of different cell types was performed by the corresponding classical biomarkers28. We characterized the identities of cell types of these groups based on the expression of known markers: CD3D (T cell), TNFRSF17 (plasma B), CDH1 (epithelial), CD163 (macrophage), PECAM1 (endothelial), FN1 (fibroblast), MS4A1 (B cell), KIT (mast). We collectively refer to endothelial cells and stromal cells as mesenchymal or stromal cells. The epithelial-to-mesenchymal ratio (E/M ratio) was used to reflect the progression of EMT in per sample. The E/M ratio was calculated as the number of epithelial cells divided by the sum of the numbers of epithelial and mesenchymal cells.mIHC assayThe mIHC assay was conducted using the mIF kit (Servicebio Technology) according to the manufacturer’s protocol. The four-labeling immunofluorescence utilizes tyrosine signal amplification (TSA) technology to simultaneously label four proteins on the same slice. Most of the steps for mIHC are the same as those for IHC. The minor difference is that during the mIHC experiment, TSA treatment and microwave elution are required after incubating the first and second antibodies. The TSA treatment causes fluorescently labeled tyrosine to covalently bind to the target protein tyrosine residues in the presence of HRP and hydrogen peroxide. Since the noncovalently bound first and second antibodies are broken down and washed away by microwave treatment, the covalently bound fluorescent tyrosine remains attached to the target protein. In this way, when detecting the second target, it is equivalent to a new round of labeling and there is no need to consider whether the second round of antibody will have a crossreaction with the first round of antibody. By repeating the immunolabeling process multiple times, different fluorescent tyrosine can be used to achieve dual or multiple fluorescent staining. The primary antibodies and corresponding TSA types used in this study were as follows: E-cadherin (GB12083, TSA-440, Servicebio), VIM (GB121308, TSA-546, Servicebio), Ki-67 (GB151499, TSA-488, Servicebio) and IL-1β (GB12115, TSA-647, Servicebio). Scores for staining intensity: 0, no staining; 1, weak staining; 2, medium staining; and 3, strong staining. Scores for the percentage of positive cells: 0, ~0–5% positive cells; 1, ~6–25% positive cells; 2, ~26–50% positive cells, 3, ~51–75% positive cells; and 4, ~76–100% positive cells. The protein expression level (IHC score) is equal to the product of the staining intensity score and the positive cell ratio score.Cell cultureThe normal gastric epithelial cell line GES-1 was purchased from the Shanghai Cell Bank. The cells were cultured in DMEM medium containing 10% FBS at 37 °C in 5% CO2. For IL-1β protein treatment, 2 ng/ml exogenous recombinant human IL-1β protein (HZ-1164, Proteintech) or negative control (PBS) was added in the medium for co-incubation for 72 h. Then, cells were collected for RNA-seq, immunofluorescence and RT–qPCR assays.Immunofluorescence assayThe GES-1 cells were seeded in 10-mm confocal dishes with a density of 1 × 105/well. The rabbit Vimentin antibody (22031-1-AP, Proteintech) and the rabbit E-cad antibody (20874-1-AP, Proteintech) were used as the primary antibodies and the Abflo 594-conjugated goat anti-rabbit IgG (AS039, Abcolonal) was used as the secondary antibody. The results were observed using a laser scanning confocal microscope (Olympus FV3000RS). The experiment was independently repeated three times.RNA-seq and RT–qPCRTotal RNA was extracted using Trizol reagent (Invitrogen) according to the manufacturer’s instructions. For RNA-seq, a total amount of 1.5 µg RNA per sample was used as input material for the RNA sample preparations. The whole process of library construction and sequencing was performed at Shanghai Lifegenes Technology Co. Ltd. For qPCR analysis, GAPDH was used as an internal control. Each gene was run in triplicate. Relative fold changes in gene expression were calculated using the comparative ΔΔCt method. The following primers were used: Vimentin-qF: GGGAGAAATTGCAGGAGGAG, Vimentin-qR: AGGTCAAGACGTGCCAGAGAC; E-cadherin-qF: GAAAGCGGCTGATACTGACC, E-cadherin-qR: CGTACATGTCAGCCGCTTC; TWIST2-qF: CAAGGCATGGCATGGCTTAC; TWIST2-qR: TGGGTGGACGTGTAGAGTCC; GAPDH-qF: TCACCAGGGCTGCTTTTA, GAPDH-qR: AAGGTCATCCCTGAGCTGAA; EpCAM-qF: CGCAGCTCAGGAAGAATGTG, EpCAM-qR: TGAAGTACACTGCCATTGACG; TGFB1-qF: AACCCACAACGAAATCTATG-3, TGFB1-qR CTTTTAACTTGAGCCTCAGC-3; NFKB2-qF: CCATGACAGCAAATCTCC, NFKB2-qR: TAAACTTCATCTCCACCCC; RELB-qF: CAATCCCAACCAGGATGTCT, RELB-qR: AGCCATGTCCCTTTTCCTCT.Statistical analysisFor gene expression analysis of different subtypes of GC, the P values were estimated using Mann–Whitney nonparametric test. Survival curves were calculated using the Kaplan–Meier method and differences between the curves were analyzed using the log-rank test. The gene expression correlation analysis was done using the Pearson test. The chi square test was used to analyze the significant differences in the proportion of different EMT subtypes in normal and cancer tissues, as well as the correlation between EMT subtypes and clinical characteristics of patients. For the other experiments, unpaired t-tests were used. All tests with P values less than 0.05 were considered statistically significant.ResultsThe overall framework and technical roadmap of this studyAs previously reported, the EMT process exhibits high heterogeneity and plasticity, which should be one of the reasons for tumor heterogeneity18. On the basis of this, we herein proposed a workflow for EHBMT by molecular typing GC tissues and/or normal gastric tissues according to their EMT heterogeneity using bulk transcriptome data (Fig. 1). Given that there are significant regional differences in the incidence and mortality rates of GC, this study included the bulk transcriptome data of the TCGA-STAD cohort (n = 373), the ACRG cohort (GSE62254, n = 300) and the normal gastric tissue cohort in the GTEx database (GTEx_stomach, n = 175). In addition, our study also included the bulk RNA-seq data of the GC cohort (GSE122401 and GSE179252 cohorts) containing paired cancerous and adjacent tissues to validate the feasibility of the EHBMT method. The EMT-related biomarkers, including classic epithelial biomarkers and mesenchymal biomarkers, were selected based on our previous literature review18. The EHBMT method was conducted based on the expression signature of these EMT-related biomarker genes in both normal and tumor tissues (Fig. 1). In general, the EHBMT method can divide tissue cohort into three subtypes: a epithelial phenotype cluster (EPC), a hybrid epithelial–mesenchymal phenotype cluster (HPC) and a mesenchymal phenotype cluster (MPC), based on their EMT signature features. Finally, we analyzed the molecular characteristics as well as the proportional changes of different EHBMT subclusters to visualize GC evolution. We further validated the feasibility of the EHBMT method using single-cell analysis and mIHC experiments.Fig. 1: Study design and overview.This study includes an unpaired GC cohort (cancer: GSE62254, TCGA_STAD; normal: GTEx_stomach) and a paired GC cohort (GSE122401, N = 80). After transcriptome data processing, the normalized expression of well-known epithelial and mesenchymal biomarkers was applied to evaluated the degree of the EMT progression of each sample. Meanwhile, according to the EMT heterogeneity differences, samples were clustered into different EMT subtypes using complete linkage method. Then, we analyzed the clinical features, prognostic status, molecular characteristics and proportion of different EMT subtypes. Finally, we validated the above molecular typing results using the mIHC method in our own paired GC cohort (N = 35).Full size imageMolecular classification of GC using EHBMTBased on the expression pattern of classic EMT-related biomarker genes, EHBMT was applied in patients with GC from two independent GC cohorts (GSE62254 and TCGA_STAD). According to Fig. 2a, b, EHBMT can divide patients with GC from the GSE62254 (ACRG) cohort or the TCGA_STAD cohort into three subtypes: EPC, HPC and MPC. The GSE62254 cohort contained 110 patients with the EPC subtype (36.7%), 117 patients with the HPC subtype (39%) and 73 patients with the MPC subtype (24.3%). The TCGA_STAD cohort includes 123 patients with the EPC subtype (33.0%), 177 patients with the HPC subtype (47.4%) and 73 patients with the MPC subtype (19.6%). The detailed clustering information per-GC sample is provided in Supplementary Tables 1 and 2. Next, we conducted survival analysis among patients with GC with different EHBMT subclusters. The prognostic analysis of the two independent GC cohorts both showed that patients with GC in EPC possessed the longest overall and disease-free survival time, while the MPC patient subtype had the shortest overall and disease-free survival time (Fig. 2c–f). In the GSE62254 cohort, the 5-year overall survival rates for EPC, HPC and MPC patient subtypes were 59.1%, 54.5% and 38.4%, respectively, and the 5-year disease-free survival rates were 60.6%, 56.1% and 29.0%, respectively. In the TCGA_STAD cohort, the 5-year overall survival rates for EPC, HPC and MPC patient subtypes were 47.1%, 37.3% and 18.0%, respectively, and the 5-year disease-free survival rates were 61.4%, 33.3%, and 0.0%, respectively. These results suggest that the progression of the EMT program is closely related to poor prognosis in GC.Fig. 2: Molecular typing analysis based on EMT heterogeneity in the two independent GC cohorts.a, b, The molecular typing studies were performed based on the expression signature of classic EMT biomarkers in the two independent GC cohorts (GSE62254 (a) and TCGA_STAD (b)). According to the molecular typing analysis, patients with GC from the GSE62254 cohort could be divided into three subtypes as follows: EPC (n = 110), HPC (n = 117) and MPC (n = 73). Patients with GC from the TCGA_STAD cohort could be divided into three subtypes as follow: EPC (n = 123), HPC (n = 177) and MPC (n = 73). c, d, Overall (c) and disease-free (d) survival analyses were performed in patients with GC with different EMT subtypes in the GSE62254 cohort. EMT progression was clinically associated with poor prognosis in this cohort. e, f, Overall (e) and disease-free (f) survival analyses were performed in patients with GC with different EMT subtypes in the TCGA_STAD cohort. EMT progression was clinically associated with poor prognosis in this cohort.Full size imageMolecular subclusters are associated with clinical characteristicsNext, we analyzed the clinical correlation between different EHBMT subclusters of patients with GC and their pathological characteristics in the GSE62254 and TCGA_STAD cohorts (Tables 1 and 2). Clinical analysis showed that EHBMT subclusters were significantly correlated with histological type, T stage and pathological stage of patients with GC from the two independent GC cohorts. Specifically, the EPC subtype tends to be intestinal GC with shallow infiltration and good prognosis, while the MPC subtype tends to be diffuse GC with deep infiltration and poor prognosis (Tables 1 and 2). In addition, in the GSE62254 cohort, patients in the MPC subtype were more prone to lymph node metastasis (P = 0.033) and lymphovascular invasion (P = 0.043). In the TCGA_STAD cohort, patients in the MPC subtype typically had poor differentiation (P = 0.000) and it was more likely to occur in Asians (P = 0.019). The clinical and prognostic analysis results suggest that EHBMT method is reasonable and acceptable as it can establish significant differences in the diagnosis and prognosis of GC.Table 1 The clinical correlation analysis between patient characteristics and different molecular subtypes in the GSE62254 cohort.Full size tableTable 2 The clinical correlation analysis between patient characteristics and different molecular subtypes in the TCGA_STAD cohort.Full size tableThe proportion of patients with the HPC subtype has surged in GCSince EMT program is highly conserved in both normal and tumor cells, we also performed EHBMT analysis in the normal gastric cancer cohort from the GTEx database (n = 175, Fig. 3a). As expected, EHBMT also divided the normal gastric tissues into three subtypes, including EPC (n = 105, 60%), HPC (n = 17, 9.7%), and MPC (n = 53, 30.3%). The detail clustering information of per normal gastric sample was provided in Supplementary Table 3. Comparing the proportion differences of different EHBMT subtypes in GC and normal gastric, we found that the proportion of patients with the HPC subtype increased sharply in GC (Fig. 3b). On the contrary, the proportion of patients with the EPC subtype was significantly decreased in GC (Fig. 3b).Fig. 3: The proportion of HPC subtype increased in GC.a Molecular typing studies based on the expression signature of classic EMT biomarkers in normal gastric tissues (GTEx cohort). Samples from the GTEx_stomach cohort could be divided into three subtypes: EPC (n = 105), HPC (n = 53) and MPC (n = 17). b The proportion analysis of each EMT subtype in both the normal gastric cohort (GTEx_stomach) and cancerous cohorts (GSE62254 and TCGA_STAD), using a chi-square test to determine significant differences. c Molecular typing studies based on the expression signature of classic EMT biomarkers in the GSE122401 cohort containing 80 paired cancerous and adjacent tissues. Samples from the GSE122401 cohort could be divided into three subtypes: EPC (n = 92), HPC (n = 40) and MPC (n = 28). d The EPC subtype contains 18 cancerous samples and 74 adjacent normal samples; the HPC subtype contains 39 cancerous samples and 1 adjacent normal sample; the MPC subtype contains 23 cancerous samples and 5 adjacent normal samples. e The proportion analysis of each EMT subtype in both normal and tumor tissues in the GSE122401 cohort. f The changes of EMT subtypes corresponding to paired adjacent and cancerous tissues from the same donor. **P