Integrated in vivo combinatorial functional genomics and spatial transcriptomics of tumours to decode genotype-to-phenotype relationships

Wait 5 sec.

MainCancer, like many other complex diseases, is caused by a combination of multiple genetic alterations1,2. Transitioning from portraying these genetic changes to understanding their phenotypic consequences by comparing human samples is, however, constrained by environmental influences, genetic diversity between patients, pervasive epistasis and the complexity of multicellular tissue structure3,4,5. Consequently, it remains poorly understood how combinations of alterations reprogramme cells and their interactions with the tissue environment.Genetic screens conducted in model systems have proven valuable for decoding genotype–phenotype relationships in controlled settings6,7 However, the presently available approaches for cancer-relevant in vivo functional genomics mostly investigate the effect of singular or pairwise alterations on proliferation and tumorigenesis without considering spatial niches in which cancer cells competitively develop and grow8,9,10,11,12. Although emerging studies are beginning to explore the impact of individual tumour alterations on their immunological microenvironments13,14,15, gaps remain in our understanding of how genetic changes jointly rewire tumour cells and their surrounding cellular ecosystems through epistasis. Overcoming this challenge calls for experimental approaches and modelling strategies that leverage tissue-level analyses and can effectively scale to handle higher-order combinations3. For instance, testing all combinations of four alterations in standard rodent models with four mice per group requires 64 animals. This need escalates rapidly: six perturbations would demand 256 and eight perturbations would require >1,000 animals.Here, to meet this challenge, we introduce a scalable experimental framework designed to facilitate the functional exploration of complex genotype–phenotype associations at the tissue level, which we termed charting higher-order combinations leveraging analysis of tissue to investigate genotype-to-phenotype relationships (CHOCOLAT-G2P; hereafter referred to as C-G2P). C-G2P is based on a mouse model of autochthonous tumour development, where combinations of barcoded perturbation plasmids randomly integrate into cells within their native environment (Random Unique Barcode Integration Combinatorics, RUBIX). RUBIX thus generates mosaics of genetically heterogeneous tumour clones in a single tissue. To streamline C-G2P spatially resolved in vivo functional genomics, we further developed Perturbation Barcode Capture Spatial Transcriptomics (PERTURB-CAST), a method that seamlessly integrates perturbation mapping with the standardized and commercially available 10X Visium spatial transcriptomics platform. We applied C-G2P to investigate phenotypic effects of eight combinatorial perturbations that induce liver tumours sampled from 256 possible genotypes.ResultsA framework to spatially map engineered tumour heterogeneityHuman tumours frequently present combinations of genetic alterations. With the aim to link complex genetic alterations prevalent in cancer to critical disease phenotypes within tumour ecosystems, we developed C-G2P. C-G2P allows induction and mapping of combinatorial perturbations in murine tissue and simultaneous characterization of the resulting neoplastic phenotypes on the same sample using a single spatial transcriptomics readout platform. Therefore, C-G2P merges and advances available technologies, including multiplexed perturbation in vivo functional genomics, molecular barcoding and spatial omics5,7,9,14,16 (Fig. 1).Fig. 1: A framework to spatially map engineered tumour heterogeneity.a, The C-G2P framework. Mice are HDTV injected with a pool of barcoded perturbation plasmids leading to sleeping beauty (SB)-transposon-mediated stable integration into the genome of hepatocytes. Higher-order combinatorial perturbations drive mosaic liver tumour development in a conceptual 2n combination space for clonal selection (RUBIX). Direct barcode identification is achieved by linking perturbations to 50-nt barcode sequences that are captured and identified by RTL probes as embedded in the 10X Visium for FFPE platform (PERTURB-CAST). Endogenous transcripts are captured alongside barcodes, hence enabling simultaneous mapping of genotypes (as defined by the presence of perturbations) and phenotypes (as defined by transcriptional signatures) on the same tissue section. b, PERTURB-CAST barcode selection. Transcripts not expressed in murine liver are identified using public databases. Their respective 50-nt RTL-probe capture sequences are used as barcodes detected by redeployed commercially available RTL probes provided with the 10X Visium for FFPE mouse kit (Methods). Barcodes derived from chemosensory receptor transcripts are embedded in perturbation plasmids as triplet arrays.Full size imagePrevious in vivo approaches to spatially map perturbations relevant to cancer employed ex vivo-manipulated cells that were subsequently injected into animals14,15,17. For C-G2P, we aimed to leverage an in vivo setting that more closely resembles tumour heterogeneity and tumorigenesis by direct genetic modification of cells embedded in their native tissue environment7. We therefore modified an autochthonous murine mosaic liver cancer model18,19,20 to allow for the creation of coexisting genetically diverse tumours. This approach (RUBIX) relies on hydrodynamic-tail-vein (HDTV) injection of pooled molecular-barcoded plasmids and sleeping beauty transposon-based methods to stably integrate traceable higher-order combinations of genetic alterations in hepatocytes within their tissue context. Consequently, RUBIX offers the possibility to generate mosaics of genetically heterogeneous tumour clones in a single tissue (Fig. 1a(top) and Methods).Presently available approaches to spatially map perturbations within tissue engage custom protocols and orthogonal readouts to also obtain transcriptomics-based phenotypic profiles, such as sequential antibody-based barcode detection and 10X Visium spatial transcriptomics on an additional tissue sample14. We developed PERTURB-CAST to address the constraints of existing methods and streamline the identification of perturbations and comprehensive tissue-level phenotypic information. PERTURB-CAST leverages spatial transcriptomics based on targeted transcript capture via RNA-templated ligation (RTL) probes21 that are commercially available with 10X Visium for formalin-fixed paraffin-embedded (FFPE) samples. To detect the introduced perturbations, PERTURB-CAST engages perturbation plasmids extended with 50-nucleotide (nt) barcodes amenable to RTL-probe capture (Fig. 1a(bottom)). Importantly, to ensure immediate compatibility with default commercial kits and circumvent modifications to the standard protocol, we redeployed 10X Visium RTL probes capturing chemosensory receptor transcripts that are not expressed in mouse liver for barcode identification. Specifically, we exploited 50-nt RTL-probe capture sequences of olfactory-, taste- and vomeronasal-receptor transcripts as molecular barcodes (Fig. 1b, Supplementary Fig. 1 and Methods). To achieve robustness, triplet barcode arrays were included in each perturbation construct to enable detection by three individual RTL probes that are pre-existing components of 10X Visium spatial transcriptomics kits (Fig. 1b, Extended Data Fig. 1, Supplementary Fig. 1 and Methods).RUBIX generates higher-order combinatorial perturbationsRelated to the genetic complexity in human liver cancer, we found that approximately 30% of tumours simultaneously present seven established cancer-driving alterations (including gain- and loss-of-function mutations as well as somatic copy number alterations such as amplifications and heterozygous losses; Fig. 2a and Supplementary Fig. 2) that can reveal varying combinatorial patterns in individual tumour samples (Fig. 2b)1. For C-G2P proof-of-concept, we therefore modelled complex cancer genetics by not solely focusing on loss-of-function mutations and multiplex CRISPR knockouts that may induce chromosomal rearrangements and cellular toxicity due to multiple double-stranded breaks8. We instead concentrated on combinations of alterations associated with liver cancer, including overexpression of oncogenic drivers (Myc, mutant Ctnnb1 (mtCtnnb1), Vegfa and NICD) and silencing of tumour suppressors (Trp53, Pten and Kmt2c) with short hairpin RNA (shRNA) alongside a frequently used Renilla luciferase (shRen)-targeting control construct13,22. We used RUBIX with a mix of eight barcoded perturbation plasmids for HTDV injection to generate a spectrum of combinatorial alterations relevant to liver cancer (Fig. 2b,c) and waited until tumours were palpable. Defined by combinations of these eight perturbations, we consequently anticipated testing 28 = 256 possible cancer-driving genotypes in a single experiment (Fig. 1a). In our pilot experiments, we distributed a total of 38 redeployed barcodes amenable to RTL-probe capture across eight perturbation plasmids (including variations in position and promotors) and further included complementary barcodes (for example, peptides)14,23 for orthogonal readouts (Extended Data Fig. 1 and Methods).Fig. 2: Modelling tumour heterogeneity.a, Frequency of liver tumour samples with at least k potential driver mutations per sample in The Cancer Genome Atlas Program (TCGA) HCC dataset. Potential drivers were defined as either amplification or fusion of known COSMIC oncogenes, or homozygous deletion, nonsense mutation, splice site mutation or frameshift deletion/insertion in tumour-suppressor genes. b, Frequent alterations observed in human liver cancer (The Cancer Genome Atlas Program HCC dataset) are ‘geno-copied’ in a C-G2P mouse model (oncoprint based on https://www.cbioportal.org/study/summary?id=lihc_tcga). ORF, open reading frame; RNAi, RNA interference. c, RUBIX mouse model generated in this study. Schematic overview of sleeping beauty transposon perturbation plasmids to ectopically overexpress genes of interest (oncogenic-driver perturbations) or shRNA to enable gene knockdown (tumour-suppressor perturbations). Functional elements are highlighted. BC, barcode in which three redeployed RTL-probe capture sequences (as indicated) are embedded; EF1, polymerase II promoter; IR, inverted/direct repeats of sleeping beauty transposon; pA: polyadenylation signal; sh, shRNA embedded in miRE context. Note that we used Visium mouse transcriptome probe set v1 to derive barcodes. Each 50-nt barcode is separated and flanked by spacer sequences of approximately 20 nt to avoid potential steric hindrance during hybridization. Further information in Methods.Full size imagePERTURB-CAST hijacks probe-based transcript capture for barcode mappingC-G2P liver samples were collected ten weeks after HTDV injection and processed (FFPE; Extended Data Fig. 2a). For spatial transcriptomics, six topographically separated regions of interest were selected based on histopathological assessment of haematoxylin and eosin (H&E)-stained sections (Fig. 3a and Extended Data Fig. 2a–e). Both 10X Visium and 10X CytAssist were conducted for a total of 12 samples covering these six regions, including serial sections as technical replicates (Extended Data Fig. 2f–h). A total of 324 tumour nodules were identified across the six segregated sections (Fig. 3a). Notably, spatial transcriptomics helped distinguish overlapping nodules that seemed to be single lesions from the histopathological perspective (Supplementary Fig. 3).Fig. 3: PERTURB-CAST spatially resolves multiplexed genetic perturbations in hundreds of coexisting cancer clones.a, RUBIX establishes hundreds of coexisting tumours in the context of native tissue. Respective H&E-stained tissue samples for six topographically separated regions (approximately 6 × 6 mm) that were used for 10X Visium for FFPE-spatial transcriptomics analysis. A total of 324 nodules (colour-coded and numbered) were annotated. Colours were chosen arbitrarily. b, PERTUB-CAST allows perturbation-specific barcode identification. Average log(1p)-transformed expression of all 38 barcode-associated transcripts used (left). Combined data for the reference control liver datasets from ref. 21 and the six main spatial transcriptomics samples (C-G2P). Spatially resolved expression of triplet barcodes (as indicated in Fig. 2c) for each of the eight perturbations (top right). Aggregated log(1p)-transformed and quantile-rescaled expression per 10X Visium spot. A representative sample is shown. Average log(1p)-transformed expression of individual barcodes in each triplet array for each perturbation averaged across the six spatial transcriptomics samples (bottom right). c, Conversion of PERTURB-CAST barcode signals to perturbation maps. Spatially resolved visualization of the inferred probabilities indicating the presence or absence of each of the eight perturbations associated with annotated tumour nodules (Methods). A representative sample is displayed. b,c, Both the quantitative barcode expression (b) and probabilities (c) of all samples can be explored through the interactive web browser https://chocolat-g2p.dkfz.de/. d, Validation of inferred perturbation integration. A GLM model predicts the phenotype expression signals based on the estimated probabilities of perturbation presence using Bayesian modelling (Methods). Phenotypes are defined as direct target transcripts associated with perturbations such as shPten–Pten and NICD–Notch1. Expression data were log(1p)-transformed. GS, a well-established marker for active WNT signalling in murine livers, was used to infer mtCtnnb1-GS-positive phenotype via IHC on a corresponding serial section. Baseline depicts background phenotype marker expression. Data are presented as feature coefficients shown as mean and error bars depict 3σ confidence intervals (CIs). Data are derived from 324 nodules across six topographically separated regions used for 10X Visium from a single RUBIX experiment with two animals. Mapping GS IHC data are derived from three corresponding sections from a single RUBIX experiment with two animals. A representative sample section is displayed.Full size imageAssessing the feasibility of PERTURB-CAST and our barcoding strategy, we observed that most barcode signals were readily detected in the C-G2P liver samples. Strikingly, barcode signals were spatially confined and closely tracking the areas of microscopic tumour nodules, as revealed by H&E staining (Fig. 3b). In contrast, we observed that none of the 38 redeployed barcode sequences were detected in publicly available murine liver 10X Visium datasets24 (Fig. 3b(left)). Reassuringly, with few exceptions (for example, Olfr1033 and Olfr1358), chemosensory receptor transcripts that provided the repertoire for barcode redeployment (n = 1,216) were generally not detected by 10X Visium in murine livers (Extended Data Fig. 3a,b). Overall, 5/38 redeployed barcodes had insufficient signal across all samples investigated (log(1p)-transformed average expression,  E) that observed occurrences (O) deviate from the expected baseline distribution (E) (Methods; bottom). Deviations of >0.5 indicate increased tumorigenic potential (orange), whereas values of  1 suggests co-occurrence, whereas OR  E) = 0.87; solid line and arrow in Fig. 4a and b(top), respectively), which suggests a strong association of this specific compound genotype with tumorigenesis. Interestingly, although septets seemed to be generally prevalent, the specific septet devoid of VEGFA (dashed line and arrow in Fig. 4a and b(top), respectively) demonstrated enrichment (n = 13, p(O > E) = 0.90) comparable to the complete octet (n = 12, p(O > E) = 0.81), whereas septets lacking mtCtnnb1 or Myc, for example, were less enriched. This in turn suggested a diminished cancer-driving effect of VEGFA in the setting of the combinatorial alterations tested.Cancer-driving co-dependencies and potential context dependenciesTo pinpoint which perturbations contributed to the observed patterns of enriched and depleted genotypes, we conducted co-occurrence odds ratio (OR) analysis. This analysis measures second-order epistatic interactions, which quantifies the deviations from purely additive effects in a commonly used multiplicative model6,26 (Fig. 4e,f and Methods). Our results revealed co-dependency patterns for Myc, mtCtnnb1 and NICD as well as shTrp53 and shPten, aligning with previous observations18,19,20. The combination of Myc and NICD exhibited the most pronounced effect (OR = 1.64; P = 0.067). In contrast, we observed a tendency towards mutual exclusivity between VEGFA and NICD (OR = 0.68; P = 0.036) and between VEGFA and mtCtnnb1 (OR = 0.67; P = 0.064; Fig. 4e,f).Together, these observations indicate a context-dependent oncogenic effect of VEGFA.Phenotypic landscapes of engineered tumour heterogeneityTo elucidate spatial phenotypes and enable subsequent genotype-to-phenotype analyses (Fig. 5), we leveraged tissue-wide transcriptional signatures and defined sets of transcripts that characterized prevalent cell states (Supplementary Figs. 4 and 5 and Methods). To finally map the complexity of tumour ecosystems, we visualized phenotype-associated transcriptional signatures within their spatial context (Fig. 5a and Supplementary Figs. 6–18). Furthermore, to highlight associations for nodule-intrinsic phenotypes as well as those related to the tumour microenvironment (TME), we used transcript-resolved heatmap presentations (Fig. 5b–d). Thereby, C-G2P allowed us to chart the heterogeneous phenotypic landscape of hundreds of coexisting genotypically defined tumours and their surrounding tissue environment (interactive maps at: https://chocolat-g2p.dkfz.de/).Fig. 5: C-G2P maps tumour ecosystems comprising hundreds of coexisting cancer clones.a, The tumour ecosystem. Spatial maps of tumour-intrinsic phenotypes (top) and TME phenotypes (bottom) across six topographically separated regions. Colour shade depicts aggregated log(1p)-transformed expression of phenotype-associated transcripts (colour-code as in b and d). Nodule borders are highlighted (grey). The aggregated values for all samples and underlying quantitative data of individual transcript expression can be explored through the interactive web browser interface (https://chocolat-g2p.dkfz.de/). b, Co-clustering of tumour-intrinsic phenotypes by associated transcripts. Tumour phenotypes are colour-coded. c, Associations between tumour-intrinsic and TME phenotypes. Pearson’s correlation coefficient for each pair of tumour-intrinsic and TME phenotype-associated transcripts across all nodules. d, Co-clustering of TME phenotypes by associated transcripts. TME phenotypes are colour-coded. b,d, Clustering based on Spearman correlations. Phenotypes are subdivided using hierarchical clustering. Scaled (p10) estimated plasmid probabilities per nodule are indicated (b(bottom) and d(left) (Methods). Data are derived from 324 nodules across six topographically separated regions used for 10X Visium from a single RUBIX experiment with two animals.Full size imageStratification of coexisting liver tumour subtypesOur approach readily distinguished prominent subtypes of liver tumours (Fig. 5a(top)). First, we observed nodules with cholangiocyte-like transcriptional signatures (for example, Krt19+Cldn7+) indicative of cholangiocarcinoma (CCA)27,28 (Fig. 5a,b and Supplementary Fig. 6). Microscopy inspection indeed classified these tumour nodules as CCAs exhibiting a glandular growth pattern and stroma deposition as well as CK19-protein expression by IHC (Supplementary Fig. 7). Cholangiocarcinoma is the second-most common type of liver cancer following hepatocellular carcinoma (HCC) and both tumour types can develop from hepatocytes in the HDTV injection-based mouse model27,28. Interestingly, C-G2P pinpointed, among others, expression of solute carrier family 15 member 2 (Slc15a2) for which genomic variants have been linked to sorafenib-therapy response29 as well as pancreatic glycoprotein 2 (Gp2) as being associated with CCA (Fig. 5b and Supplementary Fig. 6). The latter observation harmonizes with earlier research suggesting that anti-GP2 IgA autoantibodies enable early CCA detection in subsets of human patients30, hence indicating that our C-G2P approach captures key elements of CCA biology that have so far not been observed in animal models.Moreover, given the spatial resolution of C-G2P, we could immediately relate the prominent second and third cluster of tumour nodules to metabolic liver zonation. Spatial division of metabolic functions is not only central to liver-tissue organization under physiological conditions31,32 but has been proposed to enable molecular classification of human HCC33,34,35. In alignment with this zonation-based molecular classification, C-G2P enabled us to stratify nodules either as portal-like (for example, Sds+Sdsl+) or central-like (for example, Cyp2e1+Oat+)31,32, the latter being the most abundant tumour class observed (Fig. 5a,b and Supplementary Figs. 8 and 9). Interestingly, recent findings from zonation fate-mapping animal models suggest liver cancer prevention strategies that leverage central-zonation-dependent mechanisms, particularly targeting Gstm3, which we also identified as a central-like tumour marker (Fig. 5b)36.Last, focusing on the tumours that could not readily be assigned to the aforementioned subtypes, we identified a fraction of nodules that revealed enrichment of hepcidin antimicrobial peptide (Hamp), Hamp2 and uridine phosphorylase 2 (Upp2) expression (Hamp2+Upp2+; Fig. 5a and Supplementary Fig. 10). Hamp and its paralogue Hamp2 have both been associated with midlobular zonation32, a feature of liver structure important for regeneration37. Upp2, on the other hand, is involved in pyrimidine salvage, which fuels glycolysis and enables growth of cancer cells under nutrient-limited conditions38,39.We further identified subgroups of nodules sharing cholangiocytic as well as portal-like features (Fig. 5b), an observation in agreement with a proposed hybrid periportal hepatocyte cell type40. Similarly, subsets of nodules from the major classes, with the exception of central-like nodules, shared striking enrichment of numerous histone-associated transcripts (Fig. 5b and Supplementary Fig. 11). Upregulation of genes encoding histone proteins is described as the most prominent gene regulatory programme at the G1–S phase transition in human pluripotent cells41.Tumour–stroma and tumour–immune cell connectionsNext, by focusing on cellular ecosystems of the liver TME (Fig. 5a(bottom) and Fig. 5d), we identified prominent fibroblast-associated transcriptional signatures (for example, Col1a1+Col3a1+)42,43 at the tumour–stroma border. We further observed regionally segregated expression patterns associated with haematopoietic/immune cell clusters (Fig. 5a). These included signatures likely to be associated with erythroblasts (for example, Hbb-bt+Slc4a1+)44,45, platelets (for example, Pf4+Itga2+)46, mast cells (for example, Cpa3+Cma+)47,48, B cells (Jchain+Igkc+)49 and neutrophil subpopulations (for example, Elane+Mpo+ and Ngp+Camp+)45,50,51. Signatures associated with Kupffer cells/macrophages (for example, Marco+Clec4f+ and Csf1r+C1qa+)3,24 were primarily detected within the non-tumour compartment (Supplementary Figs. 12–18).Our approach immediately revealed connections between tumour-intrinsic cell states and the microenvironment, such as a notable link between CCA and fibroblast-like signatures (Fig. 5c). This observation aligns with human data indicating that cancer-associated fibroblasts are the major cellular component of CCA-associated desmoplastic stroma43. Our approach indeed grouped fibroblast-like signatures alongside growth arrest-specific 6 (Gas6) and thrombospondin 1 (Thbs1), both of which were previously identified as marker transcripts for a mechanoresponsive cancer-associated fibroblast subpopulation42 (Fig. 5d and Supplementary Fig. 12). Our results further pointed towards additional associations such as a link between CCA and macrophages (for example, Csf1+C1q+) as well as a connection between enriched erythroblast (Hbb-b+Slc4a1+) occurrence and the histone-associated subgroup of nodules (Fig. 5c).Complex genotype–phenotype relationshipsUsing spatial maps that combine phenotypic and genotypic data from the same tissue sections enables detailed investigation of phenotype–genotype relationships (Figs. 4a and 5a–d). We therefore assigned binary phenotype labels to nodules (Extended Data Fig. 9 and Methods) and calculated the OR values to assess the connection of each perturbation to specific tumour-intrinsic and microenvironmental phenotypic groups (Fig. 6a). Remarkably, in the setting of the combinatorial perturbations tested, our findings indicated that CCA reveal a strong positive association with VEGFA, exceeding any other observed linkage, as well as a strong negative association with mtCtnnb1 (Fig. 6a). Furthermore, consistent with the central role of WNT signalling in liver zonation52, portal-like tumours revealed negative associations, whereas central-like nodules revealed positive associations with mtCtnnb1 (Fig. 6a). Notably, these genotype–phenotype observations align well with the aforementioned zonation-based classification of human HCCs and single-cell RNA-sequencing (scRNA-seq) data from human liver cancer, which revealed that the central-like HCC subtype is associated with Ctnnb1 mutations33,34,35, hence indicating that our C-G2P approach mirrors features of human HCC biology.Fig. 6: C-G2P decodes relationships between complex genotypes and tumour-intrinsic and microenvironmental phenotypes.a, Identification of genotype–phenotype relationships. Comparison of the prevalence of perturbations between phenotypic groups and the remainder of the nodules (total n = 324) for tumour-intrinsic phenotypes (top) and TME (bottom) using ORs. OR > 1 indicates enrichment of perturbations within the phenotypic group; OR