Main

Metastasis, the process by which cancer cells spread from their site of origin to a secondary location, is the leading cause of cancer-related mortality2. Lung cancer accounts for the largest share of metastatic cancer cases3, owing to its high incidence4, frequent presentation as de novo metastatic disease5 and high rate of relapse after curative-intent surgery for localized disease6. Understanding the genetic basis for the mechanisms that enable cancer cells to migrate from the primary site into local tissues, air spaces, lymphatics and/or circulation, and to ‘seed’ foreign ‘soils’7, could inform strategies to prevent and treat this lethal condition1.

The migration of cancer cells is a transient event that is difficult to observe in patients in real time; however, a retrospective view of this process can be gleaned by tracking the evolutionary history of subpopulations of cancer cells, or subclones, in primary tumours and metastases using DNA-sequencing data8,9,10,11,12. With this approach, studies of ovarian9, breast13,14 and prostate cancer15 have revealed complex metastatic migration patterns that involve multiple tumour subclones migrating bidirectionally between the primary tumour and metastases, and between anatomically distinct metastatic sites. The accuracy and completeness of this view of metastasis are contingent on the extent to which the sampled metastases are representative of a patient’s disease burden, and on the availability of the primary tumour for comparison16. 
Despite its importance, such comprehensive and longitudinal sampling is rarely performed because it is clinically infeasible in living patients, but it can be achieved by integrating research autopsy programmes16,17 with prospective clinical studies such as TRACERx (Tracking Non-small Cell Lung Cancer Evolution through Therapy (Rx); ClinicalTrials.gov identifier NCT01888601)18,19, which performs multi-region profiling of early-stage, operable primary NSCLC.

To address this need, the national, multi-centre, pan-cancer research autopsy programme PEACE (Posthumous Evaluation of Advanced Cancer Environment; NCT03004755) was strategically embedded in centres that recruited patients to TRACERx, to enable co-enrolment in both studies and the generation of a clinically annotated tumour tissue resource that spans the complete disease course, from diagnosis to death.

In this study, we reconstructed detailed tumour evolutionary histories and metastatic migration patterns using high-depth whole-exome sequencing (WES) data from longitudinally collected primary tumour, pre-mortem and post-mortem metastasis samples from 24 patients enrolled in both TRACERx and PEACE (Supplementary Fig. 1), to investigate the genetic properties that endow cancer cells with the capacity to metastasize. By integrating migration patterns with serial radiological imaging performed before death, we uncover tumour-intrinsic, temporal and anatomical properties that govern NSCLC metastasis.

The TRACERx–PEACE cohort

The clinical characteristics of the patients in this cohort, including age (median [range]: 70 [47–87] years), smoking status (19 ex-smokers, 3 current smokers and 2 never-smokers), disease-free survival (DFS; median [interquartile range; IQR]: 11 [6–18] months) and overall survival (OS; median [IQR]: 29 [16–46] months), were broadly representative of patients with comparable stages of operable NSCLC (TNM version 8: 4 stage I, 8 stage II and 12 stage III)19. 
The common NSCLC histological subtypes were represented (9 lung adenocarcinomas (LUAD) and 10 lung squamous cell carcinomas (LUSC)), as were 5 tumours of other subtypes: 2 large cell carcinomas, 2 pleomorphic carcinomas and 1 carcinosarcoma (Supplementary Table 1).

In total, 108 regions from 24 resected primary tumours (median regions per primary [range]: 4 [2–8]), 41 regions from 35 metastases sampled pre-mortem (12 lymph node metastases resected during primary surgery from 7 patients, 17 metastases sampled at relapse from 15 patients and 6 at disease progression from 4 patients) and 352 regions from 233 anatomically distinct metastases collected at autopsy were subjected to WES (median depth: 401.2×; IQR: 360.2–441.5×; Fig. 1a) and passed quality control (Supplementary Fig. 1 and Methods). At autopsy, metastatic sampling was guided by radiological imaging performed before death and by macroscopic examination by the attending pathologist. For most metastases collected pre-mortem or at autopsy, a single region was sampled (75%, 200/268); for the remaining 25% (68/268), multiple regions from the same metastasis were sampled. The mean number of metastasis regions collected per patient was 16 (range: 3–39) and the mean number of anatomically distinct metastases sampled per patient was 11 (range: 2–37). These encompassed 19 anatomical locations, including common metastasis sites observed in NSCLC20: lung (125 regions), lymph node (80 regions), liver (33 regions), musculoskeletal soft tissues (23 regions: 13 chest wall, 3 diaphragm, 2 abdominal wall, 5 other), brain (22 regions), adrenal gland (17 regions) and bone (7 regions) (Fig. 1a). For the 23 patients with available radiological imaging, quality-controlled WES data were available for 70% (112/160) of the metastases that were detected with imaging performed before death (Fig. 1b).

Fig. 1: Clinical and sample characteristics of the TRACERx–PEACE cohort.

a, Longitudinal timelines for 24 patients showing clinical events and sampling time points, ordered by overall survival. Tumour histology, stage and smoking status are annotated. The body map summarizes the total number of samples obtained per organ across the cohort. LN, lymph node. b, Number of metastases per patient that were imaged only (white), sampled only (peach) or both imaged and sampled (red) among 23 patients with radiological imaging available. c, Mean genetic diversity (mutations and SCNAs; Methods) per patient within primary tumours (intra-primary, dark blue; n = 23 patients, 100 regions), within multi-region sampled metastases (intra-metastasis, dark green; n = 21 patients, 191 regions), between anatomically distinct metastases (inter-metastasis, green; n = 24 patients, 258 metastases) and between primary and metastatic samples (primary–metastasis, blue; n = 24 patients, 24 primaries, 258 metastases). Lines connect patients. Wilcoxon signed-rank test. The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. Body map illustration in a by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

Intra- and inter-metastasis genetic heterogeneity

To resolve the extent of metastatic heterogeneity and the degree to which metastases resemble the primary tumour from which they originate, we performed a detailed genomic analysis of somatic mutations, somatic copy-number alterations (SCNAs) and whole-genome doubling (WGD) in primary and metastasis regions, and reconstructed the subclonal architecture and phylogenetic history of each patient’s disease (Supplementary Fig. 2 and Methods). 
Subclones were classified into four groups to delineate when they arose relative to metastatic dissemination: truncal (the most recent common ancestor (MRCA) of all sequenced cancer cells); primary-unique (non-truncal subclones present in the primary tumour and undetected in any metastasis); metastasis-unique (non-truncal subclones present in one or more metastases and undetected in the primary tumour); and shared subclonal (non-truncal subclones present in both the primary and one or more metastases).

We detected subclonal diversity within individual metastases (median [range]: 7 [3–37] subclones per metastasis) and between anatomically distinct metastases (79% of metastases contained a subclone that was not detected in any other metastasis). The number of subclones detected per metastasis increased with the number of metastasis regions sampled (Pearson’s R: 0.6, P = 7.64 × 10−27; Extended Data Fig. 1a), implying that, as with primary NSCLC tumours18,19,21, single-region sampling can underestimate the subclonal diversity in a metastasis. The number of metastasis-unique subclones identified increased with the number of anatomically distinct metastases sampled per patient (Pearson’s R: 0.52, P = 0.01; Extended Data Fig. 1a), highlighting that samples from anatomically distinct metastases are needed to detect the subclones variably distributed across metastatic sites. The number of metastasis-unique subclones identified per patient in this cohort (median [range]: 28 [4–58]) was 11-fold greater than in our previous analysis of metastases sampled at the time of primary surgery or relapse in 126 patients enrolled in TRACERx11 (median [range]: 2.5 [0–13]); the mean number of metastasis regions sampled per patient was 1.7 (range: 1–6) in that study, compared with 16 (range: 3–39) in this study (Supplementary Fig. 3).

We used two metrics, based on either somatic mutations or SCNAs, to quantify the genetic diversity between anatomically distinct metastases (inter-metastasis heterogeneity), within individual metastases (intra-metastasis heterogeneity) and between the primary tumour and metastases from the same patient (primary–metastasis heterogeneity; Methods). Metastases from the same patient were more similar to each other than they were to the paired primary (mean inter-metastasis versus mean primary–metastasis heterogeneity per patient: mutation diversity P = 0.004, SCNA diversity P = 6.0 × 10−5, Wilcoxon signed-rank test; Fig. 1c), as observed in prostate cancer15. Primary–metastasis heterogeneity was positively associated with both intra-primary heterogeneity (mean per patient: mutation diversity Pearson’s R: 0.57, P = 0.005; SCNA diversity Pearson’s R: 0.51, P = 0.013; Extended Data Fig. 1b) and intra-metastasis heterogeneity (mean per patient: mutation diversity Pearson’s R: 0.55, P = 0.01; SCNA diversity Pearson’s R: 0.33, P = 0.15, not significant; Extended Data Fig. 1b), suggesting that somatic evolution in the primary tumour, the metastases, or both after metastatic dissemination contributes to this genetic divergence. The degree of primary–metastasis divergence did not differ according to the treatment patients received (Supplementary Fig. 4), although this comparison may be underpowered given the cohort size. Individual multi-region sampled metastases were less heterogeneous than their paired multi-region sampled primary (mean intra-metastasis versus mean intra-primary heterogeneity per patient: mutation diversity P = 0.033, SCNA diversity P = 0.004, Wilcoxon signed-rank test; Fig. 1c), but when all metastases sampled from a patient were considered, the heterogeneity among them was not significantly different from that within the paired primary tumour (mean inter-metastasis versus mean intra-primary heterogeneity per patient: mutation diversity P = 0.443, SCNA diversity P = 0.656, Wilcoxon signed-rank test; Fig. 1c).

These data are consistent with primary tumours and metastases continuing to evolve after they diverge. As such, resected primary NSCLC tumours are unlikely to be representative of metastases detected during clinical follow-up22, and, as in other cancer types9,15,23, the full extent of metastatic heterogeneity is likely to be underestimated when metastatic sampling is limited.

Mutational processes change over time

Somatic mutations arise from a variety of mutational processes. To assess their contribution to temporal and spatial genomic heterogeneity, we evaluated mutational signatures known to be active in NSCLC across truncal, shared subclonal, primary-unique and metastasis-unique subclones (Methods). Smoking and APOBEC signatures were more prevalent in primary than in metastasis-unique subclones. This was evident when considering either the majority aetiology of each subclone (the mutational process that contributed more than 50% of its mutations; Extended Data Fig. 1c and Methods) or the number of subclones in which each mutational process was detected (average percentage of subclones with smoking signature: primary 21% versus metastasis 7%, P = 0.02; average percentage of subclones with APOBEC signature: primary 54% versus metastasis 34%, P = 0.0045; Wilcoxon signed-rank test; Extended Data Fig. 1d and Methods). In ever-smokers (patients who had smoked 100 or more cigarettes during their lifetime), 83.4% of all mutations attributed to the smoking signature SBS4 occurred in the trunk, consistent with SBS4 being an early process in NSCLC evolution21.
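The ‘majority aetiology’ rule used above can be sketched as a small function. This is a minimal illustration, not the study’s pipeline: it assumes that each subclone’s mutations have already been attributed to individual SBS signatures, and it groups signatures into the aetiologies named in this section (SBS1/SBS5 clock-like, SBS2/SBS13 APOBEC, SBS31/SBS35 platinum, SBS4 smoking). The function name `majority_aetiology` and the `"none"` fallback label are hypothetical.

```python
from collections import defaultdict

# Signature-to-aetiology grouping as named in the text.
AETIOLOGY = {
    "SBS1": "clock-like", "SBS5": "clock-like",
    "SBS2": "APOBEC", "SBS13": "APOBEC",
    "SBS31": "platinum", "SBS35": "platinum",
    "SBS4": "smoking",
}


def majority_aetiology(signature_counts: dict[str, int]) -> str:
    """Return the aetiology contributing more than 50% of a subclone's mutations.

    `signature_counts` maps signature names to the number of mutations attributed
    to them. Signatures not listed in AETIOLOGY are pooled as "other".
    Returns "none" (a hypothetical label) if no aetiology exceeds 50%.
    """
    total = sum(signature_counts.values())
    if total == 0:
        return "none"
    per_aetiology: dict[str, int] = defaultdict(int)
    for signature, count in signature_counts.items():
        per_aetiology[AETIOLOGY.get(signature, "other")] += count
    aetiology, count = max(per_aetiology.items(), key=lambda item: item[1])
    # Strictly more than half of all mutations, as in the definition above.
    return aetiology if count / total > 0.5 else "none"


print(majority_aetiology({"SBS2": 30, "SBS13": 25, "SBS4": 20, "SBS1": 25}))  # APOBEC
print(majority_aetiology({"SBS4": 50, "SBS5": 50}))  # none: exactly 50% is not a majority
```

Pooling signatures by aetiology before applying the threshold matters: in the first example no single signature exceeds 50%, but SBS2 and SBS13 together contribute 55% of mutations.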
The lower prevalence of SBS4 in metastases might also reflect a lack of exposure owing to smoking cessation (80% of patients were ex-smokers), or the fact that not all organs are equally exposed24. APOBEC activity was highest in shared subclonal and primary-unique subclones (Extended Data Fig. 1c,d). APOBEC activity was episodic throughout tumour evolution25, including in metastases: in 39% (9/23) of patients with APOBEC activity, the level of APOBEC mutagenesis fluctuated along phylogenetic branches (Extended Data Fig. 1e).

Among metastasis-unique subclones, the majority aetiologies, in order of prevalence, were clock-like (SBS1 and SBS5), APOBEC (SBS2 and SBS13), platinum (SBS31 and SBS35) and smoking (SBS4) (Extended Data Fig. 1c). Platinum-related signatures were detected in metastasis-unique subclones in 64% (9/14) of patients treated with platinum chemotherapy. None of these mutational processes was site-specific: each occurred in multiple metastatic sites (Extended Data Fig. 1f and Supplementary Fig. 5). Mutational signature profiles of metastasis-unique subclones were more similar to each other than they were to ancestral primary subclones (P = 0.0043, Wilcoxon signed-rank test; Extended Data Fig. 1g), suggesting that the observed temporal shifts in mutational process activity contribute to primary–metastasis divergence.

Putative driver alterations occur in metastases

To investigate genetic alterations that might underpin somatic evolution in metastases, we assessed the frequency of known genetic drivers of tumorigenesis. Annotating somatic mutations (Methods), we identified 196 driver mutations: 174 single-nucleotide variants (SNVs), 4 dinucleotide variants (DNVs) and 18 insertion–deletions (indels), of which 70% affected tumour suppressor genes (TSGs) (Fig. 2a). Overall, 49% (97/196) were truncal and thus shared between all primary and metastasis regions, consistent with studies in other cancer types23,26,27,28,29 (Fig. 2b). 
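The four alteration classes (truncal, shared subclonal, primary-unique, metastasis-unique) can be illustrated with a simplified sketch. It assumes binary detection calls for each sampled region and treats an alteration detected in every region as truncal; this is a stand-in for the phylogeny-based assignment described above, not the study’s method. The function `alteration_class` and the toy driver names are hypothetical.

```python
from collections import Counter


def alteration_class(primary_hits: list[bool], metastasis_hits: list[bool]) -> str:
    """Classify an alteration by the sampled regions in which it was detected.

    Simplifying assumption: detection in every sampled region is treated as
    truncal, rather than placing the alteration on a reconstructed phylogeny.
    """
    if not primary_hits or not metastasis_hits:
        raise ValueError("need at least one primary and one metastasis region")
    in_primary = any(primary_hits)
    in_metastasis = any(metastasis_hits)
    if all(primary_hits) and all(metastasis_hits):
        return "truncal"
    if in_primary and in_metastasis:
        return "shared subclonal"
    if in_primary:
        return "primary-unique"
    if in_metastasis:
        return "metastasis-unique"
    raise ValueError("alteration not detected in any region")


# Toy data: (primary region calls, metastasis region calls) per hypothetical driver.
drivers = {
    "driver_A": ([True, True], [True, True, True]),
    "driver_B": ([True, True], [True, True, True]),
    "driver_C": ([False, False], [True, False, False]),
    "driver_D": ([True, False], [False, False, False]),
    "driver_E": ([True, False], [False, True, False]),
}
counts = Counter(alteration_class(p, m) for p, m in drivers.values())
print(counts)
print(f"truncal fraction: {counts['truncal'] / len(drivers):.0%}")  # truncal fraction: 40%
```

Real region-level calls are noisy (low purity or limited depth can hide a truncal event in one region), which is one reason the study assigns classes using the reconstructed phylogeny rather than raw presence calls.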
KRAS was the most frequently mutated oncogene (21% (5/24) of patients), and its mutations were always truncal. Of the driver mutations, 27% (53/196) were metastasis-unique (detected in metastases, but not in the primary tumour). Most patients (83%, 20/24) had at least one metastasis-unique driver mutation (median [range]: 2 [0–6] per patient; Fig. 2a). The number of metastasis-unique drivers per patient correlated with the total number of metastasis-unique mutations (Pearson’s R: 0.8, P = 3.2 × 10−6; Fig. 2c), and both were associated with the duration of chemotherapy treatment (Extended Data Fig. 2a). Metastasis-unique subclones with platinum signature activity were more likely to contain a metastasis-unique driver than were their counterparts without (P = 0.016, chi-squared test; Fig. 2d), suggesting that, in addition to increasing the number of drivers through treatment-related mutagenesis, treatment may select for or induce these mutations.

Fig. 2: Evolutionary timing of putative driver alterations.

a, Summary of the focal copy-number amplifications (Amp) affecting oncogenes (top), focal LOH affecting TSGs (middle) and putative driver mutations (Mut: SNVs, indels and DNVs; bottom) across patients. Only mutations occurring in at least two patients or showing biallelic inactivation (Methods) in one patient are shown. Alterations are classified as truncal (grey), shared subclonal (light blue), primary-unique (dark blue) or metastasis-unique (green). Mutations that co-occur with LOH are marked with circles. Horizontal bars indicate total occurrences per gene and class. b, Alteration class distribution for events in a. Clinically actionable events are highlighted in boxes; bold font denotes biomarkers for approved lung cancer therapies (OncoKB level 1) and regular font denotes biomarkers with compelling clinical evidence (OncoKB level 3A). n = 24 patients. 
c, Correlation between the number of metastasis-unique mutations and the number of metastasis-unique driver mutations per patient; points coloured by the relapse treatment status of each patient. d, Fraction of metastasis-unique subclones in platinum-treated patients (n = 14) with detectable platinum-associated mutational signatures, stratified by the presence or absence of a driver mutation. Chi-squared test.

Next, we used OncoKB30,31 to determine whether actionable drivers occur uniquely in metastases (Fig. 2b). No level 1 actionable mutations, which predict response to a drug licensed for NSCLC management, were detected in metastasis-unique subclones. Only one metastasis-unique mutation was annotated as potentially actionable: SMARCA4 (p.E1083K), identified in a lung metastasis in patient CRUKP3358, was classified as a level 3A mutation (a biomarker for an unlicensed drug with efficacy in clinical trials32).

We also examined the timing of focal amplifications and loss of heterozygosity (LOH) affecting cancer-related oncogenes and TSGs, respectively (Methods). Among TSG driver mutations, 46% (63/138) had biallelic inactivation, defined as a mutation affecting one allele with LOH affecting the other (Methods and Fig. 2a). In 76% (48/63) of these cases, both events were detected in the primary tumour; in 21% (13/63), they occurred sequentially, with the first hit in the primary and the second in a metastasis; and in 3% (2/63), both events occurred in a metastasis. For example, CRUKP2037 had a truncal driver mutation affecting ARID1A (p.N942S), followed by LOH of the ARID1A locus within a lymph node metastasis, as well as a truncal LOH affecting B2M followed by a subclonal B2M driver mutation, with both B2M events occurring in the primary tumour (Extended Data Fig. 2b). The most frequently mutated TSG was TP53 (75% (18/24) of patients), which is associated with metastatic seeding11,33. 
It was biallelically inactivated in 89% (16/18) of these patients: in most cases both hits were truncal (88%, 14/16), whereas in the others, subclonal LOH followed a truncal driver mutation, including on parallel phylogenetic branches (CRUKP8172).

WGD, which is often associated with TP53 disruption34 and can buffer against the effects of deleterious alterations35, occurred in 92% (22/24) of patients. These events occurred throughout tumour evolution (76% (31/41) occurred in the primary: 11 truncal, 11 shared subclonal and 9 primary-unique; 24% (10/41) occurred in metastases; Extended Data Fig. 2c), and were associated with increased primary–metastasis (Pearson’s R: 0.57, P = 0.0034) and inter-metastasis SCNA heterogeneity (Pearson’s R: 0.67, P = 0.00051; Extended Data Fig. 2d). Primary subclonal and metastasis-unique WGD events occurred on parallel phylogenetic branches in 29% (7/24) of patients, suggesting that WGD confers a fitness advantage in late-stage disease.

Overall, although most driver alterations, including those that are clinically actionable, occurred in primary tumours, metastases accrued additional, often treatment-associated, genetic alterations of potential biological consequence.

Pervasive metastasis-to-metastasis seeding

In addition to ongoing evolution in metastases, cancer cells migrating between anatomical sites will influence the subclonal landscape of metastatic disease. To elucidate the metastatic migration patterns that underpin advanced NSCLC, we applied the MACHINA algorithm8 to the phylogenies inferred for each patient to identify the subclones that seeded metastases and their corresponding migration routes (Extended Data Fig. 3).

In 3 patients, seeding subclones originated exclusively in the primary tumour. 
In the remaining 88% (21/24), seeding subclones were identified in both the primary tumour (mean per patient [range]: 2.8 [1–8]) and one or more metastases (mean per patient [range]: 6.0 [1–16]), herein referred to as primary-to-metastasis seeding subclones and metastasis-to-metastasis seeding subclones, respectively (Fig. 3a). Metastasis-to-primary reseeding was not considered because no patients had radiologically detectable metastatic disease at the time of primary tumour resection. On average, 4 anatomically distinct metastases were seeded by the primary tumour per patient (range: 1–22). In 62.5% (15/24) of patients, these primary-seeded metastases were seeded by more than one distinct primary-to-metastasis seeding subclone, in contrast with previous studies that detected only a single primary seeding subclone in most patients9,11,36. The number of seeding subclones identified per patient correlated with the number of anatomically distinct metastases (Pearson’s R: 0.48, P = 0.018) and the number of metastasis regions sampled (Pearson’s R: 0.52, P = 0.009; Extended Data Fig. 5a), indicating that seeding subclone prevalence can be underestimated when metastasis sampling is limited. Overall, more metastases were seeded by other metastases than by the primary tumour: 60% (156/258) of metastases were seeded by metastasis-to-metastasis seeding subclones, 38% (98/258) were seeded by primary-to-metastasis seeding subclones and 2% (4/258) were seeded by both (10 low-purity metastases were excluded from tree building and migration analyses; Methods and Fig. 3b). For example, in CRUKP2037, only one of the six thoracic lymph node metastases sampled at autopsy was seeded by the primary; the remaining five were seeded by subclones from other established thoracic lymph node metastases (Extended Data Fig. 3).

Fig. 3: Metastatic seeding patterns.

a, Number of patients whose metastases were seeded exclusively by the primary tumour (Primary->Met seeding only, black) or by both the primary tumour and other metastases (Primary->Met and Met->Met seeding, red). b, Number of metastases seeded by the primary tumour (black), by another metastasis (grey) or by both sources (yellow). c, Proportion of metastases seeded by other metastases (grey) or by the primary tumour (black) across increasing migration probability thresholds (Methods). d, Number of primary-to-metastasis (black outline) and metastasis-to-metastasis (grey outline) seeding subclones per patient. Each primary-to-metastasis seeding subclone is assigned a distinct colour (concordant between bars and phylogenetic tree nodes). Metastasis-to-metastasis seeding subclone colours match the colour of their primary-to-metastasis seeding ancestor, with lighter shades assigned to each new metastasis-to-metastasis seeding subclone in that lineage. Pie wedges show the fraction of metastases seeded by each correspondingly coloured seeding subclone. Metastases on body maps are coloured by their seeding subclone, with arrows indicating migration routes. Subclones contributing to both primary-to-metastasis and metastasis-to-metastasis spread are marked with an asterisk. P, primary-to-metastasis seeding subclone; M, metastasis-to-metastasis seeding subclone. Body map illustration in d by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

Consistent with studies in other cancer types9,15,36, most metastases were seeded by a single migrating subclone (72.5%, 187/258), as opposed to multiple subclones that migrated together or in sequence (Extended Data Fig. 4). Primary-to-metastasis migrations involved a single subclone more frequently than metastasis-to-metastasis migrations (P = 7.73 × 10−4, chi-squared test; Supplementary Fig. 6a). 
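The chi-squared comparisons in this paragraph are tests on 2×2 contingency tables (migration type versus single or multiple migrating subclones). As a minimal sketch, assuming a Pearson chi-squared test with Yates’ continuity correction (the default for 2×2 tables in common packages such as SciPy’s `chi2_contingency`; the Methods would confirm the exact variant used), the within-organ versus between-organ comparison reported in the next sentence (41/94 versus 49/183 migrations involving multiple subclones) can be recomputed with the standard library alone. The function name `chi2_2x2_yates` is hypothetical.

```python
import math


def chi2_2x2_yates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test with Yates' continuity correction on [[a, b], [c, d]].

    Returns (statistic, two-sided P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected statistic; clamp at zero when |ad - bc| < n / 2.
    numerator = max(abs(a * d - b * c) - n / 2, 0.0) ** 2 * n
    statistic = numerator / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: within-organ vs between-organ migrations;
# columns: multiple vs single migrating subclones.
stat, p = chi2_2x2_yates(41, 94 - 41, 49, 183 - 49)
print(f"chi2 = {stat:.2f}, P = {p:.3f}")  # chi2 = 7.28, P = 0.007
```

With these counts the corrected test gives P ≈ 0.007, in line with the reported value; without the continuity correction the same table gives P ≈ 0.005, so the choice of variant matters when reporting exact values.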
Migrations that started and ended in the same organ involved multiple subclones more often (43.6%, 41/94) than migrations between organs (26.8%, 49/183; P = 0.007, chi-squared test; Supplementary Fig. 6b), implying that a greater number of subclones were capable of migrating within, as opposed to between, organs.

Three independent analyses provide orthogonal evidence for the inferred metastasis-to-metastasis migrations. First, we developed a probabilistic approach to assign a confidence level to each metastatic migration (Methods and Extended Data Fig. 5b). Metastasis-to-metastasis migrations remained predominant when considering only the migrations inferred with highest confidence (Fig. 3c). Second, we used standard-of-care radiological imaging to determine when each metastasis was first radiologically detected. Metastases seeded by the primary were detected earlier than metastases seeded by other metastases (median 272 versus 745 days from surgery, respectively; P = 2.41 × 10−5, Mann–Whitney U test; Extended Data Fig. 5c). Third, we evaluated migration directionality by quantifying LOH events present in the seeding metastasis but absent from the seeded metastasis, a scenario that is implausible given that LOH is irreversible (Methods). Metastases linked by a metastasis-to-metastasis migration had significantly more conserved LOH events (which are not an input to MACHINA8) than alternative seeding sources (inferred metastasis source versus primary source P = 3.29 × 10−28; inferred metastasis source versus alternative metastasis source P = 0.011; Mann–Whitney U test; Extended Data Fig. 5d). Furthermore, the inferred metastatic migration patterns were highly consistent with those obtained using different combinations of algorithms for phylogenetic37 and metastatic migration inference38 (Supplementary Fig. 7).

These patterns reveal that in NSCLC, multiple primary subclones seed metastases, initiating a cascade of metastasis-to-metastasis seeding that promotes metastatic progression.

Seeding capacity across subclone lineages

To test whether metastatic capacity differs among seeding subclones, we used the number of metastases seeded by each seeding subclone as a surrogate measure of its seeding capacity (Methods). In 93% (14/15) of the patients with multiple primary-to-metastasis seeding subclones, no significant differences were observed in the number of metastases seeded by each subclone (Extended Data Fig. 5e). In the one patient in which a difference in seeding capacity was detected, CRUKP8172, one of the 2 primary seeding subclones seeded 21 of the 30 sampled metastases (Monte Carlo likelihood ratio test P = 0.00001; Fig. 3d).

Although most primary-to-metastasis seeding subclones had similar metastatic capacity, only 45% (30/67) of them produced descendants that seeded additional metastases, raising the possibility that the capacity to seed metastases from the primary tumour differs from that required for metastasis-to-metastasis seeding. We therefore assessed whether subclones from the same primary tumour, after establishing metastases, had an equal likelihood of further spread. In 67% (10/15) of patients, significant differences were observed in the number of metastasis-to-metastasis migrations that descended from each primary-to-metastasis seeding subclone (Fig. 3d and Extended Data Fig. 5e). For example, in CRUKP9198, three primary subclones each seeded one or two metastases (Monte Carlo likelihood ratio test P = 1.0), but their subsequent spread differed: one did not give rise to any metastasis-to-metastasis migrations; another had a descendant that seeded one metastasis; and the third had descendants that seeded twelve metastases via metastasis-to-metastasis migrations (Extended Data Fig. 5f). 
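The per-patient comparison of seeding capacity can be illustrated with a Monte Carlo likelihood ratio test. The sketch below assumes a simple null model in which each seeded metastasis is assigned to one of the patient’s seeding subclones uniformly at random, and uses the likelihood-ratio (G) statistic. The function names, the null model and the simulation settings are assumptions for illustration rather than the study’s implementation (its exact null model is described in the Methods), so P values from this sketch should not be compared directly with those reported.

```python
import math
import random


def g_statistic(counts: list[int]) -> float:
    """Likelihood-ratio (G) statistic against equal expected counts per subclone."""
    n = sum(counts)
    expected = n / len(counts)
    # Cells with zero observations contribute 0 (the limit of x*log(x) as x -> 0).
    return 2.0 * sum(o * math.log(o / expected) for o in counts if o > 0)


def monte_carlo_lr_test(counts: list[int], n_sim: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo P value for H0: every seeding subclone is equally likely to seed.

    Under H0, each of the n seeded metastases is assigned to one of the k
    subclones uniformly at random; P is the fraction of simulated allocations
    whose G statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    k, n = len(counts), sum(counts)
    observed = g_statistic(counts)
    as_extreme = 0
    for _ in range(n_sim):
        simulated = [0] * k
        for _ in range(n):
            simulated[rng.randrange(k)] += 1
        # Small tolerance so exact ties are not lost to floating-point rounding.
        if g_statistic(simulated) >= observed - 1e-9:
            as_extreme += 1
    # Add-one correction keeps the Monte Carlo P value strictly positive.
    return (as_extreme + 1) / (n_sim + 1)


print(monte_carlo_lr_test([2, 1, 1]))  # 1.0
print(monte_carlo_lr_test([20, 0]))    # small: an all-or-nothing split is rare under H0
```

With four metastases split across three subclones, as in `[2, 1, 1]`, every possible allocation is at least as uneven as the observed one, so the test returns P = 1.0; with so few metastases per subclone, such a test has essentially no power to detect differences in seeding capacity.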
Thus, in this small cohort, the capacity to seed further metastases is not uniformly inherited by descendants of primary-to-metastasis seeding subclones.

Duration in situ associates with seeding

To investigate determinants of metastatic capacity, we examined patient, metastasis and subclone characteristics associated with metastasis-to-metastasis seeding. Patients in whom metastasis-to-metastasis seeding was predominant (more than 50% of all migrations) relapsed later (median DFS 14 versus 5 months, P = 0.036, Mann–Whitney U test) and had significantly longer OS (median 32 versus 15 months, P = 0.019, Mann–Whitney U test) than patients in whom it was not (less than 50%; Extended Data Fig. 6a). Their clinical demographics and treatment histories were otherwise similar (Extended Data Fig. 6a). Moreover, OS exhibited a positive linear association with the proportion of migrations that were metastasis-to-metastasis (Pearson’s R: 0.46, P = 0.025; Fig. 4a), raising the possibility that metastasis-to-metastasis seeding is related to how long metastases have been in situ.

Fig. 4: Metastasis-to-metastasis seeding associates with duration in situ and anatomical site.

a, Correlation between the per-patient percentage of migrations that were metastasis-to-metastasis and overall survival (days). b, Proportion of seeding (dark teal) and non-seeding (light teal) metastases stratified by time of first detection on imaging: relapse scan; scan in first half of the post-relapse period; scan in second half; or after the last scan before death (at autopsy). Metastases without known timing of first appearance or seeding status were excluded. n = 23 patients; Fisher’s exact test comparing earliest and latest categories. c, Burden of SNVs and SCNAs accumulated after seeding (that is, in descendants of the seeding subclone) for seeding (dark teal) and non-seeding (light teal) metastases. 
Dots indicate median per patient; lines connect patients. n = 20 patients with both metastasis types; Wilcoxon signed-rank test. d, Maximum radiological volume measured on any clinical scan for seeding versus non-seeding metastases. Dots indicate median per organ site; lines connect organ sites. n = 23 patients, 129 metastases; Wilcoxon signed-rank test. e, Proportion of seeding and non-seeding metastases with branched (grey) versus linear (white) metastasis-unique subclone phylogenies. n = 24 patients; Fisher’s exact test. f, Prevalence of metastasis-to-metastasis seeding across anatomical locations. Intra, intrathoracic; extra, extrathoracic. g, Correlation between the proportion of metastases in each anatomical site with ‘sufficient’ duration in situ (Methods) and the prevalence of seeding metastases from that site. n = 24 patients, 258 metastases. h, Proportion of non-seeding (light teal) or seeding (dark teal) metastases in intrathoracic (purple edge) and extrathoracic organs (orange edge). n = 24 patients, 258 metastases; Fisher’s exact test. i, Origin of seeding for intrathoracic (purple) and extrathoracic (orange) metastases, stratified by intrathoracic versus extrathoracic source. n = 12 patients with both metastasis types; Fisher’s exact test. The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. Body map illustration in h by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

To investigate this, we evaluated the seeding capacity of each metastasis with respect to when it arose during the disease course. Metastases that seeded other metastases were identified significantly earlier on radiological imaging: 36.8% (14/38) of metastases detected on the first relapse scan seeded other metastases, compared with 16.4% (10/61) of those first detected after the last scan, at autopsy (P = 0.03, Fisher’s exact test; Fig. 4b). 
Consistent with having a longer duration in situ, these metastases also accrued more somatic alterations than metastases that did not seed (mutations P = 5.7 × 10−5, SCNAs P = 0.002, Wilcoxon signed-rank test; Fig. 4c and Methods). Likewise, primary-to-metastasis seeding subclones with metastasis-to-metastasis seeding descendants tended to emerge earlier in tumour evolution than those without such descendants (mutation distance from the trunk P = 0.053, SCNA distance from the trunk P = 0.025, one-sided Wilcoxon signed-rank test; Extended Data Fig. 6b). In fact, 68% (141/207) of the metastases that did not seed other metastases had a lower mutation burden than that typically observed at the point when metastasis-to-metastasis seeding subclones emerged in the same patient (Extended Data Fig. 6c–e and Methods). This suggests that many non-seeding metastases might have gone on to seed other metastases had they been in situ for longer.

Two explanations could underlie this finding. First, metastases with a longer duration in situ might reach larger sizes, such that more cancer cells have the potential to seed. Indeed, metastases that seeded other metastases had significantly greater maximum radiological volumes than those that did not (P = 0.025, Wilcoxon signed-rank test; Fig. 4d and Methods). Second, subclones capable of seeding metastases might emerge from the reservoir of subclonal diversity that accrues over time. In keeping with this, metastases that seeded other metastases had more subclones (P = 0.028, Mann–Whitney U test; Extended Data Fig. 6f) and were more likely to have branched phylogenies than were metastases that did not (P = 9.8 × 10−8, Fisher’s exact test; Fig. 4e). 
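The ‘sufficient time in situ’ comparison can be sketched per patient as a simple threshold test that uses mutation burden as a molecular clock. This is a minimal illustration under stated assumptions: ‘the mutational burden typically observed’ is taken here to be the median number of mutations accrued before that patient’s metastasis-to-metastasis seeding subclones emerged, and the function name and toy numbers are hypothetical. The study’s exact definition is given in its Methods.

```python
from statistics import median


def has_sufficient_time_in_situ(metastasis_burden: int, burdens_at_emergence: list[int]) -> bool:
    """Return True if a metastasis has accrued at least the mutation burden
    typically seen when metastasis-to-metastasis seeding subclones emerged
    in the same patient.

    Assumption: "typically" is taken as the median across that patient's
    metastasis-to-metastasis seeding subclones.
    """
    threshold = median(burdens_at_emergence)
    return metastasis_burden >= threshold


# Hypothetical patient: mutation counts at which three metastasis-to-metastasis
# seeding subclones emerged, and the mutation burdens of four non-seeding metastases.
emergence_burdens = [180, 210, 260]
non_seeding = {"met_1": 150, "met_2": 205, "met_3": 240, "met_4": 120}
below = [
    name
    for name, burden in non_seeding.items()
    if not has_sufficient_time_in_situ(burden, emergence_burdens)
]
print(below)  # ['met_1', 'met_2', 'met_4']
print(f"{len(below)}/{len(non_seeding)} non-seeding metastases below threshold")  # 3/4 ...
```

Using the median makes the threshold robust to a single unusually early or late seeding subclone; a minimum or maximum would give a more permissive or more stringent definition, respectively.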
These data suggest that the larger number of cancer cells and/or subclones afforded by a longer duration in situ is associated with a greater likelihood of metastasis-to-metastasis seeding.

Anatomical constraints of seeding

Anatomical location may also influence the likelihood of metastases seeding further metastases, for example, owing to differences in organ vascularity or lymphatic drainage39. Metastasis-to-metastasis seeding varied across anatomical locations: 28.4% of lung and 30.6% of intrathoracic lymph node metastases seeded other metastases, compared with only 8.3% of peritoneal and 10.7% of liver metastases (Fig. 4f). These differences largely reflected the temporal order in which metastases arose. The prevalence of metastasis-to-metastasis seeding from each site correlated strongly with the proportion of metastases at that site with ‘sufficient’ time in situ to seed others, defined by comparing the mutation burden of each metastasis with the number of mutations accrued before the emergence of metastasis-to-metastasis seeding subclones in the same patient (Pearson’s R: 0.80, P = 0.01; Fig. 4g and Methods). In particular, intrathoracic metastases (encompassing mediastinal lymph node and lung metastases; Methods), which were detected significantly earlier on radiological imaging than extrathoracic metastases (P = 3.94 × 10⁻⁴, Mann–Whitney U test; Extended Data Fig. 7a), seeded other metastases more frequently (intrathoracic versus extrathoracic metastases that seeded other metastases: 26.4% (32/121) versus 13.9% (19/137), P = 0.013, Fisher’s exact test; Fig. 4h). Thus, the tendency of NSCLC metastases to emerge in an anatomical sequence (intrathoracic early, extrathoracic late) is reflected in metastatic seeding patterns.

Next, we investigated whether the anatomical location in which seeding originates influences the location of the resulting metastases.
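The question posed above reduces to mapping the source and target site of each metastasis-to-metastasis migration to an anatomical cavity and counting matches. The sketch below does this for a hypothetical list of (source site, target site) pairs; the site vocabulary and lookup table are illustrative assumptions that paraphrase the Methods definitions, and treating each site name as a separate organ is a simplification. It also counts within-cavity migrations whose source and target sites differ, which separates cavity-level confinement from simple within-organ spread.

```python
# Illustrative site-to-cavity lookup paraphrasing the Methods definitions;
# chest wall and diaphragm need imaging review and are left out here.
CAVITY = {
    "lung": "intrathoracic",
    "mediastinal lymph node": "intrathoracic",
    "pleura": "intrathoracic",
    "liver": "extrathoracic",
    "bone": "extrathoracic",
    "brain": "extrathoracic",
    "adrenal": "extrathoracic",
}


def tabulate_migrations(migrations: list[tuple[str, str]]) -> tuple[int, int, int]:
    """Summarize metastasis-to-metastasis migrations given as (source, target).

    Returns (migrations within one cavity, all migrations, within-cavity
    migrations whose source and target sites differ). Treating each site name
    as a separate organ is a simplifying assumption.
    """
    within = within_other_organ = 0
    for source, target in migrations:
        if CAVITY[source] == CAVITY[target]:
            within += 1
            if source != target:
                within_other_organ += 1
    return within, len(migrations), within_other_organ


# Toy migrations for illustration only (not study data).
toy = [("lung", "lung"), ("lung", "mediastinal lymph node"),
       ("liver", "bone"), ("lung", "brain")]
within, total, other_organ = tabulate_migrations(toy)
print(f"{within}/{total} within cavity; {other_organ} of those cross organs")
```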
Overall, 71.3% (92/129) of metastasis-to-metastasis migrations remained within the anatomical cavity in which they originated (P = 2.40 × 10⁻⁵, Fisher’s exact test; Fig. 4i); that is, intrathoracic metastases predominantly seeded intrathoracic metastases, and extrathoracic metastases predominantly seeded extrathoracic metastases. This observation was consistent across patients (Extended Data Fig. 7b), was maintained when restricting the analysis to the highest-confidence migrations (Methods and Extended Data Fig. 7c) and was not explained by a propensity for within-organ spread (59% (54/92) of migrations within the same anatomical cavity were between different organs). Primary-to-metastasis seeding also differed between anatomical cavities. Only one-third (16/48) of primary seeding subclones seeded extrathoracic metastases; however, those that did were more likely to seed multiple metastases than were subclones that seeded intrathoracic metastases (P = 0.007, Fisher’s exact test; Extended Data Fig. 7d). For example, in CRUKP1584, the primary tumour seeded 11 metastases (7 intrathoracic and 4 extrathoracic). The primary subclone capable of exiting the thorax seeded eight metastases, whereas the subclone that remained within the thorax seeded three (Fig. 3d). The same pattern was evident among metastasis-to-metastasis seeding subclones that originated within intrathoracic metastases (Extended Data Fig. 7e), suggesting that the process by which cancer cells migrate between anatomical cavities differs from that by which they migrate within them.

Extrathoracic spread tracks chromosomal instability

More than half of the metastases with sufficient time in situ (as defined in Extended Data Fig. 6c,d and Methods) did not seed further metastases, implying that time is necessary but not sufficient for metastatic seeding. Genetic alterations that affect cancer cell phenotypes might influence their ability to seed metastases40.
Previously, we found that primary tumour chromosomal instability (CIN) is associated with the likelihood of metastatic relapse18,19, the detection of multiple primary seeding subclones11 and extrathoracic spread19,41. In this cohort, in which extensive metastatic sampling increased the detection of seeding subclones, primary-to-metastasis seeding subclones similarly contained more SCNAs (P = 9.6 × 10⁻⁵, linear mixed effects (LME) model; Fig. 5a,b), but not more SNVs (Supplementary Fig. 8a), than non-seeding subclones from the same primary tumour. In addition, the percentage of primary subclones that seeded metastases was strongly associated with the median SCNA burden (Pearson’s R: 0.62, P = 0.0013) and with the rate of SCNA acquisition (per mutation) across all primary subclones (Pearson’s R: 0.72, P = 7.5 × 10⁻⁵; Fig. 5c), suggesting that the degree of CIN in primary subclones correlates with their likelihood of seeding.

Fig. 5: Extrathoracic seeding subclones are enriched for chromosomal instability.

a, Example patient (CRUKP8780), depicting the anatomical sites of subclones (top right tree), their migration patterns (body map) and the related classification of seeding and non-seeding subclones (bottom left tree). b, SCNA burden per primary-to-metastasis seeding subclone (dark blue) and non-seeding primary subclone (light blue). LME model with subclone mutation burden as covariate and patient as random effect. n = 21 patients, 274 subclones. c, Correlation between the percentage of primary subclones that seed metastases and the median subclone SCNA burden (left) or the SCNA/SNV ratio (right). n = 24 patients. d, SCNA burden per metastasis-to-metastasis seeding subclone (dark green) and non-seeding metastasis subclone (light green). LME model as in b. n = 21 patients, 625 subclones.
e, Primary subclone SCNA burden stratified by subclone seeding status: non-seeding (blue), seeded an intrathoracic metastasis (purple) or seeded an extrathoracic metastasis (orange), shown for patients with intrathoracic only (left; n = 12 patients, 129 subclones), extrathoracic only (middle; n = 5 patients, 70 subclones) or both intrathoracic and extrathoracic (right; n = 4 patients, 75 subclones) metastases seeded by the primary. Dots indicate median per patient; lines connect primary tumours. LME model as in b. f, Median SCNA burden per subclone in intrathoracic metastases that seed intrathoracic metastases (purple) and intrathoracic metastases that seed extrathoracic metastases (orange). LME model as in b. n = 15 patients, 32 metastases. The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. Body map illustration in a by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

This association extended to late-stage disease: metastasis-to-metastasis seeding subclones had significantly more SCNAs (P = 3.0 × 10⁻⁷, LME model; Fig. 5d) and a higher rate of SCNA acquisition (Supplementary Fig. 8b) than non-seeding metastasis subclones from the same patient. Some subclones had SCNAs affecting genes implicated in metastasis11,41, such as the focal amplification of CCND1 in the primary subclone that seeded a chest wall metastasis in CRUKP3207, and in a left frontal lobe brain metastasis that seeded a right frontal lobe metastasis in CRUKP8433. However, no significant enrichment in the rate of SCNAs affecting driver genes, or of driver mutations, was detected in seeding compared with non-seeding subclones (Supplementary Fig. 9).
This cohort might be underpowered to detect such differences; alternatively, CIN might support metastasis through other means, such as by generating subclonal diversity or by altering the tumour microenvironment42.

Given the distinct seeding patterns observed for intrathoracic and extrathoracic spread, we examined the characteristics of subclones that seeded metastases in each location. Primary subclones that seeded extrathoracic metastases, but not those that seeded intrathoracic metastases, had significantly more SCNAs than non-seeding primary subclones (Fig. 5e). We confirmed this observation in patients from the published TRACERx cohort11 for whom paired primary and metastasis data and imaging-based extrathoracic relapse status were available (Extended Data Fig. 7f and Supplementary Fig. 10). Metastasis-to-metastasis seeding followed the same pattern: intrathoracic metastases that seeded extrathoracic metastases contained subclones with higher SCNA burdens than intrathoracic metastases that seeded intrathoracic metastases (P = 7.2 × 10⁻⁴, LME model; Fig. 5f). These data link subclone SCNA burden to extrathoracic seeding capacity, suggesting that CIN supports an aspect of the metastatic process that is specific to this route of spread.

Discussion

Previous genomic studies vary considerably in their conclusions about metastatic heterogeneity9,43,44,45,46 and seeding11,15,47. Sampling-related variation could account for some of these differences, given that data generated from limited samples might not accurately represent metastatic disease, which is often widespread16.
Here, DNA-sequencing data from 108 primary tumour regions paired with 393 metastasis regions, encompassing the majority of metastases radiologically detected before death in 24 patients, enabled in-depth characterization of the subclonal landscape of NSCLC metastases and the cellular migrations that founded them.

The predominant seeding pattern involved the dissemination of multiple primary subclones, either before surgery or from postoperative residual disease (62.5% of patients), each giving rise to a distinct metastasis (72.5% of metastases were founded by a single subclone). Metastasis-to-metastasis seeding from the resultant metastases, however, was inferred to account for most sampled metastases (60%), suggesting that clinical interventions aimed at minimizing existing metastatic disease could prevent further metastatic progression. Moreover, the latency associated with metastasis-to-metastasis seeding indicates that there could be a window of opportunity for such interventions. Consistent with this hypothesis, in phase II trials, local consolidative therapy (LCT) with radiotherapy or surgery for metastases that persist after systemic therapy improved the outcomes of patients with metastatic NSCLC48,49,50 and other cancer types51. However, in a 2024 study, LCT did not produce a survival benefit in patients with NSCLC treated predominantly with immunotherapy52, highlighting the unresolved challenge of identifying the patients who will truly benefit from this approach. Our data offer a biological rationale for this treatment strategy in appropriately selected patients and, potentially, in selected metastases, such as those with high CIN.

Consistent with insights from lineage-tracing models of metastasis53,54,55, the cellular attributes required to seed metastases, and the likelihood of doing so, varied according to the route of spread.
Within-cavity seeding, for which direct invasion, spread through air spaces56,57, and lymphatic and circulatory migration are all possible routes58, was more frequent and more feasible than seeding between cavities, which requires transit in the circulation. In colorectal cancer, local lymph node metastases develop through evolutionary mechanisms that are fundamentally different from those of distant metastases36,59,60. Here, CIN, which we18,19,41 and others61 have previously found to be associated with metastatic capacity, was a distinguishing feature of subclones that seeded extrathoracic metastases. This raises the possibility that the metastatic advantage conferred by CIN in NSCLC relates to a process specific to extrathoracic seeding, such as circulatory spread or adaptation to a non-thoracic microenvironment, and highlights the need for functional studies designed to elucidate the mechanisms of spread along different anatomical routes.

The generalizability of these results to untreated patients (88% of this cohort received systemic therapies), to patients who present with de novo metastatic disease and to other cancer types is unknown. The size of the cohort limited our ability to assess the effects of treatments on metastasis evolution, seeding and recurrent genomic events. Furthermore, bulk WES, although performed at high depth, can underestimate subclonal diversity compared with whole-genome and/or single-cell sequencing, and it precludes investigation of the roles of structural variants, extrachromosomal DNA, the tumour microenvironment and other non-genetic processes in metastasis. Our follow-on study, TRACERx EVO (NCT05628376), endeavours to address these limitations.
It aims to recruit 600 patients with NSCLC, small cell lung cancer or pleural mesothelioma across all cancer stages (I–IV) and to carry out up to 100 research autopsies using the PEACE study infrastructure, with high-depth whole-genome sequencing performed on the collected samples.

This work demonstrates the extensive genetic diversity that constitutes metastatic disease, revealing that NSCLC progression is propagated by a multitude of primary and metastasis subclones with seeding capacity. It thus highlights the value of this longitudinal, clinically annotated dataset, which, by facilitating further interrogation of this complexity, will foster a greater understanding of the metastatic process and inform strategies to curtail it.

Methods

Patient cohort

The PEACE study

PEACE is a pan-cancer, UK-wide research autopsy programme (https://clinicaltrials.gov/study/NCT03004755) designed to investigate the biology of metastatic disease and drug resistance. The study was sponsored by the University College London (UCL) Clinical Trials Centre and approved by the Health Research Authority National Research Ethics Service Committee London–Dulwich on 15 August 2013 (research ethics committee reference 13/LO/0972), in accordance with the UK Human Tissue Act 2004. Informed consent was provided by patients during life or by a person in a qualifying relationship after death.

Eligibility was defined by the following inclusion criteria: (1) age 18 years or over; (2) confirmed solid malignancy with metastatic disease (with a known or unknown site of origin), except for primary brain tumours, for which there might be no evidence of metastatic disease; and (3) oral and written informed consent from the patient to enter the study and to undergo tissue collection after death, or from a nominated representative or a person in a qualifying relationship after the patient’s death.
Exclusion criteria were: (1) a medical or psychiatric condition that would preclude informed consent; (2) a history of intravenous drug abuse within the past five years; or (3) a confirmed diagnosis of a known high-risk infection (for example, HIV/AIDS, hepatitis B or C, tuberculosis or Creutzfeldt–Jakob disease), unless the case was of particular scientific interest and this had been agreed in advance with local mortuary staff and the pathologist.

The TRACERx–PEACE lung cohort

The TRACERx study (https://clinicaltrials.gov/ct2/show/NCT01888601) is a prospective observational cohort study approved by an independent research ethics committee (13/LO/1546). The inclusion and exclusion criteria, clinical data acquisition, and tissue and plasma sampling procedures have been described previously18,19. In brief, TRACERx includes patients with histopathologically confirmed early-stage (I–IIIB) NSCLC who underwent primary surgery. After surgery, patients are followed up, and longitudinal clinical data, plasma samples and, in the case of disease relapse or progression, tissue samples are collected.

Forty-nine patients with NSCLC were enrolled in both TRACERx and PEACE; 41 died and 33 underwent a research autopsy (Supplementary Fig. 1). Research autopsies were not performed owing to lack of death notification (n = 6), post-mortem withdrawal of consent (n = 1) or COVID-19 restrictions (n = 1). No tumour was identified on pathological assessment in four patients who underwent autopsy. In three patients, WES data from all primary or all autopsy samples failed quality control, and WES data were unavailable for two patients at data lock. The final cohort therefore comprised 24 patients.

Patients were assigned study identifiers that were subsequently converted to linked identifiers (CRUKP prefix) to maintain anonymity.
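The cohort attrition described above can be checked with simple arithmetic. The snippet below only re-adds the counts reported in this section (category labels are paraphrased); it is a bookkeeping sketch, not part of the study’s analysis.

```python
# Counts as reported in the text; category labels are paraphrased.
DIED = 41
NO_AUTOPSY = {
    "no death notification": 6,
    "post-mortem withdrawal of consent": 1,
    "COVID-19 restrictions": 1,
}
EXCLUDED_AFTER_AUTOPSY = {
    "no tumour on pathological assessment": 4,
    "WES failed quality control": 3,
    "WES unavailable at data lock": 2,
}

autopsied = DIED - sum(NO_AUTOPSY.values())                      # 41 - 8
final_cohort = autopsied - sum(EXCLUDED_AFTER_AUTOPSY.values())  # 33 - 9
print(f"autopsied: {autopsied}, final cohort: {final_cohort}")
```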
Tissue and blood samples were barcoded and tracked in a centralized database overseen by the sponsor (UCL Clinical Trials Centre).

Research autopsy sample procurement

Research autopsies were performed as soon as possible after death (median [IQR] interval: 79 [56.0–157.3] h; median [IQR] time to refrigeration: 3.7 [3.2–5.4] h) at the mortuary affiliated with the recruiting site (University College London Hospitals (UCLH), Guy’s and St Thomas’ Hospital, Birmingham Heartlands Hospital, Leicester Royal Infirmary Hospital or the Christie Hospital).

Tissue sampling was led by a pathologist who was provided with the patient’s pre-mortem clinical history and imaging. None of the patients (or persons in a qualifying relationship) in this cohort elected to restrict research autopsy sampling. Where feasible, all macroscopically visible metastases were sampled, and multiple regions of individual metastases were sampled. Sample annotations distinguishing regions from an individual metastasis (that is, a multi-region sampled metastasis) from macroscopically distinct metastases were assigned to allow intra- and inter-metastasis analyses. Labelled specimens were photographed where feasible.

Where possible, metastases were bisected longitudinally: one half was snap-frozen in liquid nitrogen and stored at −80 °C, and the other half was formalin-fixed and paraffin-embedded (FFPE). Fresh sterile instruments were used for each sample. Body fluids (pleural, peritoneal and cerebrospinal) were centrifuged, and the cell pellets were snap-frozen separately. Peripheral blood was collected pre-mortem or at autopsy from the femoral vein or cardiac ventricle.

Central histopathological review

Diagnostic histopathological slides from the primary tumours were centrally reviewed as previously described18.
Haematoxylin and eosin (H&E)-stained slides prepared from the metastasis FFPE blocks were digitally archived and assessed by a pathologist for tumour content, necrosis, autolysis and lymphocyte content.

Intrathoracic and extrathoracic classification

Metastases were classified as intrathoracic or extrathoracic on the basis of the anatomical site recorded on the pre-mortem sample histopathology reports or by the autopsy pathologist. Where the anatomical origin was uncertain, radiological imaging was reviewed (see Supplementary Note).

Intrathoracic metastases

Intrathoracic metastases included mediastinal lymph nodes, mediastinal soft tissue, lung, lung surgical bed, pleura and chest wall (if the metastasis radiologically arose from within the pleural boundary).

Extrathoracic metastases

Extrathoracic metastases included axillary, cervical, supraclavicular and abdominopelvic lymph nodes; chest wall (if the metastasis radiologically arose from outside the pleural boundary); other subcutaneous or soft tissue musculoskeletal masses; cardiac (pericardium and myocardium); diaphragm (unless there was radiological evidence of direct pleural or intrapulmonary extension); bone; liver; brain; gastric; adrenals; kidney; peritoneum; and bladder.

Clinical outcome data

DFS was defined as the period from the date of registration to radiological confirmation of recurrence of the primary tumour registered for TRACERx or to death from any cause.

OS was defined as the period from the date of registration to death from any cause.

Radiological data curation and analysis

Anonymized clinical imaging scans and reports (CT, PET–CT and MRI) were available for 23 of the 24 patients under the TRACERx study protocol, spanning baseline imaging of the primary tumour, relapse imaging and subsequent scans up to the last scan before death.
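The rules under ‘Intrathoracic and extrathoracic classification’ above amount to a site lookup with two imaging-dependent exceptions (chest wall and diaphragm). A minimal sketch follows. The site vocabulary is hypothetical; treating a diaphragm lesion with direct pleural or intrapulmonary extension as intrathoracic is an assumption (the text states only that such lesions are not counted as extrathoracic); and unrecognized sites raise an error, standing in for the manual radiological review used when the origin was uncertain.

```python
from typing import Optional

# Hypothetical site vocabulary paraphrasing the Methods lists.
INTRATHORACIC = {
    "mediastinal lymph node", "mediastinal soft tissue", "lung",
    "lung surgical bed", "pleura",
}
EXTRATHORACIC = {
    "axillary lymph node", "cervical lymph node", "supraclavicular lymph node",
    "abdominopelvic lymph node", "subcutaneous", "soft tissue", "musculoskeletal",
    "pericardium", "myocardium", "bone", "liver", "brain", "gastric", "adrenal",
    "kidney", "peritoneum", "bladder",
}


def classify_cavity(site: str, pleural_involvement: Optional[bool] = None) -> str:
    """Return "intrathoracic" or "extrathoracic" for a metastasis site.

    `pleural_involvement` encodes the radiological review: for the chest wall,
    whether the lesion arose within the pleural boundary; for the diaphragm,
    whether there was direct pleural or intrapulmonary extension.
    """
    site = site.strip().lower()
    if site == "chest wall":
        if pleural_involvement is None:
            raise ValueError("chest wall lesion: review imaging for pleural origin")
        return "intrathoracic" if pleural_involvement else "extrathoracic"
    if site == "diaphragm":
        # Extrathoracic unless imaging shows direct pleural/intrapulmonary extension
        # (classifying the extension case as intrathoracic is an assumption).
        return "intrathoracic" if pleural_involvement else "extrathoracic"
    if site in INTRATHORACIC:
        return "intrathoracic"
    if site in EXTRATHORACIC:
        return "extrathoracic"
    raise ValueError(f"unrecognized site {site!r}: review imaging")


print(classify_cavity("Lung"), classify_cavity("chest wall", pleural_involvement=False))
```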
Radiologically visible, measurable metastases were contoured by a clinical oncologist using ITK-SNAP v.4.2.0 to produce three-dimensional tumour volumes, a selection of which were reviewed by a second, senior clinical oncologist. Radiological lesions were manually mapped to sequenced metastasis samples where possible. Where multiple sequenced samples mapped to one radiological lesion, the genomic feature predominant across the set (the mode) was used in analyses.

Time to first detection

Time to first detection was defined as the number of days from primary surgery to first detection on imaging. Metastases that were not detected on imaging but were sampled at autopsy were assigned a time to first detection halfway between the date of the last scan and the date of death.

Scan period

Scan dates were normalized per patient by the number of days between relapse imaging and death. Scans after the relapse scan and before the last scan before death were assigned to the first half (≤50%) or second half (>50%) of the metastatic period.

Maximum tumour volume

Maximum tumour volume was defined as the largest volume recorded for a metastasis on any longitudinal scan.

DNA extraction and WES

DNA extraction and WES were performed for both primary tumour and metastasis samples as previously described18. Paired germline DNA was resequenced in the same run as subsequently sequenced metastases.

All primary tumour regions and all collected pre-mortem metastasis biopsies and metastasectomy specimens underwent WES. Research autopsy metastasis samples with adequate histopathological tumour content and a DNA integrity number (DIN) > 4, as measured using an Agilent TapeStation system, underwent WES. In patients with a large number of suitable samples, samples were selected to capture the anatomical distribution of metastatic disease while maximizing tissue quality. Overall, 376 of the 601 research autopsy metastases were selected for WES (Supplementary Fig. 1).

Post-sequencing quality control

Primary and metastasis samples that failed copy-number calling and variant calling are summarized in Supplementary Fig. 1. Metastases that passed variant calling but not copy-number calling (n = 10) were included in analyses that do not involve phylogenies (which require both copy-number and mutation calls).

Bioinformatic pipeline

WES data were analysed using the previously described bioinformatic pipeline to perform alignment, somatic mutation calling, copy-number detection and signature artefact quality control, with the following modifications. (i) DNVs were defined by two criteria. First, a proportion test was performed to determine whether the frequencies of two adjacent SNVs were significantly similar. Second, for all putative DNVs with a significant test, reads were extracted to calculate the proportion of reads overlapping both bases, and DNVs were called when ≥90% of reads contained both variants in at least one sample from the patient. (ii) Refphase62 was used to infer haplotype-specific copy-number alterations and to rescue low-purity tumour regions using the multi-region data. (iii) Driver mutation annotation was updated as detailed in ‘Detection of driver alterations’.

WGD detection

WGD events were identified and assigned to phylogenies using ParallelGDDetect as described previously19.

Genomically independent tumours

Sequenced primary and metastasis regions were classified as genomically related to or independent of the other samples collected from the patient, as previously described19.

Multiple genomically independent tumours were detected in three patients. In CRUKP1584, a synchronous lung tumour resected at the time of primary surgery was genomically independent of the other tumour resected at the same time and of all subsequently sampled metastases, consistent with a second primary tumour that did not metastasize.
In CRUKP8172 and CRUKP7741, metastases sampled at autopsy (a lung and an oesophageal sample, respectively) were genomically independent of the corresponding primary and metastasis samples, consistent with second primary tumours or metastases from undetected second primaries. Because paired primary–metastasis samples were not available for these three genomically independent tumours, they were excluded from the final cohort.

Subclone and phylogenetic tree reconstruction

Mutation clustering and tree building

CONIPHER63 was used to identify clusters of somatic mutations that occurred in the same tumour subclone and to reconstruct tumour phylogenetic trees. Two features of CONIPHER were important given the number of samples available per patient. First, mutations were pre-clustered by their presence (more than one mutant read) or absence across the tumour regions available per patient, and PyClone64 was applied to each mutation group independently, making the mutation clustering step scalable11,63. Second, CONIPHER enumerated all possible phylogenetic tree topologies compatible with the pigeonhole principle and the crossing rule65 for each set of mutation clusters. The sum condition error (SCE) was computed for each solution to quantify the extent to which the evolutionary constraints imposed by the topology were violated. The tree topology with the lowest SCE was selected for further analysis, and the full set of solutions was used to assign metastatic migration probabilities (see ‘Metastatic migration probabilities’).

Subclone clonality

Phylogenetic trees were used to classify mutation clusters as truncal or subclonal. The truncal cluster was the mutation cluster ancestral to all others, representing the most recent common ancestor (MRCA).
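Under the pigeonhole principle invoked in ‘Mutation clustering and tree building’ above, the cancer cell fractions (CCFs) of a cluster’s children in any sample cannot sum to more than the parent’s CCF, and the SCE measures how badly a topology violates this. The sketch below uses a simplified SCE, the summed excess of children’s CCFs over their parent’s CCF across samples; this is an assumption standing in for CONIPHER’s own definition, which may differ, and the crossing rule is not checked. Function names and toy data are hypothetical.

```python
def sum_condition_error(children: dict, ccf: dict) -> float:
    """Simplified sum-condition error for one candidate tree topology.

    children: {cluster: [child clusters]} describing the topology.
    ccf: {sample: {cluster: CCF in [0, 1]}}.
    For every sample and parent cluster, adds how far the children's summed
    CCFs exceed the parent's CCF (zero when the sum condition holds).
    ASSUMPTION: a stand-in for CONIPHER's SCE, not its exact definition.
    """
    error = 0.0
    for sample_ccf in ccf.values():
        for parent, kids in children.items():
            excess = sum(sample_ccf[k] for k in kids) - sample_ccf[parent]
            error += max(0.0, excess)
    return error


def best_topology(topologies: dict, ccf: dict) -> str:
    """Name of the candidate topology with the lowest simplified SCE."""
    return min(topologies, key=lambda name: sum_condition_error(topologies[name], ccf))


# Toy example: one sample, three clusters (cluster 1 is the trunk).
toy_ccf = {"region_1": {1: 1.0, 2: 0.6, 3: 0.5}}
candidates = {
    "branched": {1: [2, 3], 2: [], 3: []},  # 0.6 + 0.5 > 1.0 violates the condition
    "linear": {1: [2], 2: [3], 3: []},      # 1.0 >= 0.6 >= 0.5 satisfies it
}
chosen = best_topology(candidates, toy_ccf)
print(chosen)
```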
The remaining clusters were classified as subclonal and were further subclassified by their presence or absence in primary and metastasis regions: primary-unique (detected in the primary tumour, undetected in any metastasis), metastasis-unique (detected in one or more metastases, undetected in the primary tumour) or shared subclonal (detected in the primary tumour and in one or more metastases).

Somatic mutations, SCNAs and WGDs were classified as truncal, shared subclonal, primary-unique or metastasis-unique on the basis of the clusters to which they were assigned by CONIPHER63, ALPACA41 and ParallelGDDetect19, respectively.

Inference of subclone proportions

Subclone proportions per sample were inferred from the mutation cluster cancer cell fractions (CCFs) and the phylogenetic tree topology. Leaf node CCFs represent the terminal subclone proportions. Internal node proportions were calculated by subtracting the summed CCFs of the descendant clusters from the parent cluster CCF, iterating from the leaf nodes to the trunk. Subclones with proportions ≤5% were considered extinct (those with proportions >5% were termed extant).

Inference of subclone copy-number profiles

ALPACA41 uses the subclonal and phylogenetic structure of tumours derived from SNV frequencies to infer subclone-specific copy-number profiles. ALPACA was run with default settings, using as input the phylogenetic tree and subclone proportions derived from CONIPHER, the allele-specific fractional copy-number estimates from Refphase62 and their estimated confidence intervals. The SCNA burden per subclone was computed as the total number of breakpoints detected in that subclone, and the ratio of SCNAs to SNVs was used to quantify the rate of SCNA acquisition per subclone.

Detection of driver alterations

Somatic mutations were annotated using OncoKB30,31, openCRAVAT66 (https://www.opencravat.org/) and the Ensembl Variant Effect Predictor (v.114)67.
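The rules in ‘Inference of subclone proportions’ and ‘Inference of subclone copy-number profiles’ above can be sketched for a single sample as follows. The sketch assumes that ‘descendant clusters’ means a node’s immediate children (whose CCFs already include their own descendants), clips negative values from noisy CCFs to zero (an assumption not stated in the text) and expresses the SCNA acquisition rate as breakpoints per SNV; all names are hypothetical. Because each proportion is computed from CCFs rather than from previously derived proportions, the leaf-to-trunk order does not change the result in this formulation.

```python
def subclone_proportions(children: dict, ccf: dict) -> dict:
    """Per-sample subclone proportions from cluster CCFs and a tree topology.

    Leaf proportions equal their CCF; an internal node's proportion is its CCF
    minus the summed CCFs of its direct children. Negative values are clipped
    to 0 (an assumption for noisy CCF estimates).
    """
    return {
        node: max(0.0, ccf[node] - sum(ccf[kid] for kid in kids))
        for node, kids in children.items()
    }


def extant_subclones(proportions: dict, threshold: float = 0.05) -> set:
    """Subclones with proportion above 5%; the rest are considered extinct."""
    return {node for node, p in proportions.items() if p > threshold}


def scna_rate(n_breakpoints: int, n_snvs: int) -> float:
    """SCNA acquisition rate as SCNA breakpoints per SNV in a subclone."""
    return n_breakpoints / n_snvs if n_snvs else float("nan")


# Toy single-sample example (not study data): trunk 1 with children 2 and 3.
toy_tree = {1: [2, 3], 2: [], 3: []}
toy_ccf = {1: 1.0, 2: 0.6, 3: 0.37}
props = subclone_proportions(toy_tree, toy_ccf)
alive = extant_subclones(props)
print(props, alive, scna_rate(12, 300))
```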
Mutations were classified as putative drivers if they fulfilled any of the following criteria: (1) classified as a loss-of-function event by LOFTEE68 in a gene annotated as a TSG in the COSMIC Cancer Gene Census (v.102)69 (https://cancer.sanger.ac.uk); (2) called by SpliceAI70 (using a threshold of 0.8) in a gene listed in COSMIC (v.102); (3) predicted to be a driver mutation by BoostDM71; (4) classified as a driver by CHASMplus72 at a false discovery rate 50% of mutations (i.e. the majority aetiology). d, Percentage of primary (truncal, shared subclonal and primary-unique) and metastasis-unique subclones per patient in which each signature aetiology was detected. Dashed lines connect patients. Wilcoxon signed-rank test. e, The phylogenetic tree and estimated signature activities for each subclone (node), expressed as the fraction of mutations assigned to each signature (pie wedges), for CRUKP4761. Branches are coloured according to the presence (bold red) or absence (black) of episodic APOBEC mutagenesis (Methods). f, The majority aetiology of metastasis-unique subclones that are unique to a single anatomical location. g, Mean cosine distance per patient between the signature compositions of metastasis-unique subclones and their ancestral primary subclones (shared subclonal). Dashed lines connect patients. Wilcoxon signed-rank test. The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles.

Extended Data Fig. 2 Effects of treatment and genome doubling on metastases.

a, Association between cumulative chemotherapy duration (sum of days across all adjuvant and metastatic regimens) and the number of metastasis-unique mutations (top) and metastasis-unique driver mutations (bottom). n = 24 patients.
b, CRUKP2037 body map and phylogenetic tree showing the evolutionary timing and anatomical distribution of driver mutations (SNVs, indels and DNVs; teal), focal LOH affecting TSGs and arm-level LOH events culminating in biallelic disruption of MAP3K1 (5q), PTEN (10q), TP53 (17p) and B2M (15q) (blue), and genome doubling (WGD; black) events. c, Number and evolutionary timing of WGD events per patient. d, Correlation between the number of WGD events and inter-metastasis heterogeneity (green) or primary–metastasis SCNA diversity (blue) per patient. Body map illustration in b by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

Extended Data Fig. 3 Metastatic migration patterns.

Body maps depicting the anatomical site of origin and the site of the metastasis seeded for each inferred metastatic migration (arrows, coloured according to the site where the migration started). Body map illustration by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

Extended Data Fig. 4 Origin and number of seeding subclones per migration.

a–c, The number of subclones (indicated by the number of small circles), their anatomical site of origin and the site of the metastasis seeded (indicated by the colour of the small and large circles, respectively) are displayed for migrations (arrows) involving a single subclone only (a), multiple subclones that all originated from the same site (b) and multiple subclones that originated from different anatomical sites (c). Migrations are distinguished by the site of origin of the subclones involved (primary-to-metastasis, black arrow; metastasis-to-metastasis, grey arrow; migrations involving subclones from both the primary and a metastasis, yellow) and are ordered according to whether the seeding subclones originated in the same organ as the metastasis and by the number of seeding subclones. Subclones linked by a line originated from the same metastasis.

Extended Data Fig. 5 Orthogonal evidence for metastasis-to-metastasis seeding.

a, Correlation between sampling extent (number of primary regions, individual metastases and metastasis regions per patient) and the number of seeding subclones identified per patient. b, Schematic of metastatic migration probability estimation. For each patient, MACHINA was run on the 100 most plausible phylogenetic tree topologies, and the frequency of solutions supporting each migration was used to derive migration probabilities. c, Days from surgery to first radiological detection of metastases seeded by the primary tumour (black) or by another metastasis (grey). n = 23 patients; Mann–Whitney U test. d, Fraction of LOH events conserved between each metastasis and its inferred seeding source (grey), an alternative metastasis (yellow) or the primary tumour (black). n = 24 patients; Mann–Whitney U test. e, Monte Carlo likelihood ratio tests of the null hypotheses that primary-to-metastasis seeding subclones from the same primary tumour seed an equal number of metastases (left) and give rise to an equal number of metastasis-to-metastasis migrations (right). f, Phylogenetic tree and metastatic migration patterns for CRUKP9198 showing three primary-to-metastasis seeding subclones (lineages A–C). Although each primary lineage seeded a similar number of metastases (top), lineage C generated substantially more metastasis-to-metastasis migrations than did lineages A and B (bottom). The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles. Body map illustrations in b,f by J. Brock adapted from ref. 11 under a Creative Commons licence CC BY 4.0.

Extended Data Fig. 6 Metastasis-to-metastasis seeding associates with time in situ.

a, Clinical features of patients with predominant (>50% of total migrations, n = 14) or non-predominant (1) metastases stratified by location of the metastasis seeded (intrathoracic, purple; extrathoracic, orange) for primary-to-metastasis seeding subclones (n = 24 patients, d) and metastasis-to-metastasis seeding subclones from intrathoracic metastases (n = 20 patients, e). Fisher’s exact test. f, Site of relapse based on radiological imaging was known for 62 of the 126 patients in the published TRACERx 421 relapse cohort. SCNA burden of seeding and non-seeding primary subclones was compared for patients with intrathoracic-only relapse (n = 22, left) and patients with ≥1 extrathoracic metastasis at relapse (n = 40, right). LME model with subclone mutation count as covariate and patient as random effect. The box plots show the median and IQR with whiskers denoting values within 1.5 times the IQR from the first and third quartiles.

Supplementary information

Supplementary Information
This file contains Supplementary Figs. 1–10, Supplementary Table 1 and Supplementary Note.
Reporting Summary
Supplementary Information
PEACE protocol
Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.