IntroductionThe coagulase-positive Staphylococcus argenteus is a close relative of the more well-known opportunistic pathogen Staphylococcus aureus. It was initially identified as a divergent lineage of S. aureus and was formerly known as clonal complex 75 (CC75)1. In 2015, S. argenteus was officially designated as a distinct species2. It has been implicated in a variety of superficial and invasive diseases, including toxin-mediated food poisoning3, skin and soft tissue infections4,5, bloodstream infections6,7, bone and joint infections8, and purulent lymphadenitis (infection of the lymph nodes)9. S. argenteus has also been documented in food products10 and diseased animals11,12. Reports of methicillin resistant and multidrug resistant strains of S. argenteus from human disease cases13,14,15 pose a grave concern, and instances of their transmission have been documented16,17. While the prevalence of S. argenteus in diseases relative to S. aureus remains low, misidentification with S. aureus and the lack of a systematic surveillance effort may have underestimated its prevalence.Population genomic studies of S. argenteus are few, but those few reports have uncovered notable genomic features that underlie its growing importance as a pathogen. A genomic analysis of 153 globally distributed genomes strains from different geographical origins indicated that geographical distribution may have propelled the divergence of S. argenteus and its horizontal acquisition of mobile genetic elements18. In the Dutch national surveillance of methicillin-resistant S. aureus (MRSA), a total of 54 isolates were identified as S. argenteus, with the earliest methicillin resistant S. argenteus documented in 200815. In a retroactive assessment of Taiwanese staphylococcal bacteremia between 2010 and 2012, 47 cases that were previously attributed to S. aureus were determined to be due to S. argenteus infection6. In Canada and the United States, 22 S. argenteus isolates displayed evidence of considerable diversity in terms of sequence types (ST), with many genomic features related to virulence in common with those of S. aureus19. Analysis of 132 S. argenteus genomes from fourteen countries spanning over 13 years revealed the growing prevalence of sequence type (ST) 2250, which was notable for the presence of a CRISPR/Cas system that was absent in most other STs20. In Thailand, the rapid spread of the dominant ST2250 in the past 20 years is attributed to the acquisition of multiple exotoxin and antimicrobial resistance (AMR) genes that have been linked previously to livestock-associated S. aureus21. The global distribution of S. argenteus suggests international dissemination and the presence of geographic hotspots of the species in Southeast Asia, Australia, and the Amazon22.Despite growing evidence of the clinical importance of S. argenteus, knowledge about the underlying causes of the growing threat of AMR emerging in the species remains limited. Here, we analyzed 379 genomes of S. argenteus to describe its global population structure and evolutionary history. Overall, our results show that the AMR pool of S. argenteus is driven in part by the global clonal expansion of ST2250 in the last five decades and subsequent divergence into sublineages with distinct AMR gene content. Continued surveillance of S. argenteus will be critical in developing effective strategies to reduce the risk for multidrug-resistant strains to emerge and further disseminate.ResultsThe global S. argenteus population is phylogenetically diverse but dominated by ST2250We retrieved a total of 379 high-quality genome sequences of S. argenteus derived from 28 countries in six continents, four ecological sources, and collection years of 2005–2023 (Fig. 1 and Supplementary Data 1). The median number of genes per genome was 2540 (range: 2333–2719) (Supplementary Fig. 1A). The genome fluidity23 was 0.109 (Supplementary Fig. 1B), indicating that on average, 10.9% of genes were unique (i.e., not shared) between genome pairs while 89.1% of genes were common between pairs. This low genome fluidity suggests that S. argenteus exhibits limited gene content variability. To further evaluate gene expansion dynamics, pan-genome openness was assessed using Heaps’ law24. The γ coefficient obtained for ST2250 was 0.064, while the value for the complete dataset was 0.084 (Supplementary Fig. 1C). Although the pangenome is considered open, the low γ values observed are consistent with the genome fluidity results and indicate restricted gene content variability in S. argenteus.Fig. 1: Global population structure of S. argenteus (n = 379 genomes).A Midpoint-rooted maximum likelihood tree showing the phylogenetic relationships of globally distributed S. argenteus genomes. The tree was built using 59,124 single nucleotide polymorphisms (SNPs) derived from aligned and concatenated sequences of 2180 core genes (i.e., genes present 99% of genomes). Tree scale represents number of nucleotide substitutions per site. The colored lines emanating from the branches of the phylogeny are colored by sequence type (ST) with only the most common six STs shown in color while less common STs are grouped as others (i.e., present in fewer than 10 genomes or without ST assignment). The four outer rings (from inner to outer) represent the SCCmec type, source, continent and time period. For visual clarity, only the six major STs are shown. B World map showing the major geographical regions from where the 379 S. argenteus were derived. The pie charts show the distribution of the six major ST per continent. The colors in the pie charts represent the STs following the color scheme in (A). The number next to each pie chart shows the total number of genomes per region. C Proportion of STs and number of genomes per year. The stacked colors in each bar represent the STs following the color scheme in (A). The numbers above each bar represent the total number of genomes per year. D Number of genomes per ST. The stacked colors in each bar represent the SCCmec types, following the color scheme in (A). The numbers above each bar indicate the total number of genomes of each ST. Detailed information of the features and associated metadata of each genome is presented in Supplementary Data 1.Full size imageMulti-locus sequence typing (MLST) analysis revealed the presence of 13 previously recognized STs (Supplementary Data 1). Six sequence types (ST) comprised 88.39% (335/379) of the genomes in the population: ST2250 (n = 213 genomes), ST1223 (n = 44), ST2198 (n = 24), ST2793 (n = 23), ST5961 (n = 17), and ST2854 (n = 14) (Fig. 1A). ST2250 alone accounted for 56.2% of all genomes. The remaining seven STs were each found in eight or fewer genomes and included both known and unassigned STs. Six other STs were represented by a single genome.To further explore the global population structure, we constructed a maximum likelihood phylogenetic tree based on the alignment of 59,124 single nucleotide polymorphisms (SNPs) derived from 2180 core genes (Fig. 1A and Supplementary Data 2). Each of the six major STs was located on a long branch of the phylogenetic tree and formed a distinct monophyletic cluster, indicating deep time divergence from other STs. The very short branches on each of the phylogenetic clusters of the six major ST indicate strong genetic homogeneity within each ST. Genomes from disparate continents and ecological sources were intermingled throughout the tree. Although no strict segregation by geography or source was observed, the phylogeny revealed trends of partial clustering. For instance, ST5961 comprised predominantly of genomes from food and the environment and was mainly found in Asia. In contrast, ST2793 included a large number of European genomes, while ST2198 was primarily associated with North American genomes (Fig. 1B). The majority of genomes used in this study were of human origin (n = 312), representing 82.3% of the total dataset. Within the predominant lineage ST2250, humans were the main source (n = 205). Among these human-associated ST2250 genomes, most genomes originated from Asia (112/205) and Europe (74/205). Beginning in the year 2008 onwards, ST2250 was consistently present every year (Fig. 1C).The mobile genetic element Staphylococcal Cassette Chromosome mec (SCCmec) is a unique feature of methicillin-resistant Staphylococcus species25. It carries the mecA gene encoding a low-affinity penicillin-binding protein 2a (PBP2a) that confers resistance to beta-lactam antibiotics26. In S. aureus, 15 structurally distinct types of SCCmec have been identified26,27. In our S. argenteus dataset, we found SCCmec present in 106 genomes (Fig. 1D and Supplementary Data 1). We identified five types of SCCmec: Ia (n = 2 genomes), IVa (n = 13), IVb (n = 1), IVc (n = 89), and IVd (n = 1). SCCmec type IVc elements were frequently detected in ST1223, ST2250, and ST2793, in 13.64%, 30.51%, and 95.65% of the genomes of each ST, respectively.A variety of virulence genes is present in ST2250 and in less common STsWe screened all genomes for the presence and distribution of genes associated with virulence. We identified a total of 86 unique virulence genes, with a median of 60 genes per genome (range: 51–71) (Supplementary Data 3). A total of 45 virulence genes were present in 99–100% of the genomes. We found a small but significant difference in the number of virulence genes per genome between ST2250 and non-ST2250 genomes, with a median of 60 and 64, respectively (Fig. 2A). When comparing specific functional categories of virulence genes per genome between ST2250 and non-ST2250 genomes, we found significant differences for those genes associated with adherence, immune modulation, exoenzyme, exotoxin, and effector delivery (all with p 1, indicating that the presence of one gene is associated with a higher likelihood of occurrence of another28 (Fig. 3F). The pair with the highest RR is tetL and aph(3’)-IIIa. It should be noted that although high values indicate a strong relative association, they do not necessarily reflect high absolute probabilities28.Among the AMR gene pairs analyzed, some resistance determinants stood out for both high CP and RR values, suggesting consistent co-occurrence patterns. The gene tetL showed a strong association with aph(3’)-IIIa (CP = 0.997; RR = 23.7), while dfrG was frequently observed in the presence of mecA (CP = 0.997; RR = 12.75). Other significant associations included aph (3’)-IIIa in the presence of PC1-type blaZ (CP = 0.967; RR = 1.75), fosB in the presence of aph (3’)-IIIa (CP = 0.978; RR = 1.43), tetL in the presence of fosB (CP = 0.997; RR = 1.44), dfrG in the presence of PC1-type blaZ (CP = 0.972; RR = 1.67), and mecA in the presence of PC1-type blaZ (CP = 0.942; RR = 1.69).Altogether, the AMR results show that S. argenteus, mainly driven by ST2250, harbor a diverse array of AMR genes. Co-occurring AMR genes in different combinations indicate co-inheritance, whether vertically or horizontally, and high potential for multidrug-resistant lineages to emerge when multiple genetic determinants of AMR converge in the same genetic background.Heterogeneous distribution of plasmid replicons and co-occurrence with AMR genesAcross the 379 genomes evaluated, we identified 43 distinct plasmid replicon types, consisting of 41 rep-type and two Col-type (Supplementary Data 6). Among the STs with more than 10 genomes, ST2793 showed the highest proportion of genomes carrying plasmids (22/23 genomes, equivalent to 95.7%), followed by ST2250 (178/213 genomes, 83.5%) (Supplementary Fig. 3A). All genomes from the five most frequent STs carried rep types. Plasmid type ColRNAI_1 was detected in an ST1850 genome (Accession no. ERR3501086), while ColpVC_1 was present in an ST2250 genome (Accession no. GCA_010570875.1_ASM1057087v1). No plasmid replicon was detected in ST5961 (0/17) genomes. Overall, these results reveal heterogeneity in plasmid element distribution among S. argenteus lineages.Because many of the genomes in our dataset consisted of short-read sequences, we were unable to verify the gene content of plasmids. Nonetheless, it is still possible to investigate whether any of the plasmid replicon types co-occur with specific AMR genes more often than expected by chance. We identified three plasmid replicon types and six AMR genes that exhibited significant co-occurrence (Supplementary Fig. 3B and Data 7). Plasmid types rep16_1_CDS8 (pSAS) and rep5a_1_repSAP001 (pN315) both co-occur with the same AMR genes PC1-type blaZ, aph(3’)-IIIa, fosB and tet(L) (all with p-values ranging from 1.10 × 10⁻⁴⁷ to 6.56 × 10⁻²⁸, Fisher’s exact test), and all pairs were detected between 89 and 209 genomes. Among these pairwise relationships, plasmid pairings with tetL and aph(3’)-IIIa stood out due to their high odds ratio (OR) values, reflecting the high probability of their presence in genomes carrying these plasmid replicon types. We also found significant co-occurrence of AMR genes PC1-type blaZ and fosB with the three plasmid replicon types, although co-occurrence with rep20_3_rep (pTW20) were relatively fewer compared to rep16_1_CDS8 (pSAS) and rep5a_1_repSAP001 (pN315). Nonetheless, it is notable that dfrG (p = 1.18 × 10⁻¹⁰; OR = 165.2; in 21 genomes) and mecA (p = 3.17 × 10⁻⁸; OR = 109.1; in 21 genomes) showed statistically significant though less frequent co-occurrence with this replicon rep20_3_rep (pTW20), but not the other two plasmid types.Molecular dating of ST2250ST2250 comprised more than half of our dataset. We therefore sought to determine the evolutionary history of ST2250 using a time-calibrated phylogenetic tree. We estimated that the most recent common ancestor of ST2250 emerged in 1967 (95% highest posterior density (HPD) interval = 1950–1978) (Fig. 4A). Nearly all of ST2250 (n = 200/213 genomes or 93.90%) contain the fosB gene; in contrast, of the 166 non-ST2250 genomes, 79 carry the fosB gene (corresponding to 47.60%) which included ST1850, a small cluster of eight genomes that is the most closely related clade to ST2250 (Supplementary Fig. 2).Fig. 4: Molecular dating of ST2250 (n = 212 genomes).A Bayesian maximum clade credibility time-calibrated phylogeny of ST2250 based on non-recombining regions of the core genome alignment. The colored lines emanating from the branches of the tree represent the major geographical regions. genomes were also annotated based on geographical origin and antimicrobial classes, both represented by colored dots next to the branches. Different SCCmec types are indicated by distinct colors of colored bars. Antimicrobial resistance elements present in each genome are indicated by colored dots and their colors represent the antimicrobial classes. The divergence date (median estimate with 95% highest posterior density dates) is indicated as a horizontal blue line on the tree. For visual clarity, only the relevant divergence dates related to the clonal origin and SCCmec acquisition are shown. B Bayesian skygrowth plot showing changes in effective population size (Ne) over time. Median is represented by a solid line, and 95% confidence intervals are represented by the shaded blue area around the median. Details of the molecular dating analyses are presented in Supplementary Fig. 4, Data 10 and Data 11.Full size imageThree SCCmec types were detected in ST2250 genomes. SCCmec type IVc was identified in 59 genomes that form a single cluster within the phylogeny and are mainly found in Europe. We estimated the origin of this cluster to be 1992 (HPD = 1984–1999). Nearly all members of this cluster (n = 55/59 genomes) also harbor the AMR gene dfrG. SCCmec type IVa was detected in three genomes, with its origins estimated in 1998 (HPD = 1991–2006) and 2011 (HPD = 2009–2012). SCCmec type Ia was identified in two genomes, with an estimated origin of 2017 (HPD = 2012–2017). These events indicate multiple independent acquisitions of distinct SCCmec elements within ST2250. Another phylogenetic cluster within ST2250 (n = 112 genomes) that is mainly found in Asia emerged in 1989 (HPD = 1979–1995). This Asian cluster is characterized by the high prevalence of aph(3’)-IIIa (n = 92 genomes, equivalent to 82.14%) and tetL (n = 84 genomes; 75%), but only two genomes carry SCCmec (type IVa). Both Asian and European sublineages of ST2250 exhibited multiple instances of intercontinental dissemination based on the presence of some genomes from one continent intermingled within the phylogenetic cluster of genomes from the other continent.Demographic reconstruction revealed a rapid increase in the effective population size of ST2250 in the past 56 years, peaking in 2012 then rapidly declining afterwards (Fig. 4B). The molecular dating results show that the evolutionary history of ST2250 is punctuated by sublineages that have diversified and independently acquired various combinations of AMR genes; however, there is some indication that the ST2250 population is experiencing a decline in recent years.DiscussionThe growing number of reports of human diseases attributed to S. argenteus worldwide, as well as its presence in non-clinical environments, call for a deeper understanding of this emerging pathogen. The recent identification of this species in foodborne outbreaks further highlights this concern and underscores the importance of understanding its biology as an opportunistic pathogen3,29,30. Our study contributes to addressing this gap in knowledge by elucidating the population genomic features of S. argenteus, especially with respect to AMR dissemination. Overall, our results show that the AMR pool of S. argenteus is driven in part by the global clonal expansion of ST2250 in the last five decades and its subsequent diversification into sublineages with distinct AMR gene content.The demographic pattern of S. argenteus up until 2012 (shown in Fig. 4B) mirrors that of antimicrobial resistant S. aureus31, and we can only speculate as to the underlying causes of this surge in population growth in S. argenteus. The expansion of ST2250 has certainly contributed to shaping the global population dynamics of S. argenteus. In S. aureus, demographical characteristics point to a series of epidemic waves of antimicrobial resistant strains instigated by one or a few successful clones that have caused a large proportion of infections in hospitals and communities worldwide31. At least four waves of resistance are known and are attributed to a penicillin-resistant clone in the 1940s and four distinct MRSA clones (MRSA I, II, III, IV) between the 1960s and 2000s31. These resistant clones appeared immediately after antimicrobial agents were widely introduced in clinics, thereby creating a strong selective pressure on nosocomial strains31. Moreover, the dominance of ST2250 and its rapid expansion in recent evolutionary time may partly explain the low values of genome fluidity and Heaps’ γ coefficient, indicating limited gene content variation.The high prevalence of fosB in ST2250 genomes (93.90%) compared to non-ST2250 genomes (47.60%) may suggest that the gene’s persistence in this lineage played an important role in the evolutionary expansion of ST2250. Discovered in 1969, fosfomycin inhibits bacterial cell wall synthesis by impeding the formation of the peptidoglycan precursor32. It is used primarily as an oral treatment for urinary tract infections33. While considered an “old” antibiotic, fosfomycin has seen a resurgence in interest and use due to the rise of multidrug-resistant bacteria, including MRSA strains34, which are known to also exhibit multidrug resistance35. The origin of fosB in ST2250 is likely linked to plasmid acquisition, since this gene has been reported widely in S. aureus36 as well as in other Staphylococcus species37. The contribution of mobile genetic elements in ST2250 evolution has previously been highlighted; for example, in Thailand, AMR and exotoxin genes linked to livestock-associated S. aureus were mobilized via plasmids and phages to ST225021. In-depth investigations of mobile genetic elements and their gene cargo using long-read sequencing of S. argenteus strains will be paramount to understanding how ST2250 has acquired its AMR genes. The closest relative of ST2250 is ST1850, and while it is not frequently detected in our dataset (eight genomes), it also harbors fosB, which may suggest the presence of fosB in the common ancestor of ST2250 genomes.Two sublineages of ST2250 emerged independently at nearly the same time (Asian sublineage in 1989 and European sublineage in 1992), and their distinct AMR profiles likely explain the co-occurrence patterns of AMR genes shown in Fig. 3E, F. The tight co-occurrence of aph(3’)-IIIa, blaZ, and tetL in the Asian sublineage, as well as that of mecA, blaZ, and dfrG in the European sublineage, may be indicative of an initial horizontal co-acquisition of these genes in the common ancestor of each sublineage, followed by vertical inheritance in descendants as each sublineage diversified. The date of origin of the Asian lineage that we report here is consistent with that reported by Moradigaravand et al.21. Their study on 68 Thai S. argenteus genomes sampled between 2006 and 2013 inferred an introduction of ST2250 from Malaysia to Thailand 20–30 years ago (therefore 1983–1993)21. Previous studies suggest that this divergence may have been accompanied by ecological adaptation events, facilitated by plasmid-mediated acquisition of AMR genes18,21. A similar scenario has occurred in S. aureus ST59, which experienced concurrent but independent evolution of distinct sublineages in North America and East Asia, driven by mobile genetic elements38.The population decline of ST2250 after 2012 is certainly intriguing. However, it is important to note that this observed decline is in part influenced by uneven temporal sampling, particularly from 2021 to 2023, and should be interpreted with caution. Skyline-based demographic reconstructions are known to be sensitive to temporal sampling density39. Nonetheless, such a decline is not inconceivable. In S. aureus, fluctuations in the effective population size of antimicrobial-resistant lineages have been documented and reflect a scenario of lineage replacement. In S. aureus, a gradual replacement of ST5 by ST8 in Colombia40, ST764 in Japan41, and ST72 in South Korea42 has been reported. Another example is methicillin-resistant ST239 being superseded by ST5943. ST5 and ST8 from bloodstream infections in the United States have alternatively fluctuated in their effective population sizes44. In S. argenteus, the second most common ST is ST1223, with some genomes carrying any of the four SCCmec types. All ST1223 genomes analyzed here carry the fosB gene as well as differential distribution of aph(3’)-IIIa, PC1_Type blaZ, and mecA. On the other hand, nearly all ST2793 genomes have SCCmec type IVc. Both STs have been detected in multiple years. If lineage replacement is indeed occurring, these two STs are possible contenders to replace ST2250. It is certainly interesting that all six major STs exhibit similar topological structure in the phylogeny – each one is found on a very long branch and from which very short branches emanate, indicating a rapid diversification in more recent time (Fig. 1A). Thus, continuous monitoring of these less common STs is needed to identify those with surging population sizes. Importantly, such replacement does not necessarily involve only lineages of the same species and may also reflect the emergence of other staphylococcal lineages or species occupying overlapping ecological niches. While the precise range of ecological niches inhabited by S. argenteus remains obscure, co-occurrence of different Staphylococcus species in the same niches have been previously reported45,46,47.However, our dataset does not include a sufficient number of non-ST2250 genomes to carry out a time-calibrated phylogenetic analysis and elucidate the demography of ST1223, ST2793, and other less common STs. This gap in knowledge should therefore be an impetus to conduct a broad, long-term sampling scheme and robust genomic surveillance of S. argenteus to discover sources and genetic diversity of non-ST2250 lineages, as well as determine whether the decline of ST2250 will continue. Moreover, fitness and competition experiments of different S. argenteus STs will be particularly informative in understanding the genetic factors determining the success or failure of individual clones.We acknowledge the limitations of our study. First, the reliance on publicly available genomes led to a disproportionate representation of genomes from a few geographical and ecological sources. We recognize that our findings reflect more of what exists in clinical human sources, which comprise the majority of our dataset. It is also highly likely that numerous S. argenteus strains remain unidentified in S. aureus genomic surveillance studies. We also do not have phenotypic data on AMR and virulence; hence, we cannot ascertain the antimicrobial susceptibility features of individual strains. Plasmid analysis is also limited because many of the genomes in our dataset were short-read sequenced. Second, existing databases for ST identification and for querying AMR genes, virulence-associated genes, and SCCmec types are limited to those built for the more thoroughly studied S. aureus. Hence, genetic variants specific to S. argenteus are likely to exist but remain obscured from current in silico detection methods and databases. Third, the co-occurrence analysis of AMR genes did not treat genomes as independent observations nor adjust for population structure or phylogenetic non-independence. Consequently, particularly given the overrepresentation of ST2250 in the dataset, the observed association patterns may in part due to clonal inheritance rather than independent horizontal transfer events. Regardless, it does not minimize our results showing the importance of ST2250 in the global population. Lastly, any investigation of ancestral lineages is limited by the diversity of contemporary lineages that are being analyzed. This means that missing lineages are likely to influence our inference of the time to the most recent common ancestor and dates of clonal origins. Notwithstanding these limitations, our study should be considered as a baseline census of the standing genomic and lineage diversity of S. argenteus worldwide. We anticipate that the results presented here will form the basis for outstanding questions and open multiple avenues for future research and surveillance efforts. For example, the range of disease presentations and the mechanisms of disease transmission remain to be elucidated in S. argenteus, as well as the prevalence of S. argenteus in asymptomatic carriage in the human population.In summary, our results reveal that the global spread of resistance genes related to various antimicrobial classes is driven in part by the rapid lineage expansion and AMR acquisition of ST2250 in the past five decades. Our study will help inform public health efforts to developing effective strategies to reduce the risk for further dissemination of S. argenteus lineages with multidrug-resistant and virulent characteristics.MethodsGenome collection and annotationA total of 437 genome sequences were retrieved from the National Center for Biotechnology Information (NCBI) Sequence Read Archive in September 2023. Paired-end reads were assembled de novo using the Shovill v.1.1.0 pipeline (https://github.com/tseemann/shovill), implementing the k-mer-based assembly algorithm SKESA v.2.4.048, and followed by pre-and post-processing methods in Shovill. We employed the—trim flag for trimming of adapter sequences. To assess genome quality, we used QUAST v.5.0.249 and CheckM v.1.1.350. Genomes with 5% contamination were excluded. We also excluded assemblies with >200 contigs and N50