Cardiospermum halicacabum L. is a plant of the genus Cardiospermum in the family Sapindaceae (Fig. 1a and d), and the whole plant has medicinal value [1]. In China it is found more frequently in the east, south, and southwest than in the north. It is widely distributed and grown in tropical and subtropical regions worldwide, growing in fields, bushes, roadsides, and forest borders. Sapindaceae has around 150 genera and 2,000 species, including Dodonaeoideae, Melicoccus, and Sapindoideae, which has 25 genera, 53 species, 2 subspecies, and 3 varieties. Sapindoideae consists of around 17 genera and 43 species in China. However, Cardiospermum is represented by just one species in China [2]. C. halicacabum is a widely used medicine in China that is abundant in resources and has a proven healing effect. In traditional Chinese medicine, C. halicacabum is believed to dispel heat and toxins, eliminate moisture, and ventilate the airway. It is used to treat pharyngitis, whooping cough, jaundice, eczema, and furuncles [3]. The principal chemical components of C. halicacabum include flavonoids, triterpenoids, coumarins, organic acids, and steroids. The primary flavonoids are luteolin, chrysogenin, quercetin, apigenin, and their glycosides [4,5]. The majority of triterpenoids are pentacyclic triterpenoids, such as dandelion sainol, β-geraniol, palmitate β-geraniol, 3β-erythritol, friedelin, and friedelinol [6]. To our knowledge, no study has documented the structural properties, functional categorization, codon preference, structural changes, or phylogenetic relationships of the C. halicacabum plastid genome, information that might provide the foundation for identifying essential genes and obtaining high-quality germplasm resources [7].

Fig. 1 C. halicacabum in its natural habitat (a: flowers and fruits, b: fruits, c: flowers and fruits, d: specimen). The photograph was taken by Wei Wei, one of the authors of this article.

The chloroplast is an important self-replicating organelle involved in photosynthesis and in complementing nuclear change [8]. The chloroplast genome is maternally inherited and takes a variety of structures, most commonly a circular quadripartite structure with a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat regions (IRa and IRb) [9]. Chloroplast (cp) genome DNA has been widely used in a variety of biological fields, including taxonomic revision, systematic evolution, and species identification [10,11].

In this study, the Illumina HiSeq 2000 platform was used to obtain the whole genome sequence of C. halicacabum. The sequence was then assembled, annotated, analyzed, and compared with those of other species, revealing the evolutionary relationship between C. halicacabum and its related species. The comprehensive analysis of the whole chloroplast genome sequence provides a theoretical foundation for species identification, functional genomics research, and the continued development and use of species resources.

Results and discussions

Basic characteristics of the Cardiospermum halicacabum chloroplast genome

Illumina sequencing produced clean data for the cp genome of C. halicacabum, with 32,238,992 reads spanning 4,806,964,427 bp. After filtering of the raw data, the Q20 and Q30 ratios were 96.69% and 91.13%, respectively, indicating that the sequencing quality was good and the data were reliable. The chloroplast genome of C. halicacabum is composed of two IR regions of 28,351 bp each, an SSC region of 17,991 bp, and an LSC region of 84,677 bp, giving it a typical quadripartite structure with a total sequence length of 159,370 bp (Table 1; Fig. 2).
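The reported total length can be checked against the region lengths, provided the 28,351 bp IR figure is read as the length of one of the two IR copies. The short sketch below does this arithmetic; the function and variable names are illustrative and do not come from the study.

```python
# Region lengths in bp, taken from the text above.
LSC_BP = 84_677
SSC_BP = 17_991
IR_BP = 28_351  # one inverted-repeat copy; the genome carries two (IRa, IRb)


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + 2 * IR."""
    return lsc + ssc + 2 * ir


def region_shares(lsc: int, ssc: int, ir: int) -> dict:
    """Fraction of the genome occupied by each region (both IR copies pooled)."""
    total = plastome_length(lsc, ssc, ir)
    return {"LSC": lsc / total, "SSC": ssc / total, "IR": 2 * ir / total}


total_bp = plastome_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # 159370, matching the reported total
for region, share in region_shares(LSC_BP, SSC_BP, IR_BP).items():
    print(f"{region}: {share:.1%}")
```

The sum only reproduces 159,370 bp when the IR is counted twice, which supports reading the IR length as per copy.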
The total guanine-cytosine (GC) content is 37.91%, with 36.11% in the LSC region, 32.09% in the SSC region, and 42.46% in the IR regions. The GC content of the IR regions is clearly higher than that of the two single-copy regions.

Table 1 Structure and composition of the C. halicacabum chloroplast genome.

Gene classification

The chloroplast genome of C. halicacabum has 134 genes, including 89 protein-coding genes, 37 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes (Fig. 2). The 89 protein-coding genes can be categorized into four groups based on their biological functions: (1) 44 genes associated with photosynthesis; (2) 32 genes associated with self-replication; (3) 9 other coding genes; (4) 4 genes of unknown function (Table 2). The majority of the protein-coding genes in the chloroplast genome of C. halicacabum have a single exon, and trnK-UUU has the largest intron (2,521 bp). A total of 20 genes (trnK-UUU, trnG-GCC, trnL-UAA, trnV-UAC, trnI-GAU × 2, trnA-UGC × 2, rps16, atpF, rpoC1, ycf3, petB, petD, rpl16, rpl2 × 2, ndhB × 2, ndhA) contained introns; 19 of them contained one intron, and one gene (ycf3) contained two introns (Table 3). This may be because the additional intron contributes to photosynthesis, as ycf3 needs to build up in photosystem II. Based on the gene locations in Table 3, 11 of these genes are located in the LSC region, 1 in the SSC region, and 8 in the IRa and IRb regions, which is consistent with the features of the chloroplast genome seen in most plants [12].

Fig. 2 A genetic map of the chloroplast genome of C. halicacabum. Genes inside the circle are transcribed clockwise, whereas genes outside the circle are transcribed counterclockwise. Different colors correspond to different functional groups. The dark gray region of the inner circle indicates GC content, while the light gray region indicates adenine-thymine (AT) content. The IRa, IRb, SSC, and LSC regions are shown in the inner circle.

Table 2 A list of genes found in the plastid genome of C. halicacabum.

Codon usage bias analysis

How codons are used has a substantial impact on genome evolution. Mutational bias is one of several factors that can determine codon usage, and it plays a particularly important role in the evolution of the plastome [13]. The relative synonymous codon usage (RSCU) of the chloroplast genome is an important evolutionary factor. RSCU is defined as the ratio of a codon's observed frequency of usage to the frequency expected if all synonymous codons for the same amino acid were used equally. RSCU = 1 shows that there is no preference for this codon. RSCU > 1 implies high codon use frequency, whereas RSCU < 1 implies low codon use frequency. Codon AGA (1.87) had the highest RSCU and codon CGC (0.50) the lowest; of the codons with RSCU > 1, 25 end in A or U(T) and 7 end in C or G. The results showed that codons in the C. halicacabum chloroplast genome tended to end with A or U(T) rather than G or C, as also reported for Albizia julibrissin [17] and Plantago asiatica [12] (Supplementary Table 1, Fig. 3).

Table 3 The length of introns and exons in genes with introns in C. halicacabum.

Fig. 3 Relative synonymous codon usage of the C. halicacabum chloroplast genome.

Analysis of long repeats and SSR

Repetitive sequences may increase genomic rearrangement and variation and can be used as genetic markers in population studies [18]. Supplementary Table 2 indicates that the chloroplast genome of C. halicacabum has 28 long repeats (30–54 bp).
Forward repeats account for 46.43% of them and palindromic repeats for 53.57%; the largest share (67%) is located in the LSC region. No long repeats were located in the SSC region, while 28% were located in the IRa and IRb regions. The chloroplast genome of C. halicacabum contains 58 SSR markers, comprising 41 mononucleotide repeats, 3 trinucleotide repeats, 4 tetranucleotide repeats, 3 pentanucleotide repeats, and 7 compound repeats (Supplementary Table 3). The SSRs are mostly located in the LSC region (41 SSRs, 70.69%), followed by the SSC region (11 SSRs, 18.97%) and the IR regions (6 SSRs, 10.34%). Furthermore, A/T mononucleotide repeats make up the majority of the SSRs in the chloroplast genome of C. halicacabum, which strongly favor A and T bases. The highly repeatable and polymorphic SSRs of plant chloroplast genomes are typically used as molecular markers of genetic variation and for species identification [19]. The SSRs in the C. halicacabum chloroplast genome were thus AT-rich, consistent with the overall AT abundance of the chloroplast sequence. This pattern has been observed in other chloroplast genomes, and it has been hypothesized to reflect the greater ease with which A-T changes occur compared with G-C changes [20].

Comparative analysis of the chloroplast genomes

The overall sequence identity of the cp genomes was plotted with mVISTA (Fig. 4). In comparison to other species, the Sapindaceae plastid genomes show modest sequence divergence. It is worth mentioning that in chloroplasts of species from the same family, Sapindaceae, as well as other families, substitution rates in the LSC and SSC regions were somewhat greater than in the IR regions [21]. This phenomenon has been seen in many plants, and it may be due to copy correction via gene conversion or the presence of conserved rRNA genes in the IR regions [22]. As expected, non-coding regions had higher sequence divergence than coding regions [23]. The coding regions of the ycf1 genes differed significantly among Sapindaceae species. It is proposed that these conserved coding genes in chloroplasts may be used to trace evolutionary relationships among a wide range of eudicot plants [24], including C. halicacabum.

Contraction and expansion of the IR region boundaries are recognized as recurring evolutionary processes that contribute to the observed variation in chloroplast genome size [25]. A comparison of the IR-LSC and IR-SSC borders in the chloroplast genomes of eleven Sapindaceae species showed that each genome has four boundaries and that the genes at the boundaries vary in length (Fig. 5). The chloroplasts of C. halicacabum and the remaining ten Sapindaceae plants included rpl16 genes on the lower side of the LSC/IRb boundary (JLB) and rps3 genes on the upper right side, and the flanking genes of the LSC/IRa boundary (JLA) were similar. Except for C. halicacabum, the ndhF genes in the chloroplasts of the other ten plants are positioned on the lower side of the SSC/IRb boundary (JSB). The ycf1 gene of C. halicacabum crossed the JSB boundary and extended across it by 4072 bp. S. mukorossi, S. rarak, S. delavayi, A. viridis, A. littoralis, and S. erecta all had ycf1 genes that crossed the JSB boundary and extended across it to some degree. The ycf1 gene in the chloroplasts of C. halicacabum and Litchi chinensis crossed the JSA boundary and extended across it by 1 and 2516 bp, respectively. The ycf1 gene in D. longan, S. mukorossi, P. pinnata, N. lappaceum, S. rarak, S. delavayi, A. viridis, A. littoralis, and S. erecta crossed the JSA boundary, with extensions of 4182 bp, 4681 bp, 4251 bp, 4052 bp, 4685 bp, 4685 bp, 4210 bp, 4255 bp, and 4130 bp, respectively.
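To make the boundary terminology concrete, the sketch below derives the four junction positions from the region lengths reported for C. halicacabum and measures how far a gene extends on each side of a junction. It assumes a linearised sequence that starts at the first base of the LSC and follows the order LSC/IRb/SSC/IRa; the actual coordinate origin of the GenBank record is not stated in the text, and the gene interval used at the end is hypothetical. It is not a substitute for IRscope, only an illustration of the bookkeeping.

```python
from typing import Dict, Optional, Tuple


def junction_positions(lsc: int, ir: int, ssc: int) -> Dict[str, int]:
    """Last base (1-based) of each region, assuming the order LSC, IRb, SSC, IRa."""
    jlb = lsc        # LSC/IRb junction
    jsb = jlb + ir   # IRb/SSC junction
    jsa = jsb + ssc  # SSC/IRa junction
    jla = jsa + ir   # IRa/LSC junction (end of the linearised sequence)
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


def overhang(start: int, end: int, junction: int) -> Optional[Tuple[int, int]]:
    """Split a gene (1-based, inclusive coordinates) at a junction.

    Returns (bp up to and including the junction base, bp beyond it),
    or None when the gene does not span the junction.
    """
    if start <= junction < end:
        return junction - start + 1, end - junction
    return None


junctions = junction_positions(lsc=84_677, ir=28_351, ssc=17_991)
print(junctions)

# Hypothetical gene interval, for illustration only (not from the annotation).
print(overhang(113_000, 113_100, junctions["JSB"]))  # (29, 72)
```

With real annotation coordinates, the second element of the returned pair corresponds to the number of base pairs by which a gene such as ycf1 extends past a boundary.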
The IR region is common to the chloroplast genomes of most higher plants, and its contraction and expansion is a typical evolutionary event that is regarded as one of the primary drivers of chloroplast genome size variation [26]. Expansion of the IR region affects the copy number of the associated genes. Because the region is an inverted repeat, whole or partial gene fragments also occur in the IR region on the opposite side. In this article, we examined the chloroplast genome boundaries of eleven Sapindaceae species and found that the JSB and JSA boundaries varied considerably, whilst the JLB and JLA boundaries remained rather stable. D. longan, L. chinensis, P. pinnata, and N. lappaceum lack the ycf1 gene at the JSB boundary, whereas in the other species the copy located at that boundary is a pseudogene. It has been documented in the literature that changes in selection pressure on the ycf1 gene result in different evolutionary rates [27,28]. The ndhF gene was found solely in the SSC region, but the rps3 gene was found in the IR region. The expansion of the ycf1 gene into the SSC region was evident in C. halicacabum. Boundary expansion and contraction also occurred in other plants of the same family, and chloroplast genome boundary analyses of the previously investigated S. erecta [29] and Acer truncatum [30] likewise found that changes in the boundaries did not follow a regular pattern.

Fig. 4 Global alignment analysis of the eleven Sapindaceae chloroplast genomes.

Fig. 5 Changes of the IR/SC boundaries of the chloroplast genomes of eleven Sapindaceae species.

In this study, the selection pressure on 18 chloroplast genomes was assessed by assembling a set of 79 protein-coding sequences and computing Ka/Ks. Most Ka/Ks ratios range between 0.1 and 0.3 (Fig. 6), indicating that these genes have been under purifying selection. The genes can be split into two classes based on how they cluster among the 18 species.
Class I includes the genes psaJ, rpl23, psaI, clpP1, and ycf2, whereas class II includes all other genes. The Ka/Ks values of the class I genes are all larger than one, indicating that these genes have been under positive selection during evolution and may play a significant part in the differentiation of C. halicacabum and related species. The rpl23 gene has a Ka/Ks of 1.09922 between C. halicacabum and D. toxocarpa (OM743243). However, between C. halicacabum and D. viscosa (NC_036099), H. bodinieri (NC_054241), and H. cupanioides (NC_063684), the Ka/Ks values are 2.34791, 1.80876, and 1.82885, respectively. These results suggest that the gene had little effect on the differentiation of C. halicacabum and D. toxocarpa, but played an important role in promoting the differentiation of C. halicacabum from D. viscosa, H. bodinieri, and H. cupanioides.

Fig. 6 Analysis of selection pressure among genera related to C. halicacabum.

Phylogenetic relationship analysis

In tracing plant lineages and identifying species, phylogenetic analysis using the chloroplast genome is more convenient than using the nuclear genome [24]. To evaluate the phylogenetic position of C. halicacabum, BI (Bayesian inference) and ML (maximum likelihood) phylogenetic trees were generated for 11 chloroplast genome sequences of 11 Sapindaceae species in this work (Fig. 7). The trees produced by the two approaches have comparable topologies and strong support, with just minor differences at some nodes. Sapindus has three main branches in the ML tree. S. mukorossi and S. rarak have 86% node support in branch II of the ML tree and 95% node support in branch II of the BI tree. The findings revealed that each genus is concentrated into a single branch. S. erecta, A. viridis, and C. halicacabum formed a sister relationship, with C. halicacabum being more closely related to S. erecta. S. erecta grouped in one branch with C. halicacabum and A. viridis, with 100% support; nevertheless, there is currently no research reporting the genetic relationship of these species. Species from different genera are thus aggregated on the same branch with strong support. The phylogenetic tree shows that the genus Cardiospermum is closely related to the genera Allophylus and Serjania, demonstrating that evolution is complicated and changeable, and is consistent with the present taxonomy that combines the two families [31]. Even when sequence studies based on chloroplast genomes reveal different evolutionary histories, taxonomists use a range of parameters when identifying taxonomic units in order to create the most accurate and complete classification system feasible.

Fig. 7 Phylogenetic analysis based on chloroplast genome sequences.

Materials and methods

Plant materials

Fresh Cardiospermum halicacabum leaves were collected on the Guangxi University of Chinese Medicine campus (22°48′14″ N, 108°30′4″ E). The specimens are kept at the University Herbarium, and the freshly collected leaves were cleaned and stored on dry ice for later use. The specimens were identified by Professor Haicheng Wen of Guangxi University of Chinese Medicine. The collected specimens are housed in the Botanical Herbarium at Guangxi University of Chinese Medicine (voucher number 202402DDL001, Haicheng Wen, wenhaicheng2015@qq.com). No ethical approval or authorization was required for this study, because the research material is a common plant that may be used as a vegetable. It is not an endangered or protected plant, nor was it harvested on protected land.
Furthermore, there are no local or national regulations in China governing the collection of this plant. The sample was legally collected in accordance with the authors' institution's regulations and applicable national or international legislation. Field research complied with local legislation.

DNA extraction and sequencing

Total genomic DNA was extracted from C. halicacabum using a modified CTAB method [32]. After sample quality was checked, qualified samples were used to construct a library. DNA was first fragmented to a size of about 500 bp; the sticky ends produced by fragmentation were repaired to blunt ends, and an "A" base was added to the 3′ end so that the fragments could be ligated to adapters carrying a "T" base at the 3′ end. The target fragments were recovered by electrophoresis, and the fragments with adapters at both ends were then amplified by PCR. Finally, clusters were prepared from the qualified library and sequencing was performed on the Illumina HiSeq 2000 platform.

Assembly and annotation of chloroplast genome

The genome was assembled with NOVOPlasty (version 2.7.2) [33], which extends an initial seed sequence taken from a mitochondrial or chloroplast genome, or part of one, from a similar species. The assembled sequence was annotated with GeSeq (version 1.78) and submitted to the NCBI database, where it was assigned GenBank accession number OR387683 after review. We used the OGDRAW tool (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) [34] to create a circular cp genome map.

Codon usage analysis

The CodonW online software (https://galaxy.pasteur.fr/?form=codonw) was used to assess codon number, codon usage frequency, and relative synonymous codon usage (RSCU) in the protein-coding regions of the chloroplast genome. Each CDS has an initiation codon and a termination codon.
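As a rough illustration of the RSCU quantity reported by this step, the sketch below counts codons across a set of coding sequences and divides each codon's count by the mean count of its synonymous family. It is not CodonW's implementation: it assumes the standard genetic code, groups the three stop codons as one family, and uses a made-up toy sequence rather than data from this study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def rscu(cds_sequences: Iterable[str]) -> Dict[str, float]:
    """RSCU per codon: observed count / mean count of its synonymous family."""
    counts = Counter()
    for cds in cds_sequences:
        seq = cds.upper().replace("U", "T")
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, amino in CODON_TABLE.items():
        families[amino].append(codon)

    values = {}
    for family in families.values():
        observed = sum(counts[c] for c in family)
        if observed == 0:
            continue  # amino acid absent from the input
        expected = observed / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy input (hypothetical): four alanine codons, three GCT and one GCA.
toy = rscu(["GCTGCTGCTGCA"])
print({c: toy[c] for c in ("GCT", "GCC", "GCA", "GCG")})
```

In the toy input the four alanine codons would each be expected once under equal usage, so GCT (seen three times) scores 3.0, GCA scores 1.0, and the unused GCC and GCG score 0.0. Within any amino acid the RSCU values sum to the number of synonymous codons.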
RSCU is a normalized frequency assigned to each codon relative to the other codons that encode the same amino acid.

Analysis of long repeats and SSR

The forward, reverse, palindromic, and complement repeats of the cp genome were identified with the web-based REPuter tool [35] (https://bibiserv.cebitec.uni-bielefeld.de/reputer). For all repeat types, the REPuter settings identified repeat sequences with at least 90% identity, a minimum repeat size of 30 bp, and a Hamming distance of 3, meaning that at most 3 mismatched positions were allowed between the two copies of a repeat. All overlapping repeats were removed from the final results. SSRs were defined as sequence motifs of 1 to 5 bp that were repeated at least three times. Large repeat sequences were defined as those with a length of at least 20 bp [36]. The cp SSRs were detected with MISA [37] using minimum thresholds of 10 repeat units for mononucleotide SSRs, 5 for dinucleotide SSRs, 4 for trinucleotide SSRs, and 3 for tetra-, penta-, and hexanucleotide SSRs.

Comparison and analysis among genomes

The comparative genomics analysis included C. halicacabum (OR387683) and S. erecta (OQ247884), as well as representative species of Sapindaceae: D. longan (MG214255), S. mukorossi (NC_025554), L. chinensis (NC_035238), N. lappaceum (NC_053699), S. rarak (NC_067869), S. delavayi (OL840057), A. viridis (ON098014), A. littoralis (ON098015), and P. pinnata (NC_048999). The complete genomes of the eleven species were compared with mVISTA [39]. IRscope [38] (https://IRscope.shinyapps.io/irapp/) was also used to detect and visualize the contraction and expansion of the IR boundaries between the four major regions (LSC/IRb/SSC/IRa) of the eleven chloroplast genome sequences. In addition, selection pressure on C. halicacabum was assessed using the ratio of non-synonymous substitutions (Ka) to synonymous substitutions (Ks) (Ka/Ks) [40]. Initially, eighteen chloroplast genomes from species related to C. halicacabum were examined, and 79 protein-coding genes were identified. The non-synonymous substitution rate (Ka) and synonymous substitution rate (Ks) were calculated for each gene, as well as their ratio (Ka/Ks), with positive selection indicated by Ka/Ks greater than 1, neutral evolution by Ka/Ks equal to 1, and purifying selection by Ka/Ks less than 1.

Phylogenetic analysis

A phylogenetic analysis was carried out using chloroplast genomes from 21 species, including the C. halicacabum genome sequenced and assembled in this work and 20 others obtained from GenBank. Homologous single-copy genes were identified using OrthoFinder v2.3.14, which yielded 62 genes. MAFFT v7.429 was used to align the 62 protein-coding genes from the 21 samples, and Gblocks 0.91b was used to trim ambiguously aligned regions. Finally, the alignments were concatenated to build the phylogenetic trees. IQ-TREE v1.6.1 [41] was used to construct a maximum likelihood (ML) tree (1000 bootstrap replicates), and MrBayes 3.2.7 as implemented in PhyloSuite v1.2.3 [42] was used for Bayesian inference, with GTR + F + I + G4 as the best-fit model. The sequences of the other closely related and outgroup species used in the analysis were downloaded from NCBI; their accession numbers are listed in Table 4.

Table 4 Information about species' accession numbers.

Conclusion

The plastome of C. halicacabum has been sequenced and annotated for the first time. In this study, we generated the C. halicacabum chloroplast genome, which has 89 protein-coding genes, 37 tRNA genes, and 8 rRNA genes in a typical quadripartite structure totaling 159,370 bp in length. Leu is encoded by the most codons, and the greatest RSCU value is found in codon AGA (1.84). In particular, we found 58 SSRs in C. halicacabum that may be used with DNA barcoding to distinguish between similar species. Most chloroplast genomes, including those of the closest species, shared an overall structure and gene content similar to that of the C. halicacabum chloroplast genome. The phylogenetic analysis revealed that C. halicacabum is sister to S. erecta. Furthermore, the genus Cardiospermum was more closely related to the genera Serjania and Allophylus than to other genera in Sapindaceae. Our genome analysis found highly conserved gene sequences as well as many differences between Sapindaceae species, allowing further investigation of these evolutionary links. Thus, these conserved gene sequences in the chloroplast genome may be used to identify Sapindaceae species. Our findings not only provide fresh insights into the chloroplast genome and phylogenetic relationships of C. halicacabum, but will also support the plant's future development and applications.

Data availability

Genomic sequence data supporting the findings of this study are publicly available at the NCBI GenBank at https://www.ncbi.nlm.nih.gov under accession number OR387683. The associated BioProject, BioSample, and SRA numbers are PRJNA1000035, SAMN36753195, and SRR25455267, respectively.

References

1. Jiangsu New Medical College. Dictionary of Traditional Chinese Medicine 3862 (Shanghai People's Publishing House, 1986).
2. Editorial Committee of Chinese Flora of China Academy of Sciences. Flora of China, Volume 47(1) (Science Press, 2016).
3. Food and Drug Administration of Guangxi Zhuang Autonomous Region. Quality Standard of Zhuang Medicine in Guangxi Zhuang Autonomous Region, Volume 2, Qiangren Bilingual 2011 Edition (Guangxi Science and Technology, 2019).
4. Cheng, H. L. et al. Antiinflammatory and antioxidant flavonoids and phenols from Cardiospermum halicacabum (Dào Dì Líng). J. Tradit. Complement. Med. 3 (1), 33–40 (2013).
5. Jeyadevi, R., Sivasudha, T., Rameshkumar, A. & Dinesh Kumar, L. Anti-arthritic activity of the Indian leafy vegetable Cardiospermum halicacabum in Wistar rats and UPLC-QTOF-MS/MS identification of the putative active phenolic components. Inflamm. Res. 62 (1), 115–126 (2013).
6. Wei, J. H., Chen, J., Cai, S. F., Lu, R. M. & Lin, S. W. Studies on chemical constituents of Cardiospermum halicacabum (I). Chin. Herb. Med. 42 (08), 1509–1511 (2011).
7. Song, Y. et al. Development of chloroplast genomic resources for Oryza species discrimination. Front. Plant Sci. 8, 1854 (2017).
8. Nock, C. J. et al. Chloroplast genome sequences from total DNA for plant identification. Plant Biotechnol. J. 9 (3), 328–333 (2011).
9. Du, Z. et al. The chloroplast genome of Amygdalus L. (Rosaceae) reveals the phylogenetic relationship and divergence time. BMC Genom. 22 (1), 645 (2021).
10. Henriquez, C. L. et al. Evolutionary dynamics of chloroplast genomes in subfamily Aroideae (Araceae). Genomics 112 (3), 2349–2360 (2020).
11. Chen, Q., Wu, X. & Zhang, D. Phylogenetic analysis of Fritillaria cirrhosa D. Don and its closely related species based on complete chloroplast genomes. PeerJ 7, e7480 (2019).
12. Wu, J. et al. Comprehensive analysis of complete chloroplast genome sequence of Plantago asiatica L. (Plantaginaceae). Plant Signal. Behav. 18 (1), 2163345 (2023).
13. Liu, Q. & Xue, Q. Comparative studies on codon usage pattern of chloroplasts and their host nuclear genes in four plant species. J. Genet. 84 (1), 55–62 (2005).
14. Fuglsang, A. The 'effective number of codons' revisited. Biochem. Biophys. Res. Commun. 317 (3), 957–964 (2004).
15. Jiang, Y., Deng, F., Wang, H. & Hu, Z. An extensive analysis on the global codon usage pattern of baculoviruses. Arch. Virol. 153 (12), 2273–2282 (2008).
16. Morton, B. R. The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA. J. Mol. Evol. 56 (5), 616–629 (2003).
17. Zhang, J. et al. Comprehensive analysis of chloroplast genome of Albizia julibrissin Durazz. (Leguminosae sp). Planta 255 (1), 26 (2021).
18. Park, I., Yang, S., Choi, G., Kim, W. J. & Moon, B. C. The complete chloroplast genome sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and development of molecular markers for distinguishing species in the Aconitum subgenus Lycoctonum. Molecules 22 (11), 2012 (2017).
19. Doorduin, L. et al. The complete chloroplast genome of 17 individuals of pest species Jacobaea vulgaris: SNPs, microsatellites and barcoding markers for population and phylogenetic studies. DNA Res. 18 (2), 93–105 (2011).
20. Xie, D. F. et al. Comparative analysis of the chloroplast genomes of the Chinese endemic genus Urophysa and their contribution to chloroplast phylogeny and adaptive evolution. Int. J. Mol. Sci. 19 (7), 1847 (2018).
21. Zhu, A., Guo, W., Gupta, S., Fan, W. & Mower, J. P. Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates. New Phytol. 209 (4), 1747–1756 (2016).
22. Khakhlova, O. & Bock, R. Elimination of deleterious mutations in plastid genomes by gene conversion. Plant J. 46 (1), 85–94 (2006).
23. Yao, X. et al. The first complete chloroplast genome sequences in Actinidiaceae: genome structure and comparative analysis. PLoS One 10 (6), e0129347 (2015).
24. Provan, J., Powell, W. & Hollingsworth, P. M. Chloroplast microsatellites: new tools for studies in plant ecology and evolution. Trends Ecol. Evol. 16 (3), 142–147 (2001).
25. Goulding, S. E., Olmstead, R. G., Morden, C. W. & Wolfe, K. H. Ebb and flow of the chloroplast inverted repeat. Mol. Gen. Genet. 252 (1–2), 195–206 (1996).
26. Wang, R. J. et al. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol. Biol. 8, 36 (2008).
27. Yang, X. F. et al. PBR1 selectively controls biogenesis of photosynthetic complexes by modulating translation of the large chloroplast gene Ycf1 in Arabidopsis. Cell Discov. 2, 16003 (2016).
28. Vitti, J. J., Grossman, S. R. & Sabeti, P. C. Detecting natural selection in genomic data. Annu. Rev. Genet. 47, 97–120 (2013).
29. Corvalán, L. C. J. et al. Chloroplast genome assembly of Serjania erecta Radlk.: comparative analysis reveals gene number variation and selection in protein-coding plastid genes of Sapindaceae. Front. Plant Sci. 14, 1258794 (2023).
30. Ma, Q. et al. Characterization of the complete chloroplast genome of Acer truncatum Bunge (Sapindales: Aceraceae): a new woody oil tree species producing nervonic acid. Biomed. Res. Int. 2019, 7417239 (2019).
31. The Angiosperm Phylogeny Group et al. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181 (1), 1–20 (2016).
32. Doyle, J. DNA protocols for plants: CTAB total DNA isolation. In Molecular Techniques in Taxonomy (eds Hewitt, G. M. & Johnston, A.) (Springer, 1991).
33. Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45 (4), e18 (2017).
34. Greiner, S., Lehwark, P. & Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 47 (W1), W59–W64 (2019).
35. Kurtz, S. et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29 (22), 4633–4642 (2001).
36. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27 (2), 573–580 (1999).
37. Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33 (16), 2583–2585 (2017).
38. Amiryousefi, A., Hyvönen, J. & Poczai, P. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics 34 (17), 3030–3031 (2018).
39. Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M. & Dubchak, I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32 (Web Server issue), W273–W279 (2004).
40. Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteom. Bioinf. 8 (1), 77–80 (2010).
41. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32 (1), 268–274 (2015).
42. Zhang, D. et al. PhyloSuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol. Ecol. Resour. 20 (1), 348–355 (2020).

Acknowledgements

We would like to thank Genewiz Biotechnology (Suzhou) Co. Ltd in China for chloroplast genome sequencing and bioinformatics analysis.

Author information

Author notes

Yongjing Su, Wei Wei and Lintong Han: These authors contributed equally.

Authors and Affiliations

Guangxi University of Chinese Medicine, 13th Wuhe Avenue, Qingxiu District, Nanning, 530200, Guangxi, China: Yongjing Su, Wei Wei, Lintong Han, Haicheng Wen & Hailin Lu

Department of Pharmacy, HIV/AIDS Clinical Treatment Center of Guangxi (Nanning), The Fourth People's Hospital of Nanning, Nanning, 530023, China: Yongjing Su

Contributions

Conceived and designed the study: HCW and HLL; collected specimens and prepared samples for sequencing: YJS, WW and LTH; analysis and interpretation of the data: YJS; drafted the manuscript: YJS; critically revised the manuscript: HCW and HLL. All authors approved the final version and agreed to be accountable for all aspects of the work.

Corresponding authors

Correspondence to Haicheng Wen or Hailin Lu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

No ethical approval/permission is required in this study.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.