Chromosome-level genome assembly of the aphid Megoura crassicauda

Abstract

Megoura crassicauda (Hemiptera: Aphididae) is a major pest species that inflicts significant damage on various legume crops worldwide, causing substantial global economic losses. In this study, we present a chromosome-scale genome assembly of M. crassicauda. By integrating PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C scaffolding techniques, we constructed a genome assembly spanning 424.45 Mb. Approximately 93.74% of the assembly was anchored into five scaffolds, with contig and scaffold N50 values of 5.55 Mb and 103.55 Mb, respectively. Genome completeness, as evaluated by BUSCO, reached 97.75%. Additionally, a total of 14,717 protein-coding genes were identified. This high-quality genome assembly serves as a valuable genomic resource, facilitating further studies into the ecological adaptations of M. crassicauda and the development of pest control strategies.

Background & Summary

Legumes represent one of the most vital crop families in agriculture [1], serving as the primary source of vegetable protein for both human consumption and livestock feed. However, the productivity of key leguminous crops, such as pea (Pisum sativum) and broad bean (faba bean, Vicia faba), is significantly threatened by pest infestations [2]. Among these pests, Megoura crassicauda (Hemiptera: Aphididae) is a predominant legume pest that specifically infests leguminous plants, including pea (P. sativum), broad bean (V. faba), and other Vicia spp. [3], using piercing-sucking mouthparts to extract phloem sap. Its feeding causes substantial agricultural damage, primarily through leaf distortion, stunted growth, and yield reduction, with the most severe impacts occurring during reproductive stages (e.g., flowering and pod development). Native to East Asia (east Siberia, China, Japan, and Korea) [4], M. crassicauda has recently been detected in New South Wales, Australia [5], raising concerns about its potential for global spread. However, despite its significant economic impact and invasive potential, genomic resources for M. crassicauda remain limited.

Aphids release droplets from their cornicles when attacked [6]. These droplets contain alarm pheromones that trigger avoidance behaviors in nearby conspecifics, ultimately increasing the survival rate of the aphid population. Although the chemical signal recognition mechanisms of aphid alarm pheromones have been extensively studied, significant knowledge gaps remain regarding the key genes and regulatory mechanisms of their biosynthetic pathways, and biosynthetic strategies diverge between species [7]. Notably, aphids of the genus Megoura produce alarm pheromones composed of diverse terpenoid compounds [8], making them an ideal model for investigating the evolution and adaptive divergence of pheromone synthesis pathways. However, the lack of high-quality genomic data for Megoura species has hindered cross-species comparative genomic analyses, severely impeding the elucidation of the molecular basis of pheromone diversity.

Since the landmark sequencing of Acyrthosiphon pisum [9], genomic resources for aphids have expanded to 27 species (as recorded in InsectBase v2.0 [10]), including destructive pests such as Myzus persicae [9], Aphis glycines [11], and Aphis gossypii [12]. These datasets provide foundational resources for deciphering the molecular basis of aphid adaptation and stress resistance. However, genomes are currently available for fewer than 1% of aphid species, which highlights the critical need to accelerate functional genomics research through omics-driven sequencing initiatives.

Here, we present a high-quality chromosome-scale genome assembly of M. crassicauda, achieved through the integration of PacBio sequencing, Illumina sequencing, and chromatin conformation capture (Hi-C) techniques.
Gene structures were annotated based on a combination of transcriptomic data, ab initio predictions, and homology-based approaches. A species tree was constructed to elucidate the evolutionary relationships between M. crassicauda and other Aphididae species. This genome assembly serves as a valuable resource for advancing molecular biology research and developing pest control strategies for this species.

Methods

Sample preparation and genomic sequencing

The M. crassicauda colony used in this study was originally collected from bean fields at the Langfang Experimental Station of the Chinese Academy of Agricultural Sciences. Aphids were maintained on broad bean (Vicia faba) plants in a greenhouse under ambient light conditions, with temperature controlled at 20 ± 2 °C and relative humidity maintained at 75%. To establish a clonal population of parthenogenetic females, a single female was isolated from the parental colony to found a new colony. Through five consecutive generations of clonal propagation, with one offspring selected from each generation to found the next colony, we established a genetically stable, all-female lineage. This fifth-generation clonal colony, exclusively comprising parthenogenetically reproducing females, served as the experimental material for whole-genome sequencing.

For PacBio sequencing, genomic DNA was extracted from 40 wingless parthenogenetic adult females. Two single-end libraries with 20-kb insert sizes were prepared using PacBio Single-Molecule Real-Time (SMRT) sequencing technology (Pacific Biosciences). Sequencing was performed on the PacBio Sequel II platform, yielding raw reads from a single SMRT cell. After quality control, 133.11 Gb of high-quality SMRT sequences were retained, providing ~314× coverage with an average read length of 12.35 kb (N50 = 17.48 kb). For Illumina short-read sequencing, DNA was extracted from about 40 wingless parthenogenetic adult females.
A 400-bp paired-end library was constructed following standard Illumina protocols and sequenced on the HiSeq X Ten platform, producing 29.35 Gb of 150-bp paired-end reads. To enable chromosome-level assembly, a Hi-C library was prepared using established methods [13]. Fresh tissue samples from 40 adult individuals were crosslinked with paraformaldehyde to capture interacting DNA segments. The crosslinked material was digested with the DpnII restriction enzyme, and biotinylated nucleotides were used to label the restriction fragment ends. The Hi-C library was quantified and sequenced on the Illumina NovaSeq/MGI-2000 platform, generating ~58.39 Gb of clean 150-bp paired-end reads.

RNA sequencing

Total RNA was extracted from 50 parthenogenetic adult females divided into five biological batches (10 adults per batch) using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) [14]. RNA extracts from all batches were pooled and resuspended in RNase-free water. RNA integrity was assessed by 1% agarose gel electrophoresis, and purity (A260/A280 ratio ≥ 1.8) and concentration were quantified using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). High-quality RNA samples were selected for cDNA library preparation [15]. Sequencing was performed on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA) using a 200 bp paired-end strategy. This process yielded a total of 148,198,723 high-quality clean reads, with Q30 scores exceeding 90%.

Genome assembly

Quality control of the raw Illumina reads was performed using FASTP v0.20.0 [16]. Clean reads were subsequently analyzed with JELLYFISH v2.3.0 [17] to generate a 17-mer frequency distribution. Genome size was estimated from the k-mer spectrum using GenomeScope v1.0 [18], giving a predicted genome size of 428.28 Mb for M. crassicauda.

For contig assembly, PacBio reads were initially error-corrected using FALCON v1.8.7 (reads_cutoff: 1k, seed_cutoff: 33k).
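As an aside on the k-mer-based genome size estimate reported above, the underlying idea can be illustrated with a minimal Python sketch. It is not the GenomeScope model, which fits a mixture distribution that accounts for heterozygosity and repeats; it uses the simpler peak-based approximation (total k-mers divided by the depth of the main peak). The function names, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration only, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Count every k-mer in the reads, then tally how many
    distinct k-mers occur at each depth (depth -> number of k-mers)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Peak-based estimate: total k-mers / depth of the main peak.

    Depths below min_depth are treated as sequencing errors and
    dropped (an illustrative cutoff, not a value from the study).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)       # most common depth
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram: 500 error k-mers at depth 1, main peak at depth 20.
toy_histogram = {1: 500, 19: 20, 20: 100, 21: 20}
print(estimate_genome_size(toy_histogram))  # 140.0
```

In the toy histogram, 140 distinct k-mers remain after the error cutoff and the estimate recovers that number. On real data the histogram would come from a k-mer counter such as Jellyfish rather than from `kmer_histogram`, which is only practical for very small inputs.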
The corrected reads were then assembled into a draft genome using SMARTDENOVO v1.0 with parameters -J 3000 and -k 19 [19]. To improve assembly accuracy, PacBio reads were aligned to the draft genome using BLASR v5.1 [20], followed by one round of genome polishing with ARROW v2.2.2 under default parameters. For further refinement, Illumina reads were mapped to the assembly using BWA v0.7.12 [21], and four iterations of contig polishing were performed with NextPolish v1.0.5 using default parameters [22]. The final contig-level assembly had a total length of 424.45 Mb, closely matching the estimated genome size, with a contig N50 of 5.55 Mb (Table 1).

Table 1 Major indicators of the Megoura crassicauda genome.

Hi-C scaffolding

Raw sequencing reads containing low-quality bases (Phred score