Background & SummaryEcotoxicogenomics is an emerging field that integrates genomics-based tools (such as genomics, transcriptomics, and proteomics) to investigate the molecular effects of environmental stressors, with applications ranging from model species to non-target and non-model organisms1. It provides valuable insights into the molecular mechanisms underlying responses to pollutants, offering potential biomarkers for monitoring the ecological status of aquatic environments. Invertebrates, particularly crustaceans, play a pivotal role in aquatic ecosystems due to their positions in food webs and their sensitivity to natural and chemical stressors2. Gammarids have long been used as model organisms in ecotoxicology due to their ecological importance, wide distribution, and sensitivity to pollutants. The common freshwater amphipod, Gammarus pulex, has gained considerable attention in ecotoxicology studies as a non-target and non-model organism due to its wide distribution3, ecological relevance4,5, and susceptibility to emerging chemicals2,6,7. Despite this, genomic resources as transcriptomes or genomes for Gammarus pulex (and gammarids in general), remain limited, hindering deeper insights into their molecular responses to environmental stress. Existing transcriptome assemblies8,9 are based on restricted sample sizes or specific tissues, and report lower completeness and annotation quality. In contrast, the present study provides a multi-population transcriptome resource for Gammarus pulex that addresses these limitations. Although a fragmented genome exists for Gammarus lacustris10 and a reference genome is available for Hyalella azteca11, no high-quality genome is currently available for Gammarus pulex, emphasizing the need for comprehensive molecular resources for this ecologically important taxon.The development of transcriptomic resources for non-model species like Gammarus pulex is essential to enhance our ability for environmental monitoring. Transcriptome data, which provides a snapshot of gene expression, is particularly valuable for understanding organismal responses to environmental stressors and chemical pollution12. Unlike genomic data, transcriptomics focuses on the expressed portion of the genome, allowing the identification of genes that are actively involved in stress responses13. Moreover, generating high-quality transcriptome resources for species lacking published reference genomes not only facilitates the identification and interpretation of differentially expressed genes (DEGs) but also enables the use of streamlined workflows that do not rely on replicating a computationally demanding de novo assembly. Transcriptome assembly and annotation provide a foundation for gene expression analysis, enabling the identification of DEGs linked to environmental stress. These DEGs can act as molecular biomarkers of exposure and effects, even revealing the underlying molecular mechanisms of ecotoxicity. Furthermore, transcriptomic databases provide complementary information that is especially useful for genome annotation and insights into gene functioning14. As more transcriptome profiles are archived, they will allow for enhanced meta-analytical approaches, improving our ability to elucidate specific functional response pathways and providing crucial insights into baseline gene expression profiles and stress responses.A key challenge in ecotoxicology is linking molecular biomarkers to higher-level ecological effects. The Adverse Outcome Pathway (AOP) framework provides a structured approach to bridging this gap by connecting molecular initiating events (MIEs) to adverse outcomes pathways (AOPs) through key events (KEs) and key event relationships (KERs)15. By integrating transcriptomic data within this framework, researchers can establish mechanistic links between molecular disruptions and organismal or population-level effects, offering a predictive and regulatory-relevant perspective on ecotoxicological impacts16. The integration of AOPs with transcriptomic data is particularly valuable in non-model organisms like Gammarus pulex, whose ecological significance and sensitivity make it an ideal species for investigating molecular responses to stressors. For instance, transcriptomic insights can help identify MIEs related to endocrine disruption, immune suppression, or metabolic dysfunction, which, through a cascade of key events, could lead to population declines or ecosystem instability. Furthermore, identifying stress-responsive genes involved in detoxification pathways or oxidative stress responses can support biomonitoring efforts by providing early-warning indicators of pollution exposure. Additionally, understanding the genetic basis of pollutant tolerance may help identify populations with higher adaptive potential, informing conservation and restoration strategies.Gammarus pulex plays a crucial ecological role as a detritivore, contributing to the breakdown of organic matter in freshwater ecosystems4,5. Its sensitivity to pollutants, including metals17, pesticides18, pharmaceuticals19, and complex mixtures of micropollutants2,6,20, makes it an excellent sentinel species for assessing the health of aquatic ecosystems. Wastewater treatment plants (WWTPs) have been shown to negatively impact Gammarus pulex populations, causing decreased feeding rates20, increased mutation rates6, and oxidative stress21. These studies have primarily focused on behavioural and physiological endpoints (e.g., feeding rates, reproduction, and survival). However, such phenotypic observations offer limited insight into the underlying molecular mechanisms of stress responses. So far, molecular and physiological analyses in Gammarus pulex have been hampered by a lack of omics data8,22. Although widely used in environmental monitoring, Gammarus pulex still lacks comprehensive molecular resources such as high-quality transcriptomes, limiting mechanistic ecotoxicological studies compared to established model invertebrates like Drosophila melanogaster or Caenorhabditis elegans.To address the current gap and enhance the application of transcriptomics in ecotoxicogenomics, this study focuses on generating three de novo transcriptome assemblies for Gammarus pulex using RNA sequencing (RNA-Seq). Gammarus pulex were collected in streams and rivers characterised by complex mixtures of micropollutants including pesticides, pharmaceuticals, and industrial chemicals and areas with low micropollution pressure in Sweden23,24 and Germany25,26,27. The objectives are threefold: (1) to assemble high-quality transcriptomes for each of the sampling areas, and a joint assembly using the entire dataset, (2) to annotate the assembled transcripts with functional information, and (3) to assess the completeness and quality of the transcriptome assemblies and annotation. These resources are expected to not only advance molecular research on Gammarus pulex but also provide actionable data for integrating transcriptomic insights into biomonitoring programs and AOP-based regulatory frameworks. By providing comprehensive transcriptome resources for Gammarus pulex, this work will support future ecotoxicogenomics research, enabling the identification of molecular markers for environmental monitoring and a better understanding of the species’ molecular response to environmental stressors.MethodsEthic statementNo specific permits were required to collect Gammarus pulex from the different sampling sites. They are not on any list of endangered or protected species.Study area and sample collectionThirteen sampling sites, located in streams and small rivers across Southern Sweden and Central Germany, were selected for Gammarus pulex collection (Fig. 1). These sites represented distinct land-use patterns and micropollutant profiles, facilitating a comprehensive analysis of gene expression responses. In Southern Sweden, characterised by intensive agriculture and the Malmö-Lund metropolitan area, two sites (HOJ and BAM) within natural reserves served as low-anthropogenic-pressure references. Three sites (M42, SKI, and SAX) were situated in agricultural regions, while two (SNT and LOM), located upstream and downstream of Lund’s wastewater treatment plant (WWTP), reflected urban influences. Similarly, in the River Holtemme (Central Germany), one site (H1), upstream of Silstedt’s WWTP, represented low anthropogenic pressure, two (H3 and H6) were in agricultural areas, and three (H2, H4, and H5), including two downstream of WWTPs, captured urban impacts. Previous micropollutant characterization and risk assessment studies in both regions have highlighted potential risks to aquatic invertebrates28,29.Fig. 1Maps showing the location of the sampling sites. (A) Sweden and (B) Germany (B). Land uses are also summarised in the legend.Full size imageGammarus pulex specimens were sampled following a standardised sampling protocol30 in the summer of 2021 (Germany) and 2022 (Sweden). Twenty habitat-weighted samples were taken from a total area of 1 m2 at each site with a Surber sampler (500 μm mesh size). After collection, specimens were flash-frozen on dry ice or a cryoshipper (Voyageur-12, Air Liquide) and stored at −80 °C until RNA isolation, eliminating the need for RNAlater. In total, RNA was extracted from 65 specimens (5x specimen per sampling site), from 5 rivers/small streams in Skåne region in Sweden and the River Holtemme (Saxony-Anhalt) in Germany. Stream and/or river names and respective geocoordinates are listed in Supplementary Table 1.RNA extraction, library preparation and sequencingSamples were crushed in liquid nitrogen using a mortar and pestle, and total RNA was extracted individually from each specimen using the RNeasy Plus Mini Kit (Qiagen, Germany) according to the manufacturers’ instructions. In addition, genomic DNA was removed using DNAse I (ThermoFisher Scientific, Germany). RNA was eluted in 60 µL RNAse-free water, which was passed twice through the spin column to maximise the concentration of eluted RNA. The quantity and quality of extracted RNA were assessed using the Qubit RNA Assay Kit in a Qubit 2.0 Flurometer (ThermoFisher Scientific) and NanoDrop 2000c (ThermoFisher Scientific), respectively. RNA Integrity Number (RIN) was measured using a 2200 TapeStation (Agilent Technologies, Germany) and only samples with RIN > 7 were sequenced.Library preparation and sequencing were performed by Novogene (Cambridge, United Kingdom). Briefly, messenger RNA (mRNA) was purified from 3 µg total RNA using poly-T oligo-attached magnetic beads (Invitrogen, USA). The RNA strands were subsequently fragmented with divalent cations in NEB First Strand Synthesis reaction buffer (NEB, USA), followed by first- and second-strand cDNA synthesis. The library was ready after end repair, A-tailing, adapter ligation, size selection, amplification, and purification. The library was checked with Qubit 2.0, real-time PCR for quantification, and 2100 BioAnalyzer (Agilent) for size distribution detection. Quantified libraries were pooled and sequenced on Illumina NovaSeq. 6000, generating 150 bp paired-end reads.De novo transcriptome assembly, refinement, and quality assessmentThe bioinformatic workflow for transcriptome assembly and annotation is illustrated in Fig. 2. A total of 30 paired-end libraries were processed for the German transcriptome (DE), 35 for the Swedish transcriptome (SWE), and 65 for the combined DE-SWE transcriptome. Initial quality control of the raw reads was performed using FastQC31 to evaluate sequence quality. Following this, low-quality reads and adapter sequences were removed with Trimmomatic32, using a quality threshold of phred + 33 and ensuring a minimum base quality score of 25. Reads shorter than 50 bp were also filtered out to improve downstream analysis accuracy.Fig. 2Workflow chart of the bioinformatic pipeline. Starting from the read cleaning of the raw data to the annotation of the de novo transcriptomes.Full size imageThe Trinity v2.9.133 assembler was employed to generate each dataset’s de novo transcriptome assemblies. Trimmed reads were subsequently mapped back to the assembled transcriptome using Salmon34, which was used to estimate transcript abundance. We first used the script TrinityStats.pl contained in TRINITY for assembly quality assessment, followed by BUSCO v5.7.235, which evaluated assembly completeness by comparing the results against the Arthropoda Odb10 datasets (updated January 2024). An expression matrix was generated, and low-expressed transcripts (TPM