Alternate RNA decoding results in stable and abundant proteins in mammals

Data availability

The analyses described in this work use sequencing and proteomic data from previously published datasets deposited in public data repositories. CPTAC raw MS proteomic data were downloaded from the CPTAC Data Portal with the following PDC study IDs: PDC000120 (BRCA), PDC000127 (CCRCC), PDC000234 (LSCC), PDC000153 (LUAD), PDC000270 (PDAC) and PDC000125 (UCEC). CPTAC genomic and transcriptomic sequencing reads were accessed through the GDC Data Portal with the following dbGaP study accessions: phs000892 (CPTAC-2) and phs001287 (CPTAC-3). Whole exome sequencing data from the following PDC studies were analysed: PDC000127, PDC000446, PDC000204, PDC000221, PDC000234, PDC000153, PDC000489, PDC000270, PDC000393, PDC000125, PDC000439 and PDC000464. Access to controlled data was granted after application to NCBI (project no. 24007: Investigation of Mistranslation Rates in Cancer). RNA-seq data for healthy human tissues (ref. 17) were downloaded from ArrayExpress with the identifier E-MTAB-2836, and corresponding raw MS data were downloaded from PRIDE with project accession PXD010154. Mouse transcriptome sequence data were downloaded from ArrayExpress under the identifier E-MTAB-10276. The mouse MS proteomic data were downloaded from PRIDE with the dataset identifier PXD030983. Primary cell MS data were downloaded from PRIDE with the following accession numbers: PXD008511 (B cells), PXD008512 (hepatocytes), PXD008513 (monocytes) and PXD008515 (natural killer cells). Immunoprecipitation–MS proteomic data were downloaded from MassIVE with accession MSV000088555. Supporting information, data and documentation are available at decode.slavovlab.net.

Code availability

Software, data-analysis pipelines and other supporting documentation are available at decode.slavovlab.net. The code for reproducing all the analyses and figures presented is freely available at GitHub (github.com/SlavovLab/decode).

References

1. Cantwell-Dorris, E. R., O’Leary, J. J. & Sheils, O. M.
BRAFV600E: implications for carcinogenesis and molecular therapy. Mol. Cancer Ther. 10, 385–394 (2011).
2. Hart, J. R. et al. The butterfly effect in cancer: a single base mutation can remodel the cell. Proc. Natl Acad. Sci. USA 112, 1131–1136 (2015).
3. Wright, A. & Vissel, B. The essential role of AMPA receptor GluR2 subunit RNA editing in the normal and diseased brain. Front. Mol. Neurosci. 5, 34 (2012).
4. Parker, J. & Friesen, J. D. “Two out of three” codon reading leading to mistranslation in vivo. Mol. Gen. Genet. 177, 439–445 (1980).
5. Savitski, M. M., Nielsen, M. L. & Zubarev, R. A. ModifiComb, a new proteomic tool for mapping substoichiometric post-translational modifications, finding novel types of modifications, and fingerprinting complex protein mixtures. Mol. Cell. Proteomics 5, 935–948 (2006).
6. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
7. Wilhelm, M. et al. Deep learning boosts sensitivity of mass spectrometry-based immunopeptidomics. Nat. Commun. 12, 3346 (2021).
8. Picciani, M. et al. Oktoberfest: open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 24, e2300112 (2024).
9. Yang, K. L. et al. MSBooster: improving peptide identification rates using deep learning-based features. Nat. Commun. 14, 4539 (2023).
10. Leduc, A. & Slavov, N. Impact of protein degradation and cell growth on mammalian proteomes. Preprint at bioRxiv https://doi.org/10.1101/2025.02.10.637566 (2025).
11. Clark, D. J. et al.
Integrated proteogenomic characterization of clear cell renal cell carcinoma. Cell 179, 964–983 (2019).
12. Krug, K. et al. Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy. Cell 183, 1436–1456 (2020).
13. Gillette, M. A. et al. Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma. Cell 182, 200–225 (2020).
14. Dou, Y. et al. Proteogenomic characterization of endometrial carcinoma. Cell 180, 729–748 (2020).
15. Cao, L. et al. Proteogenomic characterization of pancreatic ductal adenocarcinoma. Cell 184, 5031–5052 (2021).
16. Satpathy, S. et al. A proteogenomic portrait of lung squamous cell carcinoma. Cell 184, 4348–4371 (2021).
17. Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).
18. Batut, B. et al. Community-driven data analysis training for biology. Cell Syst. 6, 752–758 (2018).
19. Mordret, E. et al. Systematic detection of amino acid substitutions in proteomes reveals mechanistic basis of ribosome errors and selection for translation fidelity. Mol. Cell 75, 427–441 (2019).
20. Ma, C. et al. Improved peptide retention time prediction in liquid chromatography through deep learning. Anal. Chem. 90, 10881–10888 (2018).
21. Wen, B., Wang, X. & Zhang, B. PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations. Genome Res. 29, 485–493 (2019).
22. Mohler, K. & Ibba, M.
Translational fidelity and mistranslation in the cellular response to stress. Nat. Microbiol. 2, 17117 (2017).
23. Liigand, P., Kaupmees, K. & Kruve, A. Influence of the amino acid composition on the ionization efficiencies of small peptides. J. Mass Spectrom. 54, 481–487 (2019).
24. Serrano, G., Guruceaga, E. & Segura, V. DeepMSPeptide: peptide detectability prediction using deep learning. Bioinformatics 36, 1279–1280 (2020).
25. Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
26. Gabriel, W. et al. Prosit-TMT: deep learning boosts identification of TMT-labeled peptides. Anal. Chem. 94, 7181–7190 (2022).
27. Wiśniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. A “proteomic ruler” for protein copy number and concentration estimation without spike-in standards. Mol. Cell. Proteomics 13, 1535–9484 (2014).
28. Wu, Q. et al. Translation affects mRNA stability in a codon-dependent manner in human cells. eLife 8, e45396 (2019).
29. Drummond, D. A. & Wilke, C. O. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352 (2008).
30. Quax, T. E., Claassens, N. J., Söll, D. & van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell 59, 149–161 (2015).
31. McCormick, C. A. et al. mRNA psi profiling using nanopore DRS reveals cell type-specific pseudouridylation. Preprint at bioRxiv https://doi.org/10.1101/2024.05.08.593203 (2024).
32. Mathieson, T. et al. Systematic analysis of protein turnover in primary cells. Nat. Commun. 9, 689 (2018).
33. Kong, A.
T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
34. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
35. Giansanti, P. et al. Mass spectrometry-based draft of the mouse proteome. Nat. Methods 19, 803–811 (2022).
36. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
37. Specht, H. et al. PSMtags improve peptide sequencing and throughput in sensitive proteomics. Preprint at bioRxiv https://doi.org/10.1101/2025.05.22.655509 (2025).
38. Slavov, N. Single-cell proteomic technologies: tools in the quest for principles. Annu. Rev. Biophys. 55, 253–275 (2026).
39. Leduc, A., Khoury, L., Cantlon, J., Khan, S. & Slavov, N. Massively parallel sample preparation for multiplexed single-cell proteomics using nPOP. Nat. Protoc. 19, 3750–3776 (2024).
40. Huffman, R. G. et al. Prioritized mass spectrometry increases the depth, sensitivity and data completeness of single-cell proteomics. Nat. Methods 20, 714–722 (2023).
41. Sun, L. et al. Evolutionary gain of alanine mischarging to noncognate tRNAs with a G4:U69 base pair. J. Am. Chem. Soc. 138, 12948–12955 (2016).
42. Netzer, N. et al. Innate immune and chemically triggered oxidative stress modifies translational fidelity. Nature 462, 522–526 (2009).
43. Danecek, P. et al. Twelve years of SAMtools and BCFtools.
GigaScience https://doi.org/10.1093/gigascience/giab008 (2021).
44. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
45. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
46. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
47. Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Research 9, 304 (2020).
48. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://doi.org/10.48550/arXiv.1207.3907 (2012).
49. Wang, X. & Zhang, B. customProDB: an R package to generate customized protein databases from RNA-seq data for proteomics search. Bioinformatics 29, 3235–3237 (2013).
50. Lautenbacher, L. et al. Koina: Democratizing machine learning for proteomics research. Nat. Commun. 16, 9933 (2025).
51. Huber, F. et al. matchms - processing and similarity evaluation of mass spectrometry data. J. Open Source Softw. 5, 2411 (2020).
52. Wan, K. X., Vidavsky, I. & Gross, M. L. Comparing similar spectra: From similarity index to spectral contrast angle. J. Am. Soc. Mass Spectrom. 13, 85–88 (2002).
53. Halloran, J. T. & Rocke, D. M. Matter of time: faster percolator analysis via efficient SVM learning for large-scale proteomics. J. Proteome Res. 17, 1978–1982 (2018).
54. Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
55. Marino, A. et al.
Aging and diet alter the protein ubiquitylation landscape in the mouse brain. Nat. Commun. 16, 5266 (2025).
56. Li, J. et al. Proteome-wide mapping of short-lived proteins in human cells. Mol. Cell 81, 4722–4735 (2021).
57. Nettling, M. et al. DiffLogo: a comparative visualization of sequence motifs. BMC Bioinformatics 16, 387 (2015).
58. Behle, A. et al. Manipulation of topoisomerase expression inhibits cell division but not growth and reveals a distinctive promoter structure in Synechocystis. Nucleic Acids Res. 50, 12790–12808 (2022).
59. Erdős, G., Pajkos, M. & Dosztányi, Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res. 49, W297–W303 (2021).
60. Eddy, S. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
61. Moffat, L. & Jones, D. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 37, 3744–3751 (2021).
62. Hu, G. et al. flDPnn: accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat. Commun. 12, 4438 (2021).
63. Peng, Z. & Kurgan, L. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder. Nucleic Acids Res. 43, e121 (2015).
64. Steinegger, M. & Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
65. Zhao, B. et al.
DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res. 49, D298–D308 (2021).
66. Meng, E. et al. UCSF ChimeraX: Tools for structure building and analysis. Protein Sci. 32, e4792 (2023).
67. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16) 785–794 (ACM, 2016).
68. Seplyarskiy, V. et al. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat. Genet. 55, 2235–2242 (2023).
69. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

Acknowledgements

We thank M. Collins for help with analysis; N. Bandeira, A. Raj, Z. Ignatova and B. Karger for detailed feedback; O. Kok and S. Zheng for help with revisions; and members of the Slavov laboratory for discussions and suggestions.

Funding

The work was funded by an Allen Distinguished Investigator award through The Paul G. Allen Frontiers Group to N.S., an NIGMS award (R01GM144967) to N.S., NCI awards UG3CA268117 and UH3CA268117 to N.S., an NIGMS award (R35GM148218) to N.S., an NIA award (R01AG092460) to N.S., and a Bits to Bytes award from MLSC to N.S.

Author information

Authors and Affiliations

Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA, USA
Shira Tsour, Rainer Machné, Andrew Leduc, Simon Widmer, Eunice Koo & Nikolai Slavov

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Jeremy Guez & Konrad J. Karczewski

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Jeremy Guez & Konrad J.
Karczewski

Parallel Squared Technology Institute, Watertown, MA, USA
Nikolai Slavov

Contributions

Study design, supervision and raising funding: N.S. Data analyses: S.T., R.M., A.L., S.W., E.K. and N.S. gnomAD analyses: J.G., S.T. and K.J.K. Initial draft: S.T. and N.S. Writing: all authors approved the final manuscript.

Corresponding author

Correspondence to Nikolai Slavov.

Ethics declarations

Competing interests

N.S. is a founding director and CEO of Parallel Squared Technology Institute, which is a non-profit research institute. S.T. is an employee of Alnylam Pharmaceuticals. All other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Yitzhak Pilpel, Mikhail Savitski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Systematic identification and validation of amino acid substitutions.

(a) Number of tumor and normal samples analyzed from each CPTAC dataset. (b) Number of samples analyzed for each healthy tissue from the label-free dataset. (c) Distribution of the percentage of each transcript with a read that is included in the patient-specific databases.
(d) Distribution of the number of transcripts with 100% sequence coverage included in each patient-specific protein database. (e) Non-substitution modifications identified in the dependent peptide search consist mainly of post-translational modifications, and include artifacts and chemical derivatives from MS analysis. (f) The number of modified peptides identified as having an amino acid substitution or other type of post-translational or chemical modification. (g) Mass error distributions for SAAP and all peptides identified in the database search show no significant differences. The lower, middle, and upper lines of the boxplots correspond to the first quartile, median, and third quartile, respectively. The upper whisker extends from the third quartile to the largest value and the lower whisker extends from the first quartile to the smallest value, each at most 1.5 × IQR from the hinge. Data beyond the whiskers are outliers that are plotted as individual data points. (h) Butterfly plots showing a systematic mass shift in MS2 spectra between SAAP and BP for a representative SAAP with median RAAS = 1.2 in ribosome-binding protein 1 isoform 1 (RRBP1). The fragmentation spectra were predicted by the Prosit TMT model (ref. 26). (i) Cumulative density distributions of p-values (MaxQuant) and FDR-controlled q-values computed using only SAAP. Red dashed line indicates confidence threshold for SAAP inclusion in further analysis. (j) Over 80% of substitutions identified from lysine (K) or arginine (R) are at sites of missed cleavage or are substitutions between K and R. (k) Observed and predicted (DeepRT+; ref. 20) retention times show strong agreement for all main peptides identified in standard database search and for SAAP. (l) TMT and label-free spectra for the same SAAP provide complementary evidence fragments and are in strong agreement with Prosit predictions (refs. 25, 26).
(m) Observed spectra for SAAP quantified in both TMT and label-free datasets are in stronger agreement with the correctly matched Prosit model, that is, TMT spectra with Prosit TMT (ref. 26) and label-free spectra with Prosit HCD (ref. 25), than with the mismatched Prosit model.

Extended Data Fig. 2 Establishing confidence in AAS abundance.

(a) SAAP with high RAAS ≥ 1 are identified with the same FDR-controlled confidence as SAAP with low RAAS