Decoding the regulatory grammar of human gene promoters

The recently developed promoter activity regulatory model (PARM) combines massively parallel reporter assays (MPRAs) with deep learning approaches to predict promoter activity across different human cell types.

Understanding the regulation of transcription requires cell stage-, cell state-, and cell type-specific approaches; this has been a major challenge in genomics because diverse regulatory inputs converge to initiate transcription in the different cells that compose a complex mammalian body. Extensive research efforts have led to comprehensive mapping of transcription start sites (TSSs) and core promoter elements, enabling the development of massive promoter atlases and epigenome mapping projects and translating transcription factor (TF) binding logic into quantitative gene expression.1,2 However, there has been a major gap between the identification of regulatory elements and the functional prediction of how regulatory element binding at promoter regions determines transcriptional output. Barbadilla-Martínez and colleagues have provided compelling evidence for a predictable regulatory code that can be experimentally deciphered to understand, predict, and rewrite gene expression.3

Early efforts to annotate genomes were biased by a protein-centric definition of genes, but large-scale transcriptomic analyses have fundamentally changed this perspective. Technologies for systematically tagging the transcriptome, together with sequencing efforts directed at full-length DNAs, have revealed that the human genome is pervasively transcribed, producing numerous protein-coding mRNAs and an even larger number of non-coding RNA transcripts.4,5 This suggests that transcriptional regulation occurs within a dense regulatory landscape encompassing tightly packed arrays of genes as well as distant enhancers.
Genome-wide mapping of TSSs using technologies such as cap analysis of gene expression (CAGE) has shown that human gene expression relies on multiple promoters and initiation sites, with tissue- and condition-specific TSSs generating the remarkable diversity of transcripts observed.2 Promoters have emerged as dynamic regulatory platforms that act variably across multiple tissues, developmental stages, and environmental conditions, generating a wide range of transcriptional responses. Subsequent studies of mammalian core promoters have revealed that transcription initiation rarely conforms to the canonical TATA-box model; instead, the 3D architecture of promoter sites reflects a myriad of heterogeneous combinations of cis-regulatory elements and TSS distributions across the genome.6 This has proven to be a transformative insight in functional genomics, but it has also exposed a paradox: promoter mapping became possible whenever samples could be isolated in sufficient amounts, yet direct functional prediction from genomic sequences remained challenging.

Chromatin profiling approaches that link gene expression outcomes to epigenomic profiling provide a multimodal view of transcription regulation, integrating features such as chromatin accessibility, histone modifications, and TF binding and occupancy. Deep learning models trained on such data can capture multiple correlations and achieve greater predictive power, but their ability to capture the regulatory grammar of transcription remains limited. It is evident that cell type- and condition-specific chromatin states are themselves largely reflective of a dynamic transcriptional landscape, making it difficult to define and distinguish sequence-based determinants from transcriptional consequences.

Barbadilla-Martínez and colleagues have addressed this limitation by combining functional MPRAs with a convolutional neural network called PARM, training the model directly on experimentally measured promoter activities.
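
As a rough illustration of how a convolutional network reads promoter sequence, the sketch below one-hot encodes a DNA string and slides a single hand-written filter across it, much as the first layer of a sequence-based convolutional network would. It is not PARM's architecture: the TATA-like filter weights, the function names, and the example sequence are hypothetical, and a trained model would learn many such filters from the measured promoter activities rather than have one specified by hand.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:
            encoded[i, idx] = 1.0
    return encoded

def conv1d_scan(x, kernel):
    """Slide a (k, 4) filter over a (L, 4) input without padding, then apply ReLU."""
    k = kernel.shape[0]
    scores = np.array(
        [np.sum(x[i:i + k] * kernel) for i in range(x.shape[0] - k + 1)]
    )
    return np.maximum(scores, 0.0)

# Hypothetical filter rewarding the motif TATAAA:
# +1 for each matching base, -1 for each mismatching base.
motif = "TATAAA"
kernel = 2.0 * one_hot(motif) - 1.0

seq = "GGCGCTATAAAGGCTCAGTC"  # made-up promoter fragment, for illustration only
activations = conv1d_scan(one_hot(seq), kernel)

best_position = int(np.argmax(activations))  # start index of the strongest match
max_activation = float(activations.max())    # global max pooling over positions
```

In a full model, the pooled activations of many learned filters would feed further layers that output one predicted activity per cell type; here the single filter simply reports where its motif matches best.
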
Interestingly, promoter sequences are assayed outside their native chromosomal context, which enables transcriptional output to be causally attributed to the local genomic region. This framework shows that promoter activity can be predicted from local DNA sequences across multiple human cell types, demonstrating that regulatory information is embedded in promoter DNA sequences even outside the chromatin conformation context3 (Fig. 1).

Fig. 1: Promoters act as regulatory hubs where enhancer inputs and transcription factor (TF) binding shape transcriptional output. Neural networks can decode these outputs to design and predict regulatory outcomes. Illustration created using icons from https://BioRender.com.

This finding is important because it makes regulatory predictions mechanistically interpretable. The “regulatory grammar” is evident from systematic perturbation analyses, which reveal that positional effects relative to TSSs strongly influence transcriptional output. Additionally, activating and repressing genomic elements exhibit reproducible spatial configurations in promoter occupancy, suggesting that underlying positional rules are in effect. This emerging viewpoint complements recent machine learning (ML) models such as Puffin, which showed that transcription initiation signals can be explained by a set of sequence motifs, initiators, and trinucleotide contexts across mammalian genomes, with specific position- and strand-specific logic.7 These studies seem to define transcription regulation as a combinatorial sequence syntax, rather than as a collection of isolated motifs. Together, they demonstrate that promoters function as grammatical systems in which sequence-specific logic gives meaning to the arrangement and interactions of regulatory elements. Such organizational principles might also help explain how pervasive bidirectional transcription occurs, as well as the role of promoter-associated regulatory RNAs identified in earlier transcriptomic studies.
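
The kind of systematic perturbation analysis mentioned above, in which an element's position relative to the TSS is varied and the change in predicted output is recorded, can be sketched in a few lines. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for a trained model, with an arbitrary positional preference built in so that the scan has something to recover, and the background sequence, motif, and TSS index are invented.

```python
import math

MOTIF = "TATAAA"

def toy_predict(seq, tss):
    """Hypothetical stand-in for a trained model: each MOTIF occurrence adds a
    Gaussian-weighted score that peaks when the motif starts 30 bp upstream of the TSS."""
    score = 0.0
    start = seq.find(MOTIF)
    while start != -1:
        offset = start - tss  # negative values are upstream of the TSS
        score += math.exp(-((offset + 30) ** 2) / (2 * 5.0 ** 2))
        start = seq.find(MOTIF, start + 1)
    return score

def position_scan(background, motif, tss, predict):
    """Substitute the motif at every start position and record the change in
    predicted activity relative to the unperturbed background."""
    baseline = predict(background, tss)
    effects = {}
    for start in range(len(background) - len(motif) + 1):
        mutated = background[:start] + motif + background[start + len(motif):]
        effects[start - tss] = predict(mutated, tss) - baseline
    return effects

background = "GC" * 50  # 100 bp motif-free background (made up)
tss = 80                # assumed TSS index within the background
effects = position_scan(background, MOTIF, tss, toy_predict)

# Offset (relative to the TSS) at which inserting the motif helps most.
best_offset = max(effects, key=effects.get)
```

With a real sequence-to-activity model in place of `toy_predict`, the same loop would yield a position-effect profile per element, which is one way that positional rules of the sort described here could be read out.
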
During cell state transitions, promoter-proximal RNA polymerase II termination has been shown to act as a regulated checkpoint that kinetically contributes to both the activation and repression of transcription.8

During cellular differentiation, transcriptional regulatory programs are distributed among promoter and enhancer regions to coordinate gene activation, blurring the distinction between functional regulation at promoter and enhancer sites.9 PARM addresses this functional redundancy by demonstrating stimulus- and cell type-specific promoter responses encoded directly in genomic sequences. This clarifies the role of promoters as integrators of TF availability and cellular signaling inputs, acting as active, dynamic regulatory units. Although PARM models promoters independently of their native chromatin environment, the strong predictive power of promoter sequences suggests that intrinsic promoter architecture constitutes the primary layer of regulatory information upon which cell type-specific chromatin states may subsequently be established.

One of the most striking implications of this study is the ability to design promoters in silico and achieve transcriptional activities comparable to those of endogenous promoters, as confirmed by experimental validation. This capability represents a turning point in functional genomics, which is now stepping into the realm of true DNA programmability. These and similar predictive promoter models could soon enable the rational design of transcriptional responses for regenerative medicine and gene therapy.

A number of challenges remain. Promoter functions in the context of 3D genome organization, shaped by enhancer interactions, are not fully captured by the reporter-based assays used to train these models. Future research could rely on models trained on higher-order regulatory architectures and multimodal datasets, validated in the appropriate chromatin context.
In addition, the activity of in silico-designed promoters should be further tested in more complex contexts, such as organoids.

Nevertheless, the broader significance of this work lies in its placement within the field of functional genomics. Early transcriptomic studies using promoter-mapping techniques revealed the unexpected complexity of transcriptional regulation.6,10 Building on that foundation, ML approaches revealed the sequence determinants of transcriptional regulation. The current study makes further strides by demonstrating that promoter activity can be learned, predicted, and rationally engineered. We are gaining a new and much more refined understanding of the rules that govern the regulatory genome, moving toward an era in which the complex language of gene regulation may finally be understood.

References

1. Carninci, P. et al. Science 309, 1559–1563 (2005).
2. Carninci, P. et al. Nat. Genet. 38, 626–635 (2006).
3. Barbadilla-Martínez, L. et al. Nature 651, 1107–1116 (2026).
4. Carninci, P. Trends Genet. 22, 501–510 (2006).
5. Carninci, P. DNA Res. 17, 51–59 (2010).
6. Sandelin, A. et al. Nat. Rev. Genet. 8, 424–436 (2007).
7. Dudnyk, K., Cai, D., Shi, C., Xu, J. & Zhou, J. Science 384, eadj0116 (2024).
8. Lysakovskaia, K., Devadas, A., Schwalb, B., Lidschreiber, M. & Cramer, P. Nat. Struct. Mol. Biol. 32, 995–1005 (2025).
9. Paramo, M. I. et al. Nat. Commun. 17, 2177 (2026).
10. Abascal, F. et al. Nature 583, 699–710 (2020).

Author information

Authors and Affiliations

Fred Hutchinson Cancer Center, Seattle, WA, USA: Zarnab Ahmad
Human Technopole, Milan, Italy: Piero Carninci
RIKEN Center for Integrative Medical Sciences, Yokohama, Japan: Piero Carninci

Corresponding authors

Correspondence to Zarnab Ahmad or Piero Carninci.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.