Deciphering chromatin domain, domain community and chromunity for 3D genome maps with Mactop

Introduction

Decoding the three-dimensional (3D) structure of the genome is crucial for deciphering the fundamental principles that govern its functionality1,2,3,4. Advanced chromosome conformation capture techniques, including high-throughput chromosome conformation capture (Hi-C)5,6, SPRITE7, and Pore-C8, offer an extensive perspective on 3D genomic architecture by quantifying the interactions across chromosomal regions on a global scale. The acquisition of high-throughput chromatin data across a variety of biological contexts and processes has enhanced the understanding of the mechanisms governing DNA packaging within the nucleus, elucidated the dynamic nature of 3D conformational changes throughout developmental progression9, and illuminated the distinctions in cellular architecture between healthy and pathological states2,10.

Investigations into these datasets have revealed a preferential pattern of interaction among chromosomal regions, leading to the emergence of high-order structural formations, such as chromosomal territories5,11, A/B compartments5,12, topologically associating domains (TADs)13,14, and chromatin loops12,15, distinguished by the magnitude of their structural units and the unique molecular characteristics of their composing regions. While the association between the formation of TADs and gene expression alterations remains contentious16,17,18, early studies suggest these units are evolutionarily conserved19 and play roles in both development20 and disease mechanisms21,22. Hence, precise delineation of TADs is essential to connect 3D genomic structure with cellular functionality.

Various methods have been developed to identify TADs in recent years. For instance, one-dimensional linear metrics calculating statistical characteristics between chromatin fragments (bins) on the Hi-C contact map have been adopted to depict the TAD boundaries13,14,23.
Clustering methods categorize chromatin fragments (bins) into clusters and designate the bins within the same cluster as a TAD24,25,26. Statistical model-based algorithms utilize probabilistic models with specific assumptions to identify TADs27,28,29. Hi-C maps can be modeled as graphs, and community detection or graph segmentation can be used to identify TADs30,31,32. However, these methods produce markedly inconsistent TAD calls33,34,35,36 and are sensitive to factors such as the resolution (i.e., the size of the genomic bins), sequencing depth, and the sparsity of the input data. Dang et al.37 emphasized that TADs and boundaries could be analyzed and classified according to their distinct characteristics with an ensemble strategy. Critically, current individual methods cannot classify TADs and boundaries based on spatially interacting characteristics, hindering the detailed investigation of their biological functions.

To address this, we developed Mactop, a Markov clustering-based tool designed to identify topologically associating domains in high-throughput chromatin maps. Mactop can categorize TADs and boundaries into different types, providing further insights into their distinct biological functions. Mactop outperformed three competing methods (the Directionality Index13, Insulation Score14, and TopDom23) in three respects: TAD pattern quality as measured by the silhouette coefficient38, stability across different resolutions and sampling depths of Hi-C data, and the enrichment of protein and histone modifications. Moreover, Mactop constructs an interaction map of TADs and demonstrates that TADs form communities with greater spatial proximity and are enriched with histone modifications related to active gene regulation. The chromatin within the community shows a higher level of openness.
In contrast to metaTADs39, which are formed by selecting and merging the two most frequently interacting neighboring TADs, TADs within a community exhibit significant spatial interactions but are not necessarily adjacent in genomic position (Supplementary Fig. S1). In the high-order interaction data, Mactop can detect chromunities40 by constructing networks based on similarities between multi-way reads. Compared to sub-TADs41, which are more isolated within the TAD, chromunities exhibit interaction patterns across multiple regions within the TAD. Chromunities typically include a core self-interacting region within the TAD and interactions between this core region and other areas within the TAD. This unique pattern in high-order data highlights the intricate interactions between diverse regions within TADs and allows for a more comprehensive understanding of chromatin organization. In summary, Mactop is a versatile, accurate, and robust tool for identifying 3D chromatin structures from diverse types of chromatin maps.

Results

Overview of Mactop

For the initial chromatin interaction matrices, Mactop applies a normalization method42,43 to preprocess the data and help mitigate experimental errors (Fig. 1A). Mactop then segments the input matrix into blocks along its main diagonal. To measure the stability of TADs and boundaries, Mactop downsamples each submatrix and adds noise to it. Mactop constructs a chromatin interaction graph for each resampled matrix, where the genomic bins serve as nodes, and the chromatin interactions between two bins form the edges. Mactop applies Markov clustering to the graph, where bins within the same cluster are considered to reside within the same TAD. The sampling and clustering process is repeated multiple times, and the frequency with which two bins appear in the same cluster is tallied to construct a consistency matrix.
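
The resample, cluster, and tally loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mactop's implementation: the function names (`downsample`, `mcl_comembership`, `consistency_matrix`), the binomial thinning, the inflation value, the number of rounds, and the toy matrix are all hypothetical choices, and the block segmentation and added-noise steps are omitted.

```python
import numpy as np

def downsample(counts, keep, rng):
    """Binomially thin the upper triangle of a symmetric count matrix, then mirror it."""
    upper = np.triu(rng.binomial(counts, keep))
    return upper + np.triu(upper, 1).T

def mcl_comembership(adj, inflation=2.0, n_iter=30, eps=1e-6):
    """Minimal Markov clustering; returns a boolean bin-by-bin co-membership matrix."""
    m = adj.astype(float) + np.eye(adj.shape[0])  # self-loops keep every column non-zero
    m /= m.sum(axis=0, keepdims=True)             # column-stochastic transition matrix
    for _ in range(n_iter):
        m = m @ m                                 # expansion: two-step random walk
        m = m ** inflation                        # inflation: sharpen strong flows
        m /= m.sum(axis=0, keepdims=True)
    support = (m > eps).astype(int)               # rows = attractors, columns = bins
    return (support.T @ support) > 0              # bins sharing an attractor co-cluster

def consistency_matrix(counts, n_rounds=20, keep=0.8, seed=0):
    """Fraction of resampling rounds in which each pair of bins falls in one cluster."""
    rng = np.random.default_rng(seed)
    tally = np.zeros(counts.shape, dtype=float)
    for _ in range(n_rounds):
        tally += mcl_comembership(downsample(counts, keep, rng))
    return tally / n_rounds

# Toy contact map (assumed data): two 6-bin domains with dense internal contacts
# and sparse contacts between them.
counts = np.ones((12, 12), dtype=int)
counts[:6, :6] = 20
counts[6:, 6:] = 20
consistency = consistency_matrix(counts)
```

In this toy case, entries of `consistency` near 1 mark bin pairs that repeatedly cluster together, while entries near 0 mark pairs separated by a stable boundary.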
Mactop calculates the consensus boundary score in the consistency matrix for each bin and applies a filtering algorithm to determine the final TAD boundary positions (Fig. 1B). Additionally, the types of boundaries can be categorized based on the consensus boundary score.

Fig. 1: Illustration of Mactop. A Higher-order interaction reads data (top left) and paired reads data (top right). Higher-order interaction reads are decomposed into multiple paired reads through pairwise decomposition (bottom left) and then mapped to Hi-C interaction matrices (bottom right) based on a specified chromatin fragment length (resolution). B The Mactop workflow for TAD identification in fixed-length chromatin segments. First, the interaction matrix is resampled and noise is added. Then, the interaction graph is constructed and clustered. From the clustering results, a consensus matrix is generated. TAD boundaries are then determined based on the consensus boundary score. C The left figure shows the heatmap of TADs identified based on the Hi-C interaction graph. The middle figure displays the heatmap of TAD communities identified from the TAD interaction graph. The right figure presents the heatmap of chromunities identified from the higher-order interaction graph.

Mactop also constructs the interaction graph between TADs and identifies TAD communities with Markov clustering. Furthermore, Mactop can build a similarity graph of multi-way reads in high-order interaction data, which retains more detailed high-order interaction information than the Hi-C interaction matrix. Mactop further applies Markov clustering to this graph and identifies the chromatin spatial proximal structures called chromunities (Fig. 1C). In summary, Mactop identifies TADs, domain communities, and chromunities from diverse high-throughput chromosome conformation capture datasets.
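
The pairwise decomposition of higher-order reads mentioned in the Fig. 1A caption can be illustrated as follows. This is a hedged sketch rather than Mactop's code: the function name `pairwise_decompose`, the representation of a read as a list of genomic positions on one chromosome, and the toy reads are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np

def pairwise_decompose(reads, resolution, n_bins):
    """Turn multi-way reads into a symmetric bin-by-bin contact matrix.

    Each read is a list of genomic positions (bp) on a single chromosome.
    Every unordered pair of fragments in a read counts as one contact.
    """
    mat = np.zeros((n_bins, n_bins), dtype=int)
    for positions in reads:
        bins = [pos // resolution for pos in positions]  # map positions to bins
        for i, j in combinations(bins, 2):
            mat[i, j] += 1
            if i != j:                                   # avoid double-counting the diagonal
                mat[j, i] += 1
    return mat

# Assumed toy data: one 3-way read and one 2-way read at 50 kb resolution.
reads = [[10_000, 60_000, 120_000], [15_000, 65_000]]
contacts = pairwise_decompose(reads, resolution=50_000, n_bins=3)
```

The resulting matrix no longer records which contacts came from the same read; this is the higher-order information that a graph built directly on multi-way reads can retain.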
The recommended parameters for Mactop are detailed in the Methods section.

Mactop could accurately and robustly identify TADs across Hi-C data with different resolutions and sequencing depths

We applied Mactop and three competing methods, Insulation Score (IS), Directionality Index (DI), and TopDom, to the Hi-C data of five cell lines from Rao et al.12. The number of TADs identified per chromosome in the five cell lines varied among the four methods. While Mactop and TopDom identified a broader range of TADs, suggesting their effectiveness in reflecting robust chromatin interactions, IS and DI were more conservative (Fig. 2A, Supplementary Fig. S2A). The TAD boundaries identified by different methods showed relatively high consistency, with over 80% detected by at least two methods. Notably, about 95% of the boundaries determined by DI were also recognized by other methods. However, DI detected only 2,387 boundaries in total, significantly fewer than the other three methods, suggesting that it may miss boundaries. In contrast, Mactop identified a higher number of boundaries and showed a lower proportion of boundaries with a low boundary score (the gray section), indicating a greater sensitivity to TAD boundary detection and an enhanced ability to uncover potential TAD boundaries (Fig. 2B).

Fig. 2: Evaluation of TADs in terms of TAD number, internal validation metrics, and biological signals. A Number of TADs identified by each caller with a sample size of 23 representing the total number of chromosomes. Outliers are indicated by black dots. B Boundary scores of the boundaries identified by each caller, colored by whether a boundary was identified only by that caller (white) or also by 1 (gray), 2 (orange), or 3 (red) other TAD callers. C Ratios of Directionality Index (DI) and Insulation Score (IS) for each caller in GM12878, based on a sample size of 23 representing the total number of chromosomes.
Higher scores indicate more pronounced interaction changes between upstream and downstream of boundary regions. Mactop differs significantly from TopDom and Insulation (p