EmbedTAD Using Graph Embedding and Unsupervised Learning to Identify TADs from High-Resolution Hi-C Data

Article
Open access
Published: 09 December 2025

H. M. A. Mohit Chowdhury (ORCID: orcid.org/0009-0000-8687-4064) 1,2 & Oluwatosin Oluwadare (ORCID: orcid.org/0000-0002-5264-2342) 1,2

Communications Biology, Article number: (2025)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects
Data mining
Machine learning

Abstract
Topologically Associating Domains (TADs) serve a functional purpose as self-interacting regions whose boundaries are enriched with various proteins. Identifying these TAD regions is essential for examining several biological characteristics, including immune system function and chromosome organization. In this study, we propose EmbedTAD for identifying TAD regions from high-resolution Hi-C data. To achieve this, we utilize NetMF, a graph embedding technique that employs low computational resources, and cluster the embeddings into TAD regions using the HDBSCAN algorithm. We demonstrate that, during T-cell differentiation, EmbedTAD detects TAD rearrangements and can differentiate between active and inactive cells. Furthermore, we show that EmbedTAD recovers a significant number of TADs also present in PLAC-seq data, demonstrating its reproducibility. We confirm that EmbedTAD detects TADs with distinct ChIP-seq signals surrounding their boundaries, including CTCF, RAD21, and SMC3. Overall, EmbedTAD reliably and efficiently identifies TADs with minimal computational resources, outperforming many state-of-the-art methods.

Data availability
In-silico Hi-C was downloaded from HiCToolsCompare.
Hi-C contact maps of human lymphoblastoid cell (GM12878) and mouse lymphoma cell (CH12LX) were downloaded from NCBI GEO GSE63525 (ref. 3). ChIP-seq signal data were downloaded from https://www.encodeproject.org/, https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ and https://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/. mESC data were downloaded from GSE35156 (ref. 8). Mus musculus data were downloaded from GSE210418 (ref. 40). We used https://hicexplorer.readthedocs.io/en/latest/index.html, https://github.com/kmiles18/TAD-callers-comparison, https://github.com/vaquerizaslab/tadtool, and https://github.com/XiaoTaoWang/TADLib?tab=readme-ov-file to plot some of our analysis results. The source data for the experimental results and analysis are available at https://github.com/OluwadareLab/EmbedTAD/tree/main/ra_data.

Code availability
The EmbedTAD source code is freely available at https://github.com/OluwadareLab/EmbedTAD. The EmbedTAD documentation is available at https://github.com/OluwadareLab/EmbedTAD/wiki.

References
1. Kilpinen, H. & Dermitzakis, E. T. Genetic and epigenetic contribution to complex traits. Hum. Mol. Genet. 21, R24–R28 (2012).
2. Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
3. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
4. Pombo, A. & Dillon, N. Three-dimensional genome architecture: players and mechanisms. Nat. Rev. Mol. Cell Biol. 16, 245–257 (2015).
5. Oluwadare, O., Highsmith, M. & Cheng, J. An overview of methods for reconstructing 3-D chromosome and genome structures from Hi-C data. Biol. Proced. Online 21, 1–20 (2019).
6. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
7. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
8. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
9. Hong, S. & Kim, D. Computational characterization of chromatin domain boundary-associated genomic elements. Nucleic Acids Res. 45, 10403–10414 (2017).
10. Dekker, J. & Mirny, L. The 3D genome as moderator of chromosomal communication. Cell 164, 1110–1121 (2016).
11. Gong, H. et al. CASPIAN: A method to identify chromatin topological associated domains based on spatial density cluster. Computational Struct. Biotechnol. J. 20, 4816–4824 (2022).
12. Oluwadare, O. & Cheng, J. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data. BMC Bioinforma. 18, 1–14 (2017).
13. Haddad, N., Vaillant, C. & Jost, D. IC-Finder: inferring robustly the hierarchical organization of chromatin folding. Nucleic Acids Res. 45, e81–e81 (2017).
14. Lévy-Leduc, C., Delattre, M., Mary-Huard, T. & Robin, S. Two-dimensional segmentation for analyzing Hi-C data. Bioinformatics 30, i386–i392 (2014).
15. Shin, H. et al. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 44, e70–e70 (2016).
16. Chen, J., Hero III, A. O. & Rajapakse, I. Spectral identification of topological domains. Bioinformatics 32, 2151–2158 (2016).
17. Zufferey, M., Tavernari, D., Oricchio, E. & Ciriello, G. Comparison of computational methods for the identification of topologically associating domains. Genome Biol. 19, 217 (2018).
18. Sefer, E. A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinforma. 23, 127 (2022).
19. Liu, K., Li, H.-D., Li, Y., Wang, J. & Wang, J. A comparison of topologically associating domain callers based on Hi-C data. IEEE/ACM Trans. Computational Biol. Bioinforma. 20, 15–29 (2022).
20. Xu, J. et al. A comprehensive benchmarking with interpretation and operational guidance for the hierarchy of topologically associating domains. Nat. Commun. 15, 4376 (2024).
21. Filippova, D., Patro, R., Duggal, G. & Kingsford, C. Identification of alternative topological domains in chromatin. Algorithms Mol. Biol. 9, 1–11 (2014).
22. Liu, E. et al. Identifying TAD-like domains on single-cell Hi-C data by graph embedding and changepoint detection. Bioinformatics 40, btae138 (2024).
23. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982).
24. Campello, R. J., Moulavi, D. & Sander, J. Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining, 160–172 (Springer, 2013).
25. Hovenga, V., Kalita, J. & Oluwadare, O. HiC-GNN: A generalizable model for 3D chromosome reconstruction using graph convolutional neural networks. Computational Struct. Biotechnol. J. 21, 812–836 (2023).
26. Wang, Y. & Cheng, J. Reconstructing 3D chromosome structures from single-cell Hi-C data with SO(3)-equivariant graph neural networks. NAR Genomics Bioinforma. 7, lqaf027 (2025).
27. Qiu, J. et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the eleventh ACM international conference on web search and data mining, 459–467 (2018).
28. Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).
29. Pfitzner, D., Leibbrandt, R. & Powers, D. Characterization and evaluation of similarity measures for pairs of clusterings. Knowl. Inf. Syst. 19, 361–394 (2009).
30. Li, X., Zeng, G., Li, A. & Zhang, Z. DeTOKI identifies and characterizes the dynamics of chromatin TAD-like domains in a single cell. Genome Biol. 22, 217 (2021).
31. Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015).
32. An, L. et al. OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries. Genome Biol. 20, 1–16 (2019).
33. Wang, X.-T., Cui, W. & Peng, C. HiTAD: detecting the structural and functional hierarchies of topologically associating domains from chromatin interactions. Nucleic Acids Res. 45, e163–e163 (2017).
34. Kruse, K., Hug, C. B., Hernández-Rodríguez, B. & Vaquerizas, J. M. TADtool: visual parameter identification for TAD-calling algorithms. Bioinformatics 32, 3190–3192 (2016).
35. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184 (2020).
36. Rosen, J. et al. HPTAD: A computational method to identify topologically associating domains from HiChIP and PLAC-seq datasets. Comput. Struct. Biotechnol. J. 21, 931–939 (2023).
37. Lee, D., Kang, J. & Kim, A. TAD-dependent sub-TAD is required for enhancer–promoter interaction enabling the β-globin transcription. FASEB J. 38, e70181 (2024).
38. Fang, R. et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 26, 1345–1348 (2016).
39. Zhu, J. & Paul, W. E. CD4 T cells: fates, functions, and faults. Blood J. Am. Soc. Hematol. 112, 1557–1569 (2008).
40. Zhang, G., Li, Y. & Wei, G. Multi-omic analysis reveals dynamic changes of three-dimensional chromatin architecture during T cell differentiation. Commun. Biol. 6, 773 (2023).
41. Ester, M. et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, vol. 96, 226–231 (1996).
42. Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 701–710 (2014).
43. Tang, J. et al. LINE: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, 1067–1077 (2015).
44. Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 855–864 (2016).
45. Rozemberczki, B., Kiss, O. & Sarkar, R. Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), 3125–3132 (ACM, 2020).
46. Tang, J., Qu, M. & Mei, Q. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 1165–1174 (2015).
47. Levy, O. & Goldberg, Y. Neural word embedding as implicit matrix factorization. Advances in neural information processing systems 27 (2014).
48. Hinneburg, A. & Keim, D. A. A general approach to clustering in large databases with noise. Knowl. Inf. Syst. 5, 387–415 (2003).
49. Ankerst, M., Breunig, M. M., Kriegel, H.-P. & Sander, J. OPTICS: Ordering points to identify the clustering structure. ACM Sigmod Rec. 28, 49–60 (1999).

Acknowledgements
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM150402 to O.O.

Author information
Authors and Affiliations
1. Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA
H. M. A. Mohit Chowdhury & Oluwatosin Oluwadare
2. Center for Computational Life Sciences, University of North Texas, Denton, TX, USA
H. M. A. Mohit Chowdhury & Oluwatosin Oluwadare

Contributions
H.M.A.M.C. conducted the analysis and wrote and revised the manuscript. O.O. conceived and supervised the project and wrote and revised the manuscript. All authors reviewed the manuscript.

Corresponding author
Correspondence to Oluwatosin Oluwadare.

Ethics declarations
Competing interests
The authors declare no competing interests.

Peer review
Peer review information
Communications Biology thanks Emre Sefer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Leelavati Narlikar and Kaliya Georgieva. [A peer review file is available].

Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information
Transparent Peer Review file
Supplementary Material
Description of Additional Supplementary Files
Supplementary Data 1
Supplementary Data 2
Supplementary Data 3
Reporting Summary

Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.