Spatial recognition and semi-quantification of epigenetic events in pancreatic cancer subtypes with multiplexed molecular imaging and machine learning

Wait 5 sec.

IntroductionThe wide range of pancreatic cancer (PC) phenotypes causes difficulties in diagnosing, treating, and determining patient prognosis due to ambiguous, non-subtype-specific results. PC shows a variety of phenotypes regarding its histology (morphology), genetic alterations, and epigenetic modifications (EMs). The subtyping appreciates different aspects of the PC picture; consequently, molecular (transcriptomic), clinical, or histological subtypes were distinguished1. Studies have revealed differences in their biology, resulting in different prognoses and treatment efficiencies1,2,3,4,5,6,7.Although it might be difficult to correlate and link different systems of PC subtyping, some attempts have been made1,8. Nevertheless, important practical aspects of PC include its histological subtypes. The routine pathological practice, which establishes the details of PC diagnosis used in clinical management, exploits histopathological subtyping. This is partially caused by the cost-effectiveness of routine diagnostics, which currently does not allow for extended molecular testing of each patient. Antibody-based assays for histone and DNA modifications are being developed9; however, caution must be taken when introducing new techniques, and proper standardized procedures should be implemented before10. The transcriptomic subtypes distinguished by Collison et al.2, Moffit et al.3, and Bailey et al.4 are highly exploited in PC research1,11. Currently, however, they are not included in pathological diagnosis or clinical patient management procedures. Recognizing these subtypes is largely based on expensive RNA profiling, even though other approaches have been proposed11. Moreover, only a few molecular subtypes were distinguished, whereas morphological and biological PC presentations are much more common (over thirteen WHO and four non-WHO subtypes5,12). Experience learned from the classification of central nervous system tumors based solely on the profile of DNA methylation taught us the potential of EMs in practical tumor subtyping5,13. A multitude of these tumors were recognized and new entities were revealed that had not been identified before13.PC emerges in a well-documented fashion, originating as pancreatic acinar cells through acinar-to-ductal metaplasia or as pancreatic duct epithelial cells. The initial process of carcinogenesis, driven by genetic alterations, is followed by subsequent modifications (i.e., mutations in KRAS, TP53, CDKN2A, or SMAD4), depicted as dysplastic changes in flat lesions called pancreatic intraepithelial neoplasia (PanIN) or cystic lesions, such as the most prevailing intraductal papillary mucinous neoplasm (IPMN)5. However, genetics alone does not sufficiently reflect a variety of PC subtypes14.The emerging role of epigenetic alterations in PC has been recognized. Recently, we first linked some of the most common PC histological subtypes with variabilities in DNA methylation15. Moreover, recent studies on targeted therapy approaches have focused on this particular aspect of cancer progression16. Indeed, epigenetic alterations are excellent targets for PC therapeutics because they are reversible16. Notably, the change in the PC epigenome is very dynamic, not only in highly proliferating cancer cells but also as a reaction to the tumor environment and its host, showing a modus operandi that even justifies a travesty of a “mafia” within the “society of the body”17. Tumor-stroma crosstalk, another highly exploited subject of PC research, impacts the tumor epigenome18. This drives cancer spreading (meaning the ability to invade and metastasize), a logical consequence of the theory of evolution and its natural selection mechanisms19.Recent advances in cancer research have revealed three main dimensions of the transcriptomic machinery. First, the nucleotide sequence of the genome is prone to rearrangements, duplications, deletions, and other alterations20. In the past, it was thought that the DNA sequence solely determines basic transcription into proteins. However, at least two other underlying mechanisms alter the amino acid sequences in the resulting proteinogram. The local DNA three-dimensional (DNA conformation) structure directly affects the binding of different proteins involved in the DNA transcription process21,22. Finally, DNA and histone modifiers affect chromatin modeling, thus blocking or favoring the transcription of certain DNA sequences23.DNA appears in different secondary structures, known as conformations. The most prevalent is the right-handed double helix B-DNA, which builds up the DNA backbone24. Other double helix conformations (non-B) include right-handed A-DNA and left-handed Z-DNA. Additional DNA structures include DNA bubbles, cruciform, three-stranded H-DNA, four-stranded G-quadruplexes, and others24,25. In cancer research, Z-DNA has been exploited in the context of new therapeutic approaches26. Recently, the Z-form of DNA was shown to be involved in the ability of cancer cells to suppress apoptosis via the immune response, an escape mechanism of cancer cells from so-called immune checkpoint blockade (ICB) therapeutics27. Proteins containing Z-DNA-specific domains (called the Zα domain), such as adenosine deaminase acting on RNA 1 (ADAR1) or Z-DNA binding protein 1 (ZBP1)28,29,30,31, selectively bind to the Z-DNA form. Activation of ZBP1 leads to autoinflammation and necroptotic cell death, although it is negatively regulated by ADAR132,33,34. In cancer cells, ADAR1 is used to induce immune silencing and prevent termination, thus bypassing the therapeutic effect of ICB28. Recognition of the Z-DNA distribution in PCs might provide insight into cancer-induced ADAR1 activation.EMs are posttranslational processes that are crucially important in PC biology. The constant interplay between EMs leads to variable gene expression, which is dynamically regulated as a reaction to the environment or cellular requirements (i.e., highly proliferating cancer cells)16. EMs include DNA and histone modifications. DNA methylation (DNA-m) most commonly occurs in areas of cytosine-guanine dinucleotides (so-called CpG islands) and controls the transcription of specific genes, increasing or blocking it. On the other hand, histone modifications locally alter the state of chromatin, causing chromatin to transform into dense heterochromatin or loose euchromatin. The latter allows DNA transcription and gene expression. EMs are driven by enzymes that add or remove methyl or acetyl groups to CpG islands or histone amino acids. These proteins are grouped into so-called “writers”, which include DNA methyltransferases, histone lysine methyltransferases, and histone acetyltransferases. Another group called “erasers” contains DNA demethylases, histone lysine deacetylases (HDACs), histone lysine demethylases (KDMs), and protein arginine methyltransferases. Consequently, histone acetylation leads to transcription activation, whereas histone methylation depends on the site of the residue and the degree of modification16.Blocking some of the enzymes responsible for EMs in PC (such as specific HDACs or KDMs) are considered potential targets currently in clinical trials. This emerging role of epigenetic profiling in cancer therapy was excellently reviewed in14 and35. Another possible therapeutic approach involves blocking ADAR1 or activating ZBP1 to support the effects of ICBs and re-enable cancer cell apoptosis via the immune response26,28. Nevertheless, to successfully utilize these drugs in PC therapy, researchers must properly address PC heterogeneity and subtyping because of suspected variable responses. Indeed, most trials of PC therapy to date have yielded unsatisfactory results35.Here, we aimed to spatially recognize EMs and DNA conformations in PC subtypes. To do that, we employed an innovative approach relying on Raman hyperspectral mapping (RHM) combined with advanced machine learning techniques, including unsupervised autoencoders (AEs) and convolutional neural networks (CNNs). RHM is a method of unlabeled molecular imaging based on Raman spectroscopy that allows for the study of cancer tissues with submicrometric resolution, visualizing cellular components up to even individual chromosomes36,37. Adjacent pixels of spectral measurements are merged to form a so-called hyperspectral image. The information collected with RHM contains molecular interactions in the studied sample in an all-in-one manner, which authorizes the conclusion of the sample molecular structure and, thus, the contents of DNA or proteins, such as histones, and their secondary structures. Nevertheless, the need for high-quality RHM measurements and complex analysis of the resulting data, until recently, prevented the use of RHM in high-class cancer tissue exploration. Only a few reports are available describing DNA-m using Raman and infrared spectroscopy techniques38,39, and some attempts have been made to detect histone acetylation40,41. However, these studies primarily examined isolated DNA strands or individual cells rather than complex tissue samples. Recently, our group reported on DNA-m patterns among several PC histological subtypes obtained with RHM15.In this study, PC tissues of six histologically different subtypes were compared to benign control pancreatic ductal (CTRL) tissues. Specifically, we evaluated conventional ductal adenocarcinoma (cPDAC), adenocarcinoma derived from intraductal papillary mucinous neoplasm (IPMC), predominantly foamy-glands carcinoma (FG), predominantly large duct type carcinoma (PLD), and squamous differentiated adenocarcinoma (SD). These are the most common subtypes of ductal adenocarcinoma, which is the most common form of PC12. Furthermore, for comparison, we added the ampulla of Vater adenocarcinoma (AVAC), which is a cancer entity not classified as a form of PC according to the WHO; nevertheless, it develops in the main duct of the pancreatic opening into the duodenum (the ampulla of Vater). The diagnosis of AVAC vs. PC is frequently histopathologically challenging because of similarities in their morphologies and immunoreactivity15. Thus, practically, both PC and AVAC are considered when diagnosing a patient with a pancreatic tumor.Above all, we spatially show and semi-quantify the heterogeneity of EM in PC. In particular, we recognized and analyzed the global distribution of DNA-m, methylated lysine (Lys-m), acetylated lysine (Lys-a), and methylated arginine (Arg-m). Additionally, we complete these findings with the results of Z-DNA and B-DNA conformation distributions. Consequently, we identified the FG and SD PC subtypes as potentially less vulnerable to epigenetic regulator-targeting therapies.ResultsArtificial neural networks precisely identify cancer cell nucleiAfter including patients and selecting relevant tissue sections, followed by marking areas of interest (cancer and benign tissues), we obtained high-quality RHM maps of PC tissue samples. All-in-one hyperspectral data requires exact spectral classification to allow further analyses and interpretation. This process is of key significance for separating relevant information from other parts of tissue samples, such as tumor stroma, inflammation infiltrates, necrosis, benign tissues, or even cancer cell cytoplasm (in the context of exclusively studying cancerous nuclei). For a broader discussion on the importance of such specificity, we refer the reader to our previous papers36.For the task of spectral data classification, the most efficient and accurate methods are CNNs36. Here, we used CNNs to classify RHM spectra into cancerous nuclei, cytoplasm, and tumor stroma. The process of spectral preprocessing using AE and CNN training is described in the Methods section of the manuscript. The accuracy of successful classification (so-called validation accuracy) of the CNN training reached between 96 and 100%, and the results of that classification were plotted as CNN-prediction maps (Fig. 1). The subcellular components, such as the nuclei, are well-projected in CNN representations (Fig. 1B) based on hematoxylin & eosin-stained slides (Fig. 1A). Further sections of the manuscript describe the presented DNA-correlated map images (Fig. 1C), which reflect PC cells in more detail.Fig. 1Neural network-based classification of pancreatic cancer nuclei and histological slides of corresponding areas. Each section represents a hematoxylin-and-eosin (A - H&E) image of the PC area, the corresponding image generated by the CNN classification (B - CNN) into areas of cancer cell nuclei (green), cytoplasm (red), and other tissues (yellow), and a DNA correlation map (C – DNA). Arrowheads indicate cancer cell nuclei. The resemblance of H&E and DNA regarding subcellular component discrimination is noticeable (i.e., nuclear shape, size, crowding, or nucleoli details). Note that H&E staining is performed on a subsequent slide from the tissue block. (H&E, original magnification 400×)Full size imageMolecular imaging with reference DNA correlation reveals highly detailed subcellular structures of pancreatic cancer and benign control tissuesOnce prepared and classified, the RHM spectra could become the subject of proper analyses to reach the goals of our study. Here, we aimed to reveal the spatial distribution of EMs in PC subtypes. We searched for a way to extract specific information about DNA and histone modifications from the general data accessible in RHM spectra. We hypothesized that each spectrum of the RHM map could be correlated with a reference spectrum of certain EMs, resulting in a correlation score. We chose to study the most prevalent EMs in PC; consequently, Raman spectra of isolated DNA, lysine (Lys), and arginine (Arg) were measured and juxtaposed with measurements of methylated DNA, methylated Lys, methylated Arg, and acetylated Lys. Supplementary Figures S1A – S4A show the reference Raman spectra of these isolated DNA and amino acids, which were used for further analyses and comparisons.For the spectral correlation scoring, we chose the Pearson correlation coefficient (Pearson R), as it provided the best results in complex samples such as PC tissues (see the Discussion for details of this choice). The relative Pearson R correlation maps obtained by comparison of the reference DNA spectra revealed the ability to reflect the subcellular components of PC tissues with astonishing accuracy and were comparable to those of hematoxylin and eosin-stained slides (Fig. 1A vs C).The visual representation of the relative Pearson R between tissue samples and the reference DNA spectrum served as a proof-of-concept, proving that the method is suitable for spectral data comparison even in complex tissue samples. However, to extend this hypothesis in the context of EM imaging, which cannot be visually confirmed, as in the case of DNA, further validation was needed. We approached this threefold, as schematically presented in Fig. 2 and described in detail in the Methods section. Briefly, in addition to visual DNA confirmation, we identified differentiating spectral features characteristic of each of the EMs and confirmed them by (i) reference spectra matching, (ii) principal component analysis (PCA), and (iii) so-called volcano plot-based spectral peak analysis (VSPA) (see Fig. 2nd legend for the explanation).Fig. 2Validation of the spectral correlation scoring approach workflow (based on Lys-a). The correlation score was validated threefold. First, a correlation score-based heatmap with the reference DNA spectrum was generated, revealing a highly detailed view of subcellular components, including the nuclei (A). This step confirmed the spatial distribution of the DNA, which was excellently depicted with the corresponding hematoxylin-and-eosin (H&E) slide (C). Note that H&E staining is performed on a subsequent slide from the tissue block; thus, some areas are not precisely the same, although a clear resemblance of tumor morphology is observed (i.e., cellular nuclei size, shape, crowding, and chromatin condensation). Arrowheads indicate cancer cell nuclei. Further steps involved CNN classification into areas of cancer cell nuclei (B, green, arrowheads), cytoplasm (B, red), and other tissues (B, yellow), which allowed the selective extraction of nuclear spectra. From nuclei areas revealed by the CNN, we identified those that were “exclusive” for each epigenetic modification (D). Thus, for each modification (DNA-m, Lys-m, Lys-a, and Arg-m), a correlation score with a reference spectrum was measured, and a binary status of positively (1) or negatively (0) modified was applied, depending on that score (a positive 3rd quartile of correlation score distribution was used as the cutoff value). Consequently, the exclusive spectra for Lys-a comprised only those that were Lys-positive and Lys-a-positive but were negative for all other modifications. In circles (D), a high-resolution heatmap of a nucleus with superimposed exclusive spectra of Lys (D, round map, gray) and Lys-a (D, round map, green) is presented, further highlighting the spatial distribution of nuclear DNA and histones (Lys). In the next step (E), we extracted exclusive spectra for Lys and Lys-a for subsequent spectral feature comparison (F). The goal at this point was to identify characteristic features that differentiated our reference spectra of Lys and Lys-a (F, 1). These were then identified in the so-called loadings plot of principal component analysis (PCA) of exclusive Lys and exclusive Lys-a spectra (F, 2) and confirmed in another method of analysis based on a volcano plot, which combines the statistical significance of differentiating spectral features with a significant fold change in that differentiation (volcano plot-based spectral peak analysis – VSPA), revealing the most notable features (F, 3). The color arrowheads in (F) show the corresponding features characteristic of Lys-a identified in the reference spectra, PCA, and VSPA. A further description of our approach is presented in the Methods section. (C, H&E, original magnification 400x; F-1, reference Lys and Lys-a spectra comparison; F-2, PCA score counts plot, loading plot; F-3, VSPA volcano plot, significant peaks plot)Full size imageHigh-resolution multiplexed imaging of pancreatic cancer nuclei reveals variability in epigenetic modification levelsMapping based on relative correlation provided excellent large-scale images of PC tissues. After the identification of cancerous nuclei by the CNN classification, we aimed to capture the heterogeneity of nuclear morphology and EM occurrence. By choosing specific regions of the previously RHM-mapped tissues, we conducted subsequent molecular imaging at the highest resolution authorized by the diffraction limit (a measured pixel size of 250 nm, resulting in an actual resolution of approximately 500 nm given the Nyquist criterion). These new data were similarly correlated with the reference DNA spectrum, resulting in high-quality images of single-nucleus areas. The next step was to superimpose the highlighted areas of EMs and show multiplexed images that depict the variability of PC nuclei in terms of morphology and EM distributions. Figure 3 shows an example of the multiplexing of selected PC subtype nuclei.Fig. 3Multiplexed high-resolution insight into the heterogeneity of pancreatic cancer nuclei. High-power views of the PC nuclei with superimposed overlays of EMs show variability in their spatial distribution. In the zoomed-in views of selected nuclei from the IPMC (A), FG (B), PLD (C), and SD (D), each color dot represents the EM from transparent to oblique according to the level of that modification. The EM levels are determined in areas that are DNA, Lys, or Arg-positive for DNA-m, Lys-m/Lys-a, and Arg-m, respectively. The whole slide views (left side of the figure) and the background images in the highlighted EM views reflect a relative DNA correlation heatmap with a reference DNA spectrum.Full size imageSemi-quantified expression of epigenetic modifications significantly differs between pancreatic cancer subtypes and benign pancreatic duct tissuesAlthough the visualization of EM heterogeneity in PC subtypes revealed intriguing results, the semi-quantification of these relationships is of greater value in terms of unequivocal pathological interpretation and discussion. All spectra from the areas classified as nuclei by the CNN and positive for DNA, Lys, or Arg were correlated with reference EM spectra and analyzed. In Fig. 4A and D, the relative differences among PC subtypes and benign control pancreatic ductal tissues are presented for each of the EMs. It is important to note that the CTRL group spectra were extracted precisely from the pancreatic ductal tissues, omitting all other components, such as acini, pancreatic islets, or inflammation infiltrates, which could falsify the results. Almost all comparisons show significant differences, with only individual cases not statistically meaningful. Specifically, in DNA-m, no distinction is shown for CTRL vs. SD. In the Lys-m comparisons, only the CTRL vs. SD were similar, whereas in the Lys-a comparisons, the separations of the CTRL vs. PLD and FG vs. SD were not statistically meaningful. All groups revealed significant differences in the Arg-m correlation. Figure 4E and H present the distribution of these findings among the studied samples.Fig. 4Variable expression of DNA and histone epigenetic modifications among pancreatic cancer histological subtypes. The median levels (A-D) and distributions (E-H) of correlation-based EMs in nuclei identified by the CNN in all studied PC samples are presented as a ratio to the CTRL. The gray dashed lines at 1.0 show the CTRL level of a particular EMs. The AVAC and cPDAC PC subtypes were the most distinct from the CTRL at all EM levels, with AVAC presenting slightly less DNA-m than cPDAC, and oppositely at other modifications. The levels of EMs increase at IPMC and PLD, with the highest for the FG and SD. A particularly surprising finding is the hypermethylation (compared to CTRL) of DNA and Lys in FG and Arg in SD. (q, adjusted p-value of Dunn’s post hoc for the Kruskal‒Wallis test; n, number of spectra correlated)Full size imageThe ratio of left-handed Z-conformation of DNA varies among pancreatic cancer subtypesTo visualize and semi-quantify the distributions of DNA conformations, we utilized the same technique, which proved successful in the case of EMs. First, we measured the Raman spectra of isolated reference DNA in different conformations. The resulting reference spectra are shown in Supplementary Figure S6. Then, we looked for the same features in the RHM maps of the PC samples by correlating each of the map pixel spectra with the reference measurements. The score of that relative correlation was plotted, showing local variabilities in Z-DNA and B-DNA distributions (Fig. 5A and B). This visualization confirmed the predominant, more diffuse B-DNA localization in PC nuclei. On the other hand, Z-DNA is located in the nucleolus and peripheries of the nucleus. Next, we semi-quantified the relationships between DNA conformations in all studied groups of PC. The Z-DNA score, which is the ratio of Z-DNA relative to B-DNA, is presented in Fig. 5C. In that figure, PC subtypes show a grouping of AVAC with cPDAC, IPMC with PLD, and FG with SD. Additionally, the Z-DNA content of the last group was similar to that of the CTRL samples. Overall, it is interesting that lower EM levels are associated with lower Z-DNA levels. Again, the FG and SD groups clustered together with the CTRL group, as shown by the EM distribution results (Fig. 4).Fig. 5Distribution of left- and right-handed DNA conformations in pancreatic cancer nuclei. The distributions of the Z-DNA and B-DNA conformations are shown for selected PC subtypes, specifically the SD (A) and FG (B) subtypes. A more intense color correlates with a higher level of a specific conformation. B-DNA is the dominant conformation, whereas Z-DNA seems to favor the nucleoli and peripheral areas of the nuclei. Arrowheads mark the nucleoli. Semiquantitative analysis revealed significant differences among some PC subtypes, with FG and SD having the least amount of B-DNA and the most amount of Z-DNA, whereas cPDAC and AVAC presented opposite results. The Z-DNA score plot (C) was correlated with that of the CTRL samples (dashed gray line at 1.00). The score is the ratio of the Z-DNA expression level divided by the B-DNA expression level. (q, adjusted p-value of Dunn’s post hoc for the Kruskal‒Wallis test; n, number of spectra correlated)Full size imageGeneral analysis of spectral features confirms the distinct biology of some of the pancreatic cancer subtypesThe recognized pattern of distinct EM and DNA conformation levels in FG, SD, and less for PLD from those of AVAC, cPDAC, or IPMC presented in Figs. 4 and 5 was confirmed with a general spectral feature comparison. We used two methods widely used for feature-based grouping of multidimensional data, namely, PCA15,36 and t-distributed stochastic neighbor embedding (t-SNE)13. In both methods, a clear separation between PC groups is observed, as depicted and briefly described in Fig. 6. Further analysis and interpretation of the differences in spectral features are presented in Supplementary Figure S7.Fig. 6The grouping of pancreatic cancer subtypes based on their molecular interactions. The spectra obtained from DNA-positive areas of all studied histological subtypes of PC were separated by principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). PCA scores plot (A) and score counts plot (B) allocate FG, SD, and CTRL to one extreme, similar to t-SNE, which also groups these types. On the other extreme of PCA are sequentially placed PLD, IPMC, cPDAC, and AVAC. In (B), the squares at the bottom mark the median values of scores along principal components for each PC group, depicting their clustering. The t-SNE PLD groups were separate from the IPMC, cPDAC, and AVAC groups; however, the t-SNE PLD group participated in one branch of the first group (FG and SD), showing the common features of FG, SD, and PLD. Both PCA and t-SNE highlighted significant common features of FG with SD and cPDAC with AVAC; however, both groups were located on opposite sides. Interestingly, this analysis, although performed on all nuclear spectra, showed the same tendency as the results of the semiquantification of EMs or DNA conformation (Figs. 4 and 5). Further interpretation of this separation is shown in Supplementary Figure S7.Full size imageDiscussionThe importance of tumor subtyping in pathological diagnosis is well-known42. Specific subtypes define the biology of tumors and determine clinical outcomes. Along with tumor grading, histological subtypes are assigned to diagnosed lesions to categorize neoplasms into classes of predictable behavior. The malignancy potential, thus its ability to spread through metastases, nodal or perineural involvement, and patterns of that spread, are described through histological subtyping7,12. Moreover, even though the information obtained with proper subtype assignment is substantial, it is the most convenient and cost-effective way for both the diagnostic pathologist and the clinical oncologist who recommends further management of the patient. The treatment of the disease is determined by its histological subtype and grade12,43. Moreover, the genetic testing to identify molecular targets follows the histological pre-diagnosis. Naturally, the recognition of epigenetic mechanisms in cancer biology should involve the same convenience. Here, we revealed the variable EMs and DNA conformations in PC histological subtypes and identified those that significantly differed from the others.In our study, we utilized a well-known correlation approach (the Pearson R), although it has not yet been applied for studying complex tissues. Along with other correlation-based techniques, it has proven efficient in Raman spectra matching44,45. Some authors have suggested that other methods of spectral matching are more accurate than Pearson R46. Thus, the Euclidean distance, cosine similarity, or squared first-difference cosine similarity were some of the proposed methods46. Nevertheless, after evaluating all of the mentioned techniques, we chose Pearson R as the best fit for spectral comparison in complex tissue samples (more details are provided in the Methods section). Notably, some aspects should be taken into account when considering the correlation of spectral data between isolated DNA or isolated amino acids and the correlation of spectra from complex PC tissues. It is to be expected that the latter should contain much more cumulative molecular data, at the same time joining intensities of multiple molecules and interfering with each other. Therefore, one should not expect the direct correlation to be high but rather pay attention to the relative relations, that is, the contrast between adjacent pixels in the correlation-plotted map image.In the context of PC tumorigenesis and progression, DNA-m was confirmed to play a crucial role. There are two distinct epigenetic trends regarding DNA-m in tumors, including PC. First, the methylation of cytosine in CpG islands of individual gene promoters causes the silencing of tumor suppressor genes47,48. The second mechanism involves global cytosine demethylation (hypomethylation). PC is globally hypo-DNA-methylated15,49,50, which results in chromosomal instability (CIN), rapid gene mutations, and cancer progression51,52. CIN is considered a driver of cancer metastasis and chemoresistance53.Furthermore, DNA-m interacts with other chromatin modifiers, such as HDACs or KDMs, to increase tumor suppressor gene silencing. It was established that increased expression of some HDACs in PC (i.e., HDAC1, HDAC2, HDAC3, HDAC4, or HDAC7) leads to deacetylation at Lys residues in PC histones54,55,56,57. Similarly, increased activity of some KDMs (i.e., KDM1A or KDM3A) causes histone Lys demethylation58,59. Moreover, the levels of histone trimethylation at Lys 27 (H3K27me3) were found to be generally decreased in PC60, whereas the total loss of H3K27me3 was associated with increased aggressive behavior of the tumor60. These findings were observed despite the overexpression of the histone-lysine methyltransferase EZH2 (enhancer of zeste homolog 2), which is the enzymatic driver of H3K27me350,61. Reports on histone methylation at Arg (a specific locus or global expression) in PC are very sparse62.Here, we report global hypomethylation of DNA and histones (at both Lys and Arg residues) and histone hypoacetylation (at Lys residues) in PC tissues compared to benign controls (Fig. 4). Most importantly, our results show that among its subtypes, PC is significantly heterogeneous in terms of EM expression. The presence of hypo-DNA-m in cPDAC, AVAC, and IPMC was recently confirmed by our group15. However, that work revealed only results limited to the methylation status of DNA, leaving the most important histone modifiers and the Z-DNA levels unrecognized. Moreover, the most important PC subtypes (i.e., FG, SD, and PLD), which we currently present, have not been investigated. In the present study, these subtypes exhibited EM patterns distinctly different from the current knowledge on PC (i.e., that PC is globally hypo-DNA-m). Specifically, the FG subtype was slightly hypermodified in all EM comparisons (q  50% of the tumor mass) foamy-glands/clear-cell patterns and large-duct/cystic-papillary patterns. The SD samples contained PCs with at least focal squamous differentiation with p40 and p63 immunostaining. IPMC samples contained PCs derived from IPMNs and which invaded mostly as cPDAC, with a single case being PLD. The study included patients who underwent pancreatoduodenectomy (Whipple or Traverso) or distal pancreatectomy for PC diagnosis, excluding those with benign pancreatic or neuroendocrine neoplasms. Tissue samples were obtained from the Pathology Department of Jagiellonian University Medical College and Cracow University Hospital’s archives and were typically stored as formalin-fixed paraffin-embedded (FFPE) blocks. The initial sample selection utilized standard hematoxylin-eosin-stained glass slides (H&E), which were confirmed by two experienced pancreatic pathologists using a routine light microscope. Each chosen case was sliced into a 2.5 μm thick tissue section using a Microm® HM355S Automatic Microtome and mounted onto a CaF2 window. Pathologists marked areas of interest on unstained CaF2-stained slides, including cancerous cells and stroma. The complete paraffin removal procedure involved a 12-hour xylene bath and graded ethanol rehydration.Obtaining optimal lys derivatives for reference spectraWe performed additional processing to optimize the quality of the Raman spectra of Lys, Lys-m, and Lys-a. The final compounds (those best rated by spectrum quality) of Lys, Lys-m, and Lys-a used for acquiring reference spectra were ʟ-lysine hydrochloride, free Nε-methyl-ʟ-lysine, and Nε-acetyl-ʟ-lysine dihydrochloride, respectively.Unless stated otherwise, all reagents used were of analytical grade and obtained from Sigma Aldrich. Ultrapure water with a resistivity of 18.2 mM was obtained using a Milli-Q water purification system (Merck Millipore).Synthesis of ʟ-Lysine hydrochloride (Lys)A total of 1.46 g (0.01 mol, 1 eq.) of ʟ-lysine was dissolved in 1.67 mL of 6 M HCl (0.01 M, 1 eq.), which was previously cooled to 15 °C in an ice bath. The mixture was then filtered through 0.2 μm Whatman Puradisc 13 syringe filters (Sigma Aldrich) and left to crystallize. The collected crystals were washed three times with 10 mL of anhydrous diethyl ether, left to dry overnight at room temperature, and then dried at 100 °C until no noticeable changes in sample mass were observed.Synthesis of Nε-acetyl-ʟ-lysine hydrochlorideFifty milligrams of Nε-acetyl-ʟ-lysine (0.267 mmol, 1 eq.) was dissolved in 0.265 mL of 0.001 M HCl (0.267 mmol, 1 eq.). The mixture was left to crystallize, and the collected crystals were washed three times with anhydrous diethyl ether and left to dry overnight at room temperature. Then, the sample was dried to a constant mass at 100 °C.Synthesis of Nε-acetyl-ʟ-lysine dihydrochloride (Lys-a)Nε-acetyl-ʟ-lysine dihydrochloride was obtained using the same procedure used for Nε-acetyl-ʟ-lysine hydrochloride, but 0.265 mL of 0.05 M HCl (0.534 mmol, 2 eq.) was used.Dehydrochlorination of Nε-methyl-ʟ-lysine hydrochloride (Lys-m)HCl was removed from Nε-methyl-ʟ-lysine hydrochloride using a modified procedure described by Blomquist et al.68. Briefly, 50 mg of Nε-methyl-ʟ-lysine hydrochloride (0.254 mmol, 1 eq.) was dissolved in 0.250 mL of water and cooled in an ice bath to 0 °C. Afterward, 0.250 (1.794 mmol, 7 eq.) mL of ice-cold triethylamine was added, and the obtained mixture was stirred in an ice bath for 30 min. Nε-methyl-ʟ-lysine was precipitated by adding 10 mL of ice-cold acetone. After centrifugation at 30 000 g, the crude product was washed three times with 2 mL of anhydrous chloroform, followed by washing twice with 1 mL of anhydrous diethyl ether. Then, the sample was allowed to dry at room temperature.Reference amino acid and DNA conformation measurementsRaman spectra of DNA (Jurkat Genomic DNA, Thermo Fisher Scientific, cat. no SD 1111), methylated DNA (CpG Methylated Jurkat Genomic DNA, Thermo Fisher Scientific, cat. no SD 1121), amino acids and their derivatives were obtained with a Horiba Lab RamHR spectrometer (Horiba Scientific) combined with an upright optical microscope BX41 (Olympus Corporation) equipped with an electron-multiplying charge-coupled device (EM-CCD). Spectra were collected with a green (532 nm) laser, 100× air objective (Olympus Corporation) and 600 mm− 1 diffraction grating providing 2 cm− 1 spectral resolution. The acquisition time was adjusted for each sample (typically between 10 and 20 s). Measurements were controlled via LabSpec6 software (Horiba Scientific).To obtain the Raman spectra of the Z-DNA, 10 µL of DNA solution was dissolved in 40 µL of 5 M NaCl (Sigma Aldrich). A 20 µL aliquot of the solution obtained as described above was cast on a microscope slide covered with tin foil. Then, to avoid sample drying, Raman spectra were collected immediately using the procedure described above.Raman measurements for RHM plottingRaman measurements of PC tissue slides were conducted using a Horiba LabRam (Horiba SAS, France) and an Alpha 300R (WITec, Ulm, Germany) confocal Raman microscope system, both of which were equipped with a green (532 nm) laser. The LabRam system was coupled with an electron-multiplying charge-coupled device (EM-CCD) camera cooled to -70 °C. Throughout the measurements, the tissue sections were submerged in a physiological saline solution, and a ×60 water immersion objective lens (Nikon) was utilized. Spectra were captured in the fingerprint spectral range (1900–600 cm− 1) with a spectral resolution of 2 cm− 1. The RHM technique involves multiple measurements of neighboring “pixels” of tissue, with the resulting spectra combined into a single map image. In this investigation, RHM maps comprised 10,000 to 18,000 spectra for a single slide, with an exposure time of 6 s for each pixel. The size of the pixels (step size) was 1 μm or smaller, depending on the dimensions of the area of interest, ranging from 80 μm × 80 μm to 140 μm × 140 μm.High-resolution measurements of selected PC nuclei areasConfocal Raman spectroscopy measurements were performed using an Alpha 300R confocal Raman microscope system (WITec, Ulm, Germany). The system was equipped with a 532 nm laser light source set to a laser power of 20 mW in front of the objective, a UHTS300 spectrometer with a 600 groove/mm grating and a thermoelectrically cooled (− 60 °C) back-illuminated CCD camera. The spectral data were acquired in an aqueous environment (0.9% NaCl solution) using a 63× water immersion objective NA 1.0 (W Plan-Apochromat, Zeiss, Germany) in the range 100–4100 cm-1. Raman maps of a 100 × 100 µm2 scan area (100 × 100 points) were acquired with a continuous laser beam with an accumulation time of 2 s per point. The resolution was improved to 88 × 88 points per 22 × 22 µm2 for measurements of the nuclei.Spectral preprocessingBefore the CNN can undergo training and be utilized for classification purposes, the RHM spectra must be preprocessed. This involved eliminating signal noise and cosmic ray spikes, as well as normalizing to account for differences in tissue sample thickness or measurement equipment. This preprocessing ensured that the spectral data were comparable, enabling substantive conclusions unaffected by experimental setup variations. In addition to standard preprocessing methods, such as baseline removal (3rd-order polynomial) and cosmic ray spike removal, we employed a novel approach using unsupervised machine learning, specifically AEs. Essentially, an AE comprises two deep artificial neural networks: an encoder, which compresses the data to highlight meaningful features, followed by a decoder, which reconstructs the original data while capturing the extracted features. This spectral data processing method aims to eliminate noise while retaining relevant information. AEs demonstrated high efficiency and accuracy in comparison to the standard approach, as detailed in Supplementary Figure S8 with further description. After denoising, all spectra were normalized in the 0 to 1 range.Autoencoder denoisingFor efficient spectral preprocessing and smoothing, we utilized a novel machine learning approach, AEs. We used a custom architecture comprising an encoder with 9 convolutional layers and a decoder with 9 convolutional layers; all the layers were finalized with a flat sigmoidal layer. Similar to the classification CNN, a “sequential” base model was employed, and a “glorot uniform” initialization mode was used for each layer. The “Adam” optimizer and “categorical cross-entropy” loss function were utilized. Training with a batch size of 105 was used, and 42 epochs (approx. 10 min) were needed. Details of the proposed AE architecture are summarized in Supplementary Figure S9.CNN architecture and trainingThe CNN was trained to predict three classes representing cellular nuclei, cytoplasm, and surrounding stroma. We employed a custom-designed CNN architecture, as detailed in15,36, comprising 10 convolutional layers for feature identification and 4 fully connected layers for classification. The CNN program was executed in Python (RRID: SCR_008394) version 3.10.5 using TensorFlow (RRID: SCR_016345) version 2.9.1 and Keras version 2.9.0. A “sequential” base model was employed, with each layer initialized using the “glorot uniform” mode. The “Adam” optimizer and “categorical cross-entropy” loss function were utilized. The training lasted from 10 to 38 epochs, depending on the RHM map, with a batch size of 105. The spectra for each class were combined and shuffled randomly, followed by “one-hot encoding” of the class arrays. The training and testing datasets were split at a ratio of 70/30. The approximate training time for the CNN was 15 min per RHM map. Further details regarding the CNN architecture are summarized in Supplementary Figure S10.CNN-based classification plottingThe CNN-based classification facilitated spectral classification and enabled the detailed identification of PC nuclei. Additionally, the CNN was utilized to generate prediction maps. Each spectrum obtained via RHM for every tissue sample was input into the CNN and categorized into one of three classes. The predicted class values (ranging from 0 to 2 representing stroma/empty, nucleus, and cytoplasm) formed an array, which, when combined with the x and y coordinates of the original RHM map, enabled the creation of a CNN classification map image. Each pixel in the image represents a CNN-predicted class, depicted with a distinct color. These maps were plotted in Python (RRID: SCR_008394) version 3.10.5 with MatPlotLib (RRID: SCR_008624) version 3.5.2.Determining the best correlation approachBefore choosing Pearson R for spectral comparison-based analyses, we studied approaches proposed by other authors46. Specifically, we evaluated the unit normalized Euclidean distance, squared cosine similarity, squared first-difference cosine similarity, Spearman correlation coefficient, and Pearson correlation coefficient. Then, the correlation spectra of the selected RHM maps were plotted, and the correlation scores were plotted as a heatmap. Visual inspection revealed the best approach for determining the subcellular components of PC tissues. The results of that evaluation are shown in Supplementary Figure S11. All comparisons were custom-coded in Python (RRID: SCR_008394) version 3.10.5.Correlation-based analyses and map plottingPearson R was selected as the most accurate for spectral matching in complex PC tissue samples. Each spectrum of the RHM map was correlated with the reference spectrum of the EM or DNA conformation, and a score of that correlation was saved with its spatial orientation. Then, all the scores were normalized (between 0 and 1), and heatmaps of the results were generated for subsequent analyses and comparisons. All correlation calculations were custom-coded in Python (RRID: SCR_008394) version 3.10.5.Volcano plot-based spectral peak analysis (VSPA)We utilized a custom approach based on volcano plots to identify spectral features specific to EMs. A volcano plot, when used to compare features, combines the minimal fold change and its statistical significance. We adjusted that idea to reveal statistically significant spectral peaks in the exclusive spectra of EMs: DNA-m, Lys-m/Lys-a, and Arg-m and base-DNA, Lys, and Arg, respectively. Each spectral intensity peak was compared, resulting in a fold change score and its p value (Mann‒Whitney U test). Next, a plot of significant peaks was made by translating volcano plot data to Raman spectrum data, with the log2-fold change value on the y-axis and the Raman shift on the x-axis. The statistically significant (p