SpaIM: single-cell spatial transcriptomics imputation via style transfer


Introduction

Recent advances in spatial transcriptomics (ST) technologies have provided deep insights into spatial cellular ecosystems1,2,3. Sequencing-based ST technologies4,5,6,7, such as 10× Genomics Visium8 and Slide-seq9,10, use spatially indexed barcodes to perform RNA sequencing on tissue spots. Meanwhile, imaging-based ST platforms such as NanoString’s CosMx™ SMI11 and Vizgen’s MERSCOPE12 use in situ hybridization and fluorescence microscopy to provide spatial transcriptomics data at single-cell resolution. Despite these advances, the gene expression profiles from these ST technologies suffer from data sparsity and limited gene coverage. For instance, NanoString’s CosMx™ SMI11 measures only a targeted panel of at most thousands of genes, and the number of mRNA molecules detected per cell remains low because of limitations in molecular imaging and hybridization efficiency, resulting in poor gene expression measurement. These inherent technological constraints limit both the breadth of gene coverage and the density of count data, posing significant challenges for analysis. Addressing these limitations computationally is crucial for fully capturing and interpreting spatial transcriptomics profiles.

Before the advent of spatial transcriptomics, single-cell RNA sequencing (scRNA-seq) technologies had gained attention for their ability to elucidate cellular heterogeneity13,14,15,16 and trace cell lineages17,18,19. However, scRNA-seq lacks spatial information, making it difficult to determine the structural organization of cells within complex tissues. Nonetheless, as a complement to ST data, scRNA-seq has become invaluable for enhancing the quality of spatial transcriptomics, facilitating precise analyses of the transcriptome with spatial resolution in individual tissue sections. To improve spatial transcriptomics profiles, researchers have actively developed methods20,21,22,23,24,25,26,27 to integrate scRNA-seq with ST data. 
Notable methods include Tangram20, gimVI22, and SpaGE21. Specifically, Tangram20 uses regularizers to select an optimal subset of scRNA-seq profiles to map onto the spatial data. gimVI22 adopts a deep generative model to integrate scRNA-seq and ST data for imputing missing genes. SpaGE21 uses principal component analysis to identify principal vectors and aligns cells from scRNA-seq and ST data by k-nearest neighbors. novoSpaRc25 leverages the continuity of gene expression among neighboring cells for prediction. Recent methods such as stDiff28 and SpatialScope29 use deep generative models to impute spatial gene expression. TISSUE30 and SPRITE31 take an uncertainty-aware approach and a meta-approach, respectively, to predict spatial gene expression. Other methods such as Seurat23, SpaOTsc24, LIGER26, and stPlus27 use different computational strategies to locally align scRNA-seq and ST data, enabling prediction of unmeasured genes in ST data. However, these methods primarily rely on local alignment to predict unmeasured gene expression, which cannot fully exploit the potential of scRNA-seq and ST data for gene expression prediction.

In this study, we introduce SpaIM (Spatial transcriptomics IMputation), a style transfer learning32 framework that leverages scRNA-seq data to impute unmeasured or missing gene expression in ST data. Style transfer learning, a technique borrowed from computer vision32,33, allows SpaIM to apply patterns learned from scRNA-seq data to enhance spatial transcriptomics profiles. SpaIM consists of an ST autoencoder and an ST generator, which work together to decouple scRNA-seq and ST data into data-agnostic content and data-specific styles. The data-agnostic content captures the information shared between scRNA-seq and ST data, while the data-specific styles reflect their intrinsic differences. 
After training with a specifically designed joint loss function, the ST generator can independently predict unmeasured gene expression in ST data using only scRNA-seq data as input. SpaIM is available as open-source software on GitHub (https://github.com/QSong-github/SpaIM), with detailed tutorials demonstrating its use for enhancing spatial transcriptomics profiles.

Results

Overview of the SpaIM model

To accurately impute gene expression in spatial transcriptomics (ST) data, including unmeasured genes, we introduce SpaIM, a style transfer learning model that leverages scRNA-seq (SC) data. As illustrated in Fig. 1a, SpaIM is a multilayer recursive style transfer (ReST) model with layer-wise content- and style-based feature extraction and fusion. Specifically, SpaIM comprises an ST autoencoder (Fig. 1b) and an ST generator (Fig. 1c), both constructed from ReST layers. For a single gene, we consider its expression pattern across the $K$ single-cell clusters as its content, and its expression pattern across all cells in the ST data, which differs from that in the SC data, as its style. The style thus represents the intrinsic differences in gene expression between the ST and SC platforms. Style-transfer learning in SpaIM involves two simultaneous tasks: the ST autoencoder uses the SC data as a reference to disentangle ST gene expression patterns into content and style, and the ST generator extracts the content from the SC data and applies the style learned by the ST autoencoder to infer ST gene expression. The ST autoencoder and the ST generator share the same decoder and are co-trained with a joint loss function on the genes common to the ST and SC data. This allows the ST generator to capture the gene expression patterns in the ST data as well as the relationship between the ST and SC data. 
After training, the ST generator is used as a stand-alone model to infer the expression patterns of unmeasured genes in the ST data, using only the SC data as input. In this way, the trained SpaIM model predicts unmeasured gene expression by leveraging the comprehensive gene coverage of scRNA-seq data and the optimized ST generator.

Fig. 1: Overview of the SpaIM model. SpaIM comprises an ST autoencoder and an ST generator. Both the ST autoencoder and the ST generator are built on multilayer recursive style transfer (ReST) layers.

SpaIM accurately imputes spatial gene expression in a human breast cancer tissue slice

To evaluate the spatial gene imputation capabilities of our model, we compare performance using ST data (10× Visium, ‘CID44971’) and scRNA-seq data (10× Chromium, GSE176078) from breast cancer tissues. Data sources and details are listed in Supplementary Data 1. We compare SpaIM with 12 existing methods: Tangram20, SPRITE31, stDiff28, SpatialScope29, TISSUE30, gimVI22, SpaGE21, Seurat23, SpaOTsc24, novoSpaRc25, LIGER26, and stPlus27. Evaluation metrics (see ‘Methods’) include the Pearson correlation coefficient (PCC), structural similarity index measure (SSIM), Jaccard similarity (JS), root mean square error (RMSE), and accuracy score (ACC). Higher PCC, SSIM, and ACC values, and lower JS and RMSE values, indicate better performance.

The comparison results are shown in Fig. 2, which highlights that SpaIM outperforms the other methods across all metrics. Specifically, Fig. 2a shows that SpaIM achieves the highest PCC (0.70 ± 0.02), outperforming the second-best model, Tangram (PCC: 0.62 ± 0.02). Notably, the SSIM of SpaIM (0.60 ± 0.02) is 0.08 higher than that of Tangram (0.52 ± 0.02; Fig. 2b), a relative improvement of about 15%, indicating that SpaIM’s imputed gene expression is closer to the ground truth. Furthermore, Supplementary Fig. 1a shows the remaining metrics, on which SpaIM also performs better (JS: 0.11 ± 0.01, RMSE: 0.81 ± 0.01) than the other methods. These results support the accuracy of SpaIM in spatial gene expression imputation.

Fig. 2: Performance evaluation in the breast cancer dataset. a Comparison results between SpaIM and existing methods using the Pearson correlation coefficient (PCC). Data are presented as mean values ± 95% confidence intervals across predicted genes ($n$ = 991). b Comparison results between SpaIM and existing methods using the structural similarity index measure (SSIM) across predicted genes ($n$ = 991). c Spatial visualization of the ground truth and the predicted gene expression. d Spatial visualization of spatial domains. e Scatter plot of associated ligand‒receptor pairs in the raw data and the SpaIM-imputed data.

Beyond these aggregate evaluations, we further examine the gene expression predictions generated by different methods. For this breast cancer ST dataset, Fig. 2c presents the predicted expression of marker genes for the invasive cancer region (ERBB2, KRT8; Fig. 2d), with PCC values indicating their correlation with the ground truth. SpaIM consistently outperforms the other methods, including Tangram and stDiff. Moreover, SpaIM accurately predicts marker genes of the normal gland region (Fig. 2d), such as HLA-DRA and CD52, achieving PCCs of 0.63 and 0.62, respectively. In contrast, Tangram produces misleading predictions for CD52, with a notably low correlation. Next, to validate the utility of SpaIM-imputed data for downstream analysis, we curate a combined list of candidate ligand‒receptor (L–R) pairs34. 
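The per-gene metrics used in these comparisons can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not SpaIM’s evaluation code: the exact metric definitions are given in the paper’s Methods and are not reproduced in this excerpt, so the SSIM variant here (each gene scaled by its maximum, then one global SSIM window with the conventional constants $c_1=0.01^2$ and $c_2=0.03^2$) is an assumption, and JS and ACC are omitted. The helper names (`pcc`, `rmse`, `ssim_1d`, `per_gene_scores`) are hypothetical.

```python
import numpy as np

def pcc(truth, pred):
    """Pearson correlation between measured and imputed expression of one gene."""
    t = truth - truth.mean()
    p = pred - pred.mean()
    denom = np.sqrt((t ** 2).sum() * (p ** 2).sum())
    return float((t * p).sum() / denom) if denom > 0 else 0.0

def rmse(truth, pred):
    """Root mean square error; assumes both vectors are on the same (e.g., log1p) scale."""
    return float(np.sqrt(np.mean((truth - pred) ** 2)))

def ssim_1d(truth, pred, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM after scaling each vector by its maximum (assumed scaling)."""
    x = truth / truth.max() if truth.max() > 0 else truth
    y = pred / pred.max() if pred.max() > 0 else pred
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))

def per_gene_scores(truth_mat, pred_mat):
    """Rows are genes and columns are spots/cells, matching the X_st layout in Methods."""
    pairs = list(zip(truth_mat, pred_mat))
    return {
        "PCC": np.array([pcc(t, p) for t, p in pairs]),
        "SSIM": np.array([ssim_1d(t, p) for t, p in pairs]),
        "RMSE": np.array([rmse(t, p) for t, p in pairs]),
    }
```

Summary statistics such as the mean ± 95% confidence interval across predicted genes would then be computed from the arrays returned by `per_gene_scores`.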
Among the collected L–R pairs, the SpaIM-imputed data identify 33 strongly associated pairs, compared with only 11 pairs detected in the raw data, with 10 pairs overlapping (Fig. 2e), suggesting that SpaIM can reveal additional biological insights. Top associated pairs such as VEGFA-ITGB135 and LTF-TFRC36,37, well known for their roles in tumor activity, are detected exclusively in the SpaIM-imputed data. Collectively, these results underscore the reliability of SpaIM in gene expression imputation.

SpaIM enhances the detection of differentially expressed genes

As an emerging imaging-based ST technology, NanoString CosMx™ detects up to a thousand genes per slide38 at subcellular resolution, underscoring the need for methods such as SpaIM to expand gene coverage. Here we collect CosMx ST datasets from lung cancer tissues (Supplementary Data 1), with roughly 70k to 130k cells per slide, to evaluate the performance of SpaIM.

Taking the Lung9–rep1 dataset as an example, we assess the gene expression imputation efficacy of different methods. Figure 3a shows that SpaIM, with median SSIM and JS values of 0.21 and 0.43, outperforms other methods such as Tangram (SSIM: 0.15, JS: 0.47), stDiff (SSIM: 0.12, JS: 0.50), SpatialScope (SSIM: 0.11, JS: 0.50), gimVI (SSIM: 0.19, JS: 0.66), SpaGE (SSIM: 0.10, JS: 0.62), and Seurat (SSIM: 0.17, JS: 0.82). SpaIM also achieves better PCC and RMSE than the other methods, such as gimVI and Tangram (Supplementary Fig. 1b). In terms of accuracy score, SpaIM attains a higher ACC (0.96 ± 0.05) than Tangram (0.91 ± 0.11), stDiff (0.59 ± 0.09), SpatialScope (0.55 ± 0.17), gimVI (0.82 ± 0.27), SpaGE (0.64 ± 0.14), and Seurat (0.36 ± 0.14).

Fig. 3: Benchmarking performance on the NanoString CosMx SMI dataset. a Benchmarking results on the Lung9–rep1 dataset, using evaluation metrics including structural similarity index measure (SSIM) and Jaccard similarity (JS). Data are presented as mean values ± 95% confidence intervals across predicted genes ($n$ = 2038). b Spatial visualization of cell types in the Lung9–rep1 dataset. c Comparisons of the number of differentially expressed genes in each cell type. d Spatial visualization of the ground truth and the predicted expression of tumor-related genes.

Given the different cell types in the spatial microenvironment (Fig. 3b), we then identify the differentially expressed genes (DEGs) for each cell type using the raw data and the imputed data from the top-performing methods, i.e., SpaIM, Tangram, and stDiff (Fig. 3c). With an adjusted P-value threshold of 0.05 and a log2 fold-change threshold of 1, the raw data yield 59 lymphocyte-specific DEGs, whereas the SpaIM imputations yield 92. In contrast, the Tangram and stDiff imputations yield 40 and 22 DEGs, respectively, at the same thresholds. Moreover, to visually compare the spatial gene patterns predicted by different methods, we select biologically significant DEGs (KRT5, CD63, MMP9, MALAT1, FOXP3) and evaluate whether SpaIM can accurately recover their profiles when they are masked (i.e., treated as unmeasured genes). These genes play pivotal roles in cancer, participating in signaling pathways that regulate tumor behaviors, including epithelial-mesenchymal transition39, immune evasion40, tumor progression41, and metastasis42,43. The raw and predicted gene expression are visualized with PCC values labeled (Fig. 3d), with the ground truth shown in the top row for comparison. 
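A per-cell-type DEG count of the kind reported above can be sketched with a Wilcoxon rank-sum test and Benjamini–Hochberg adjustment. The choice of test, the fold change computed on de-logged (expm1) means, and the helper names (`benjamini_hochberg`, `count_degs`) are assumptions for illustration; the exact DEG pipeline used in the paper is not described in this excerpt. Note that this sketch takes a cells × genes matrix, the transpose of the $X_{st}$ layout used in Methods.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.clip(ranked, 0.0, 1.0)
    return adj

def count_degs(expr, labels, group, padj_cut=0.05, lfc_cut=1.0, eps=1e-9):
    """Indices of genes up-regulated in `group` versus all other cells.

    expr: cells x genes matrix on the log1p scale; labels: cell-type label per cell.
    """
    in_group = np.asarray(labels) == group
    pvals, lfcs = [], []
    for j in range(expr.shape[1]):
        a, b = expr[in_group, j], expr[~in_group, j]
        pvals.append(mannwhitneyu(a, b, alternative="two-sided").pvalue)
        # log2 ratio of mean de-logged expression (one common convention; assumed here)
        lfcs.append(np.log2((np.expm1(a).mean() + eps) / (np.expm1(b).mean() + eps)))
    padj = benjamini_hochberg(pvals)
    return np.flatnonzero((padj < padj_cut) & (np.asarray(lfcs) > lfc_cut))
```

Running `count_degs` on raw and on imputed matrices with the same thresholds gives directly comparable DEG counts per cell type.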
Remarkably, the spatial gene patterns predicted by SpaIM are more similar to the ground truth, as evidenced by higher PCC values and a closely matched expression range, underscoring SpaIM’s capacity to impute biologically significant genes.

SpaIM facilitates spatial domain detection and recovers unmeasured genes

We further evaluate imputation performance on another single-cell spatial dataset (Lung5–rep3). Figure 4a shows that SpaIM achieves a higher median SSIM (0.15) on the Lung5–rep3 dataset than the other methods, including Tangram (median SSIM: 0.14), stDiff (median SSIM: 0.13), and SPRITE (median SSIM: 0.12). SpaIM also achieves a better (lower) JS of 0.36, which is 0.13 lower than that of Tangram (JS: 0.49) and 0.16 lower than that of stDiff (JS: 0.52). Other metrics, including PCC, RMSE, and accuracy (Supplementary Fig. 1c), further support SpaIM as the most effective method for imputing gene expression in this dataset.

Fig. 4: SpaIM facilitates spatial domain detection. a Benchmarking results on the NanoString CosMx spatial transcriptomics dataset (Lung5–rep3), using evaluation metrics including structural similarity index measure (SSIM) and Jaccard similarity (JS). Data are presented as mean values ± 95% confidence intervals across predicted genes ($n$ = 2038). b Spatial visualization of cell types in the whole slide. c Spatial visualization of cell types in specific fields of view (FOVs).

We next evaluate the accuracy of spatial domain detection based on the imputed gene expression data generated by different methods. Figure 4b shows the adjusted Rand index (ARI) scores across all 20 FOVs in the Lung5–rep3 dataset. Notably, SpaIM achieves the highest accuracy in identifying spatial domains corresponding to different cell types (ARI = 0.50), approaching the ARI obtained with the raw data (ARI = 0.56). 
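The ARI values above compare two labelings of the same cells, for example annotated cell types versus domains obtained by clustering imputed profiles. The following is a minimal sketch of the standard ARI computed from a contingency table, equivalent in intent to scikit-learn’s `adjusted_rand_score`; the clustering step that produces the predicted labels is not shown, and the function names are hypothetical.

```python
from math import comb

import numpy as np

def _pairs(counts):
    """Sum of n-choose-2 over all entries of an array of counts."""
    return sum(comb(int(n), 2) for n in np.ravel(counts))

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    t, p = t.ravel(), p.ravel()
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)                 # contingency table of label pairs
    index = _pairs(table)                       # cell pairs grouped together in both labelings
    rows, cols = _pairs(table.sum(axis=1)), _pairs(table.sum(axis=0))
    expected = rows * cols / comb(int(table.sum()), 2)
    max_index = (rows + cols) / 2
    if max_index == expected:                   # degenerate case, e.g., a single cluster in each
        return 1.0
    return float((index - expected) / (max_index - expected))
```

ARI is invariant to label names (a perfect match with permuted labels scores 1.0) and is about 0 for random assignments, which is why it suits comparing cluster-derived domains to annotations.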
In contrast, Tangram and gimVI yield markedly lower ARI scores of 0.16 and 0.25, respectively. Further analysis at the individual FOV level (Fig. 4c) shows that SpaIM consistently agrees with the ground truth in distinguishing cellular identities, revealing a continuous tumor region infiltrated by dispersed immune cells. Conversely, gimVI and Tangram make errors in identifying cellular structures, often producing fragmented or blended cell-type regions. For example, in FOV-1, both gimVI and Tangram misclassify tumor cells as epithelial cells. In FOV-2, lymphocytes are incorrectly identified as neutrophils, and in FOV-3, gimVI fails to distinguish lymphocytes, misclassifying them as fibroblasts. These errors lead to poor delineation of spatial heterogeneity. Overall, SpaIM’s imputed gene expression supports more precise spatial domain identification than the alternative methods.

SpaIM also imputes unmeasured spatial gene expression in a biologically meaningful way. As illustrated in the UMAP plots (Fig. 5a), SpaIM imputes cell-type-specific marker genes while maintaining consistent expression patterns within their respective cell types (Fig. 5b). For instance, the endothelial marker gene MME44 shows uniformly high expression in endothelial cells and low expression in other cell types when imputed by SpaIM. In contrast, Tangram’s imputed data show sporadic MME expression in endothelial cells and pervasive expression in non-endothelial cell types. Other SpaIM-imputed cell-type-specific markers, CD1C45,46 and DAG147, are enriched in lymphocyte- and myeloid-dense regions, respectively, while the tumor-specific gene MYCN48 is strongly and uniformly expressed in tumor regions. These distinct patterns are not observed in Tangram’s imputed data (Fig. 5c). 
These results highlight SpaIM’s reliability in imputing unmeasured gene expression, enabling more accurate spatial cellular characterization. Based on SpaIM’s imputed data, we obtained further insights into the spatial distribution of immune cell infiltration and the pseudotemporal trajectories of tumor cells across regions with varying levels of immune infiltration (Supplementary Note 1; Supplementary Figs. 2 and 3). Downstream analyses using imputed data from different methods further support the biological reliability of SpaIM in uncovering complex cellular behaviors within the tissue microenvironment (Supplementary Note 2; Supplementary Figs. 4–7).

Fig. 5: SpaIM reliably recovers unmeasured genes. a UMAP visualizations of all cell populations and nontumor populations. b UMAP visualization of the imputation of unmeasured marker genes by SpaIM. c UMAP visualization of the imputation of unmeasured marker genes by Tangram.

SpaIM accurately imputes spatial gene expression across diverse ST platforms

To further evaluate SpaIM’s performance, we conduct comprehensive experiments on multiple spatial transcriptomics (ST) datasets spanning both sequencing-based and imaging-based platforms. Specifically, we collect 21 Visium datasets covering a range of tissues, including breast, prostate, kidney, and brain, from humans, mice, and zebrafish (Supplementary Data 1). Because data quality and characteristics vary across datasets, we use ranked performance for a fair comparison across methods49. The benchmarking results on the 10× Visium datasets are shown in Fig. 6a and Supplementary Fig. 8a. SpaIM consistently achieves the highest performance across these datasets, outperforming the second-best model by more than 13% in ACC (Fig. 6a).

Fig. 6: Gene imputation performance across diverse spatial transcriptomics (ST) datasets. a Comparison results across 21 Visium ST datasets. 
Box plots: center line = median; box limits = upper (75th) and lower (25th) quartiles; whiskers = 1.5× interquartile range; outliers are points beyond whiskers. b 28 sequencing-based ST datasets. c 25 imaging-based ST datasets. d All 53 datasets profiled by diverse ST technologies, using ranked Pearson correlation coefficient (PCC), Jaccard similarity (JS), and accuracy score.

Moreover, we extend the evaluation to other sequencing-based platforms, including Slide-seq, Slide-seq V2, Seq-scope, and HDST. This expanded sequencing-based collection includes 28 datasets, each paired with scRNA-seq data from the same tissue type (Supplementary Data 1). The benchmarking results on these datasets are shown in Fig. 6b and Supplementary Fig. 8b. SpaIM outperforms the other methods across all sequencing-based datasets. Specifically, SpaIM achieves higher ranked PCC (1.0 ± 0.0), JS (0.96 ± 0.16), and ACC (0.97 ± 0.06) than other methods such as Tangram (PCC: 0.92 ± 0.02, JS: 0.87 ± 0.05, ACC: 0.87 ± 0.04), gimVI (PCC: 0.51 ± 0.13, JS: 0.60 ± 0.17, ACC: 0.57 ± 0.15), and stDiff (PCC: 0.77 ± 0.06, JS: 0.65 ± 0.07, ACC: 0.77 ± 0.05).

We also evaluate gene imputation performance on 25 imaging-based ST datasets from platforms including seqFISH, seqFISH+, MERFISH, STARmap, ISS, and osmFISH (Supplementary Data 1). These datasets typically have limited gene coverage and weak signals. The results of SpaIM and the other methods on these datasets are shown in Fig. 6c and Supplementary Fig. 8c. 
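The “ranked performance” used for cross-dataset comparison49 is not defined in this excerpt. One plausible scheme, assumed here purely for illustration, ranks the methods within each dataset by a metric and rescales the ranks so that the worst method scores 0 and the best scores 1; under such a scheme, a method that is best on every dataset obtains 1.0 ± 0.0. The function names are hypothetical, and ties are broken by sort order for simplicity.

```python
import numpy as np

def ranked_scores(scores, higher_is_better=True):
    """Turn a datasets x methods matrix of one metric into per-dataset scaled ranks.

    Within each dataset (row), the worst method gets 0 and the best gets 1.
    Ties are broken by argsort order (a simplification; average ranks are an alternative).
    """
    s = np.asarray(scores, dtype=float)
    if not higher_is_better:                  # e.g., RMSE or JS, where lower is better
        s = -s
    ranks = s.argsort(axis=1).argsort(axis=1)   # 0 = worst ... n_methods - 1 = best
    return ranks / (s.shape[1] - 1)

def summarize(scaled):
    """Mean and standard deviation of scaled ranks across datasets, per method."""
    scaled = np.asarray(scaled, dtype=float)
    return scaled.mean(axis=0), scaled.std(axis=0)
```

Because ranks discard the absolute metric values, datasets with very different baseline difficulty contribute equally to the summary, which is the motivation the text gives for ranking.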
SpaIM achieves more accurate predictions, with higher ranked PCC (0.97 ± 0.11), JS (0.95 ± 0.12), and ACC (0.92 ± 0.08) than other methods such as Tangram (PCC: 0.86 ± 0.10, JS: 0.77 ± 0.19, ACC: 0.78 ± 0.13), gimVI (PCC: 0.47 ± 0.18, JS: 0.48 ± 0.18, ACC: 0.50 ± 0.17), and stDiff (PCC: 0.67 ± 0.16, JS: 0.59 ± 0.13, ACC: 0.68 ± 0.14).

A comprehensive evaluation across all 53 datasets, spanning both sequencing- and imaging-based platforms with diverse data quality and characteristics, provides a robust benchmark for comparison. As illustrated in Fig. 6d and Supplementary Fig. 8d, SpaIM consistently outperforms the other methods, achieving an average ACC of 0.95 ± 0.07, compared with Tangram (ACC: 0.81 ± 0.10), gimVI (ACC: 0.50 ± 0.15), and stDiff (ACC: 0.71 ± 0.10). This extensive assessment underscores SpaIM’s reliability and robustness across diverse spatial transcriptomics datasets.

Discussion

In this paper, we introduce SpaIM, a style transfer learning model designed to impute unmeasured spatial gene expression. SpaIM reconceptualizes spatial gene expression as dataset-agnostic gene content and dataset-specific styles. The model comprises a recursive style transfer (ReST)-based ST autoencoder and a ReST-based ST generator. Central to SpaIM is its dual-task style-transfer learning approach: the ST autoencoder disentangles ST gene expression patterns into content and style using scRNA-seq data as a reference, while the ST generator transfers the learned style to infer gene expression in ST data from scRNA-seq inputs. Both components share a common decoder and are jointly trained with a loss function based on the genes shared by the ST and scRNA-seq data, enhancing the model’s ability to capture and interpolate gene expression relationships. 
By integrating spatial data styles with content from scRNA-seq data, SpaIM accurately recovers masked gene expression, demonstrating its capability to impute unmeasured genes in ST data, particularly on imaging-based ST platforms, which typically lack comprehensive gene measurements.

Compared with existing methods, SpaIM offers superior gene imputation performance. We reason that other methods generally try to locally align scRNA-seq with ST data for spatial gene imputation while overlooking the stylistic differences between the two data types. SpaIM instead redefines gene imputation by decoupling both scRNA-seq and ST datasets into dataset-agnostic content and dataset-specific styles. This decoupling allows SpaIM to better recognize and exploit both the commonalities and the unique characteristics of scRNA-seq and ST data. It not only yields more precise gene imputation but also improves the model’s ability to adapt to various ST data characteristics, thereby enhancing SpaIM’s generalizability across platforms (i.e., sequencing-based and imaging-based). The explicit content-style design also makes SpaIM’s predictions on incomplete gene expression data more interpretable, positioning SpaIM as a leading method in the field. More importantly, SpaIM enhances downstream analyses of spatial transcriptomics data, opening avenues for biological discovery. By accurately imputing missing gene expression, SpaIM enables the identification of key ligand-receptor pairs and enhances differential expression analyses, allowing precise identification of spatial cell types and the discovery of genes that vary across them. 
SpaIM thus expands the scope of spatial data analysis and the biological insight that can be derived from spatial transcriptomics.

In addition to the benchmarking methods included in our study, we also identified another diffusion-based model, SpaDiT50, which integrates scRNA-seq data as prior knowledge and leverages gene information shared between single-cell and spatial datasets to guide spatial gene expression prediction through a diffusion transformer architecture. It has shown promising results on ten spatial transcriptomics datasets. While SpaDiT is a meaningful contribution, its benchmarking was conducted on far fewer datasets than our evaluation across 53 spatial datasets. Moreover, in certain cases SpaDiT underperforms established methods, suggesting that its performance advantage is not universally consistent. Therefore, while SpaDiT is a valuable addition to this growing field, we do not expect it to consistently outperform the benchmarking methods included in this study.

While both scRNA-seq and spatial transcriptomics (ST) data exhibit high dropout rates, their dropout patterns differ substantially, with ST data showing notably higher dropout for many genes (Supplementary Fig. 9a). These findings highlight the complementary nature of the two modalities and support using scRNA-seq as a reference to improve spatial gene expression imputation (Supplementary Note 3). Moreover, we used Shapley values to interpret the content-related and style-related features disentangled by SpaIM (Supplementary Note 4). Specifically, we applied GradientShap51 to estimate gene-level contributions to content and style on the CID44971 breast cancer dataset. Shapley values were computed at both an intermediate ReST layer and the final layer, and genes were classified as content-related, style-related, or other based on interquartile range thresholds. Scatter plots (Supplementary Fig. 9b) show that content and style features are less distinguishable at the intermediate layer but become clearly separated at the last layer, indicating progressive disentanglement. Functional enrichment analyses (Supplementary Fig. 9c) reveal that content-related genes are enriched in tumor-intrinsic pathways (e.g., EMT and apoptosis), while style-related genes map to immune and microenvironmental processes, both of which are important for model learning and transfer. These results support SpaIM’s capability to hierarchically learn biologically meaningful and interpretable representations.

SpaIM demonstrates strong performance, yet several opportunities remain for future enhancement. First, SpaIM currently employs simple multilayer perceptron (MLP) layers, which may benefit from more sophisticated designs such as graph transformers52 and Mamba53. Second, because SpaIM uses dataset-independent initial style inputs for the ST generator, customized styles tailored to distinct datasets could be developed. The ST generator could then better adapt to the unique characteristics of different spatial datasets, thereby improving the accuracy of spatial gene imputation. Third, we anticipate improving the interpretability of SpaIM to provide deeper insights into missing gene expression and the underlying mechanisms in tissue ecosystems. In summary, SpaIM represents a significant advance in spatial gene expression imputation and is anticipated to facilitate biological discoveries in complex tissues and diseases.

Methods

Ethical statement

This study complies with all relevant ethical regulations. All data used were obtained from publicly available sources, and no new experiments involving human participants or animals were conducted. 
Approval from an institutional review board or ethics committee was not required.

Data preprocessing

We include 28 datasets profiled with sequencing-based ST technologies, including 10× Visium, and 25 datasets from imaging-based ST technologies, such as seqFISH+, MERFISH, and NanoString CosMx SMI. For each ST dataset with $G_1$ genes and $C_1$ cells, we select as the reference a corresponding scRNA-seq dataset with $G_2$ genes and $C_2$ cells from the same tissue type, where $G_2\gg G_1$. For the SC data, low-expression genes are filtered out by retaining only genes expressed in at least 10% of the cells. Raw counts of both SC and ST datasets are preprocessed with the $\operatorname{log1p}$ transformation54,55, defined as $\operatorname{log1p}(x)=\ln(x+1)$. This transformation reduces data skewness and mitigates the impact of extreme values. The processed ST and SC gene expression matrices are denoted $X_{st}\in\mathbb{R}^{G_1\times C_1}$ and $X_{sc}\in\mathbb{R}^{G_2\times C_2}$, respectively. Individual cell-level measurements in $X_{sc}$ carry substantial noise due to dropouts, doublets, inaccurate cell segmentation, and RNA degradation. Therefore, instead of representing a gene by its expression in all $C_2$ cells, we use Leiden clustering to group cells into $K$ clusters and represent each gene by its median expression in each cluster. This yields $\widetilde{X}_{sc}\in\mathbb{R}^{G_2\times K}$ from $X_{sc}$.

To rigorously evaluate model performance, we use a 10-fold cross-validation strategy that combines random data partitioning with repeated validation. Specifically, in each round, the genes common to the ST and SC data are randomly split at an 80:20 ratio into a training set of $G$ genes, used to train the model, and a validation set of $G^{\prime}$ genes, used to evaluate performance. 
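The preprocessing and gene-split steps described above can be sketched as follows. This is an illustrative sketch, not the SpaIM code: matrices are genes × cells as in the text, Leiden cluster labels are assumed to be computed elsewhere and passed in, and one call to `split_common_genes` performs a single random 80:20 partition that would be repeated with different seeds across validation rounds. The function names are hypothetical.

```python
import numpy as np

def preprocess_sc(counts_sc, cluster_labels, min_frac=0.10):
    """Filter, log1p-transform, and summarize SC data by cluster medians.

    counts_sc: G2 x C2 raw counts (genes x cells).
    cluster_labels: length-C2 cluster ids (e.g., from Leiden clustering, computed elsewhere).
    Returns the kept-gene mask and a (kept genes) x K matrix of per-cluster medians.
    """
    expressed_frac = (counts_sc > 0).mean(axis=1)   # fraction of cells expressing each gene
    keep = expressed_frac >= min_frac               # keep genes seen in >= 10% of cells
    x = np.log1p(counts_sc[keep])                   # log1p(x) = ln(x + 1)
    clusters = np.unique(cluster_labels)
    x_tilde = np.stack(
        [np.median(x[:, cluster_labels == k], axis=1) for k in clusters], axis=1
    )
    return keep, x_tilde

def split_common_genes(st_genes, sc_genes, train_frac=0.8, seed=0):
    """One random 80:20 split of the genes shared by the ST and SC datasets."""
    common = sorted(set(st_genes) & set(sc_genes))
    perm = np.random.default_rng(seed).permutation(len(common))
    n_train = int(round(train_frac * len(common)))
    train = [common[i] for i in perm[:n_train]]
    val = [common[i] for i in perm[n_train:]]
    return train, val
```

Summarizing each gene by cluster medians, rather than by its values in every cell, trades single-cell detail for robustness to the dropout and segmentation noise mentioned above.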
The trained model is then used to infer the spatial expression of genes that are available only in the SC dataset and not in the ST dataset.

The SpaIM model

The goal of SpaIM is to infer the expression of genes that are available only in the SC data and not in the ST data, while preserving gene expression patterns as if they had been measured on the ST platform.

1. Recursive style transfer (ReST) layer

The fundamental component of both the ST autoencoder and the ST generator is the ReST layer with layer-wise fusion. As shown in Supplementary Fig. 10a, the $l$th ReST layer consists of three components: a content encoder $C^{(l)}$, a style encoder $S^{(l)}$, and a decoder $D^{(l)}$. The layer performs two functions: (1) it encodes the content representation $h^{(l)}$ and the style representation $g^{(l)}$ into updated latent representations $h^{(l+1)}$ and $g^{(l+1)}$, respectively; and (2) it fuses the latent content and style representations to generate a reconstructed output $p^{(l)}$. This architecture allows recursive style transfer models to be built through layer-wise feature extraction and fusion (Fig. 1a). The ReST layer is a general style-transfer building block with versatile uses. For example, if the same data are used as both the content input and the style input, with appropriate loss functions, the ReST layer becomes an autoencoder that disentangles content and style. If a virtual style is used instead, with appropriate loss functions, the ReST layer becomes a generative model that imposes a style on the content. The ST autoencoder and the ST generator follow these two design strategies, respectively.

2. Spatial transcriptomics (ST) autoencoder

The ST autoencoder (Fig. 1b) comprises multilayer ReST encoders. 
At each layer $l$, the content encoder ${C}_{{st}}^{\left(l\right)}$ and the style encoder ${S}_{{st}}^{\left(l\right)}$ extract the content and style representations, respectively, and a decoder ${D}^{\left(l\right)}$ performs layer-wise fusion of the learned content and style, with $l=0,1,\cdots,L$ indexing the layers. Each encoder layer consists of a linear sub-layer, followed by a normalization sub-layer and then a ReLU sub-layer. The latent content representation at the $l$-th layer is

$${h}_{{st}}^{\left(l\right)}={C}_{{st}}^{\left(l\right)}\left({h}_{{st}}^{\left(l-1\right)}\right)={\rm{ReLU}}\left({\rm{Norm}}\left({h}_{{st}}^{\left(l-1\right)}{{\boldsymbol{W}}}_{c}^{l}+{{\boldsymbol{b}}}_{c}^{l}\right)\right)\tag{1}$$

and the style representation is

$${g}_{{st}}^{\left(l\right)}={S}_{{st}}^{\left(l\right)}\left({g}_{{st}}^{\left(l-1\right)}\right)={\rm{ReLU}}\left({\rm{Norm}}\left({g}_{{st}}^{\left(l-1\right)}{{\boldsymbol{W}}}_{S}^{l}+{{\boldsymbol{b}}}_{S}^{l}\right)\right)\tag{2}$$

where ${h}_{{st}}^{\left(0\right)}\equiv {g}_{{st}}^{\left(0\right)}\equiv {X}_{{st}}$, with ${X}_{{st}}\in {R}^{G\times {C}_{1}}$ denoting the input spatial transcriptomics data, $G$ the number of genes, and ${C}_{1}$ the number of cells.

The decoder layers perform cascaded fusion of the latent content and style representations, starting from the $L$-th (i.e., last) layer and proceeding down to the first layer (i.e., $l=1$). The $k$-th decoded layer is

$${p}_{{st}}^{\left(k\right)}={D}^{\left(k\right)}\left({h}_{{st}}^{\left(L-k+1\right)}\bigotimes {g}_{{st}}^{\left(L-k+1\right)}\right)\bigoplus {p}_{{st}}^{\left(k-1\right)}\tag{3}$$

where $\bigotimes$ denotes element-wise (Hadamard) multiplication of the latent content and style representations, i.e., each element of one matrix is multiplied by the corresponding element of the other.
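Equations (1) and (2) share a single layer form, Linear, then Norm, then ReLU, applied row-wise so that each gene is one sample whose features are its values across the ${C}_{1}$ cells. The sketch below assumes that the unspecified normalization is a layer normalization without learnable scale and shift; the weights are toy values and `encoder_layer` is a hypothetical name, not part of the SpaIM code.

```python
import math


def linear(x, W, b):
    """Row-wise affine map: x (n x d_in) times W (d_in x d_out) plus b (d_out)."""
    return [[sum(xi[k] * W[k][j] for k in range(len(W))) + b[j] for j in range(len(b))]
            for xi in x]


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (assumed Norm, no affine params)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def encoder_layer(x, W, b):
    """One content or style encoder layer, Eq. (1)/(2): ReLU(Norm(x W + b))."""
    return relu(layer_norm(linear(x, W, b)))


# Toy input: G = 2 genes x C1 = 3 cells, projected to a 2-dimensional latent space.
x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0]
h1 = encoder_layer(x, W, b)  # content (or style) representation at layer 1
```

In the ST autoencoder the same input ${X}_{{st}}$ feeds both encoders, so the content and style pathways differ only through their own weights.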
In Eq. (3), the symbol $\bigoplus$ denotes element-wise addition, which adds each element of the current fusion output to the corresponding element of the previous decoded layer. Here $k=1,2,\cdots,L$ and ${p}_{{st}}^{(0)}\equiv {\boldsymbol{0}}$. For example, when $L=2$,

$${p}_{{st}}^{\left(1\right)}={D}^{\left(1\right)}\left({h}_{{st}}^{\left(2\right)}\bigotimes {g}_{{st}}^{\left(2\right)}\right)\tag{4}$$

and

$${\widehat{X}}_{{st}}={p}_{{st}}^{\left(2\right)}={D}^{\left(2\right)}\left({h}_{{st}}^{\left(1\right)}\bigotimes {g}_{{st}}^{\left(1\right)}\right)\bigoplus {p}_{{st}}^{\left(1\right)}\tag{5}$$

where ${\widehat{X}}_{{st}}\in {R}^{G\times {C}_{1}}$ is the reconstructed ST data. This recursive architecture learns latent content and style representations, as well as reconstructions of the data, at different resolutions, and fuses the reconstructions layer by layer.

3. Spatial transcriptomics (ST) generator

A similar architecture (Fig. 1c) is used to generate ST data from SC data. The latent content representation at the $l$-th layer is

$${h}_{{sc}}^{\left(l\right)}={C}_{{sc}}^{\left(l\right)}\left({h}_{{sc}}^{\left(l-1\right)}\right)={\rm{ReLU}}\left({\rm{Norm}}\left({h}_{{sc}}^{\left(l-1\right)}{{\boldsymbol{W}}}_{c}^{l}+{{\boldsymbol{b}}}_{c}^{l}\right)\right)\tag{6}$$

and the style representation is

$${g}_{{sc}}^{\left(l\right)}={S}_{{sc}}^{\left(l\right)}\left({g}_{{sc}}^{\left(l-1\right)}\right)={\rm{ReLU}}\left({\rm{Norm}}\left({g}_{{sc}}^{\left(l-1\right)}{{\boldsymbol{W}}}_{S}^{l}+{{\boldsymbol{b}}}_{S}^{l}\right)\right)\tag{7}$$

The set of input genes differs between training, validation, and inference. During training, ${h}_{{sc}}^{\left(0\right)}\equiv {\widetilde{X}}_{{sc}}$, restricted to the $G$ training genes, and the virtual style input is ${g}_{{sc}}^{\left(0\right)}\equiv {\boldsymbol{1}}\in {R}^{G\times 1}$. The ST generator does not have its own decoder; it shares the decoder of the ST autoencoder.
For the ST generator, the $k$-th decoded layer is therefore

$${p}_{{sc}}^{\left(k\right)}={D}^{\left(k\right)}\left({h}_{{sc}}^{\left(L-k+1\right)}\bigotimes {g}_{{sc}}^{\left(L-k+1\right)}\right)\bigoplus {p}_{{sc}}^{\left(k-1\right)}\tag{8}$$

where ${p}_{{sc}}^{(0)}\equiv {\boldsymbol{0}}$. For example, when $L=2$,

$${p}_{{sc}}^{\left(1\right)}={D}^{\left(1\right)}\left({h}_{{sc}}^{\left(2\right)}\bigotimes {g}_{{sc}}^{\left(2\right)}\right)\tag{9}$$

and

$${\widetilde{X}}_{{st}}\equiv {p}_{{sc}}^{\left(2\right)}={D}^{\left(2\right)}\left({h}_{{sc}}^{\left(1\right)}\bigotimes {g}_{{sc}}^{\left(1\right)}\right)\bigoplus {p}_{{sc}}^{\left(1\right)}\tag{10}$$

where ${\widetilde{X}}_{{st}}\in {R}^{G\times {C}_{1}}$ is the ST data inferred through style transfer, i.e., the final model output.

During testing, ${h}_{{sc}}^{\left(0\right)}\equiv {\widetilde{X}}_{{sc}}\in {R}^{{G}^{\prime}\times K}$, restricted to the ${G}^{\prime}$ validation genes, ${g}_{{sc}}^{\left(0\right)}\equiv {\boldsymbol{1}}\in {R}^{{G}^{\prime}\times 1}$, and ${\widetilde{X}}_{{st}}\in {R}^{{G}^{\prime}\times {C}_{1}}$. During inference, the SC expression of any SC gene can be used as input to generate the corresponding spatial expression of that gene in every cell of the ST data.

4. Loss function

The loss function consists of four components: (i) the content loss, (ii) the style loss, (iii) the ST autoencoder reconstruction loss, and (iv) the ST generator reconstruction loss. The content and style losses disentangle content from style, and the style loss additionally enables style transfer. The ST autoencoder reconstruction loss learns an enhanced reconstruction of the ST data, which then serves as the target of the ST generator reconstruction loss so that spatial gene expression can be accurately inferred from SC data.

Content loss

The content loss encourages the content representations learned from the ST and SC data to be similar at each encoding layer, helping the model preserve the essential information in the gene expression data.
By minimizing the differences between the content features, we encourage the spatial gene expression generated by the ST generator to remain functionally consistent with the ST data. For a model with $L$ layers, the content loss ${{\mathscr{L}}}_{\rm{content}}$ in the latent space is the sum of the mean squared errors (MSE) between the $l$-th content features learned from the spatial data, ${h}_{{st}}^{\left(l\right)}$, and from the scRNA-seq data, ${h}_{{sc}}^{\left(l\right)}$:

$${{\mathscr{L}}}_{\rm{content}}=\mathop{\sum }\limits_{l=1}^{L}{\rm{MSE}}\left({h}_{{st}}^{\left(l\right)},{h}_{{sc}}^{\left(l\right)}\right)\tag{11}$$

Style loss

Gram matrices are used to represent the learned style at each layer, analogous to the style representation in neural image style transfer frameworks56. The Gram matrices of the spatial style extracted from the ST data and from the SC data at the $l$-th layer are

$${M}_{{st}}^{\left(l\right)}={g}_{{st}}^{\left(l\right)}\times {\left({g}_{{st}}^{\left(l\right)}\right)}^{T}\tag{12}$$

and

$${M}_{{sc}}^{\left(l\right)}={g}_{{sc}}^{\left(l\right)}\times {\left({g}_{{sc}}^{\left(l\right)}\right)}^{T}\tag{13}$$

and the style loss is

$${{\mathscr{L}}}_{\rm{style}}=\mathop{\sum }\limits_{l=1}^{L}{\rm{MSE}}\left({M}_{{st}}^{\left(l\right)},{M}_{{sc}}^{\left(l\right)}\right)\tag{14}$$

Reconstruction loss

The ST autoencoder is optimized by minimizing the difference between the original and the reconstructed ST data, measured by cosine similarity:

$${{\mathscr{L}}}_{{AE}}=1-{\rm{sim}}\left({X}_{{st}},{\widehat{X}}_{{st}}\right)\tag{15}$$

where the cosine similarity is defined as ${\rm{sim}}\left(A,B\right)=\frac{A\cdot B}{\max ({\left|\left|A\right|\right|}_{2}{\left|\left|B\right|\right|}_{2},\epsilon )}$ and $\epsilon$ is a small positive constant that prevents division by zero.
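The content, style, and autoencoder reconstruction losses of Eqs. (11)–(15) can be sketched as follows. This assumes that MSE averages over all matrix entries and that the cosine similarity treats each whole matrix as one flattened vector; the paper does not state either choice. The per-layer lists of toy representations are hypothetical illustration data.

```python
import math


def mse(a, b):
    """Mean squared error over all entries of two equally shaped matrices."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def gram(g):
    """Gram matrix M = g g^T for g of shape (G x d); the result is (G x G), Eqs. (12)-(13)."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in g] for ri in g]


def content_loss(h_st_layers, h_sc_layers):
    """Eq. (11): sum over layers l = 1..L of MSE(h_st^(l), h_sc^(l))."""
    return sum(mse(a, b) for a, b in zip(h_st_layers, h_sc_layers))


def style_loss(g_st_layers, g_sc_layers):
    """Eq. (14): sum over layers of the MSE between the two Gram matrices."""
    return sum(mse(gram(a), gram(b)) for a, b in zip(g_st_layers, g_sc_layers))


def cosine_sim(a, b, eps=1e-8):
    """sim(A, B) = <A, B> / max(||A||_2 ||B||_2, eps), with matrices flattened."""
    va = [v for row in a for v in row]
    vb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(va, vb))
    norms = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / max(norms, eps)


def reconstruction_loss(target, output):
    """Eq. (15): 1 - sim(target, output)."""
    return 1.0 - cosine_sim(target, output)


# One-layer toy representations (hypothetical values).
h_st, h_sc = [[[1.0, 2.0]]], [[[1.0, 4.0]]]
g_st = [[[1.0, 0.0], [0.0, 1.0]]]  # Gram matrix is the identity
g_sc = [[[1.0, 0.0], [1.0, 0.0]]]  # Gram matrix is all ones
x_st = [[1.0, 0.0], [0.0, 1.0]]
x_rec = [[2.0, 0.0], [0.0, 2.0]]    # same direction as x_st, so the loss is ~0
x_other = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to x_st, so the loss is 1
```

Note that cosine similarity ignores overall scale (`x_rec` is twice `x_st` yet incurs no loss), so this reconstruction loss constrains the expression pattern rather than absolute magnitudes.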
The ST generator reconstruction loss is defined analogously as

$${{\mathscr{L}}}_{{SG}}=1-{\rm{sim}}\left({\widetilde{X}}_{{st}},{\widehat{X}}_{{st}}\right)\tag{16}$$

and the overall loss is

$${\mathscr{L}}={{\mathscr{L}}}_{\rm{content}}+{{\mathscr{L}}}_{\rm{style}}+{{\mathscr{L}}}_{{AE}}+{{\mathscr{L}}}_{{SG}}\tag{17}$$

The overall objective therefore optimizes training for feature similarity, style transfer, and reconstruction accuracy. The number of layers and the latent dimensions of the SpaIM model are determined by a grid search across all datasets (Supplementary Fig. 10b, c). Ablation studies evaluating the impact of the content loss, the style loss, and the fusion operation are presented in Supplementary Note 5 (Supplementary Fig. 10d). The computational efficiency and robustness of SpaIM are further detailed in Supplementary Notes 6 and 7 (Supplementary Figs. 11 and 12).

Evaluation metrics

To comprehensively evaluate the performance of different models, we employ a set of evaluation metrics: the Pearson correlation coefficient (PCC), structural similarity index measure (SSIM), root mean square error (RMSE), Jensen–Shannon divergence (JS), and an overall accuracy score (ACC). Together, these metrics assess, from different perspectives, the similarity and accuracy of the generated spatial gene expression relative to the original data.

Pearson correlation coefficient (PCC)

PCC measures the linear correlation between the predicted and ground-truth gene expression values. A higher PCC indicates a stronger linear correlation between the generated and true data.
For gene $i$, PCC is calculated as

$${\rm{PCC}}\left({x}_{i},{\widehat{x}}_{i}\right)=\frac{\mathop{\sum }\nolimits_{j=1}^{M}\left({\widehat{x}}_{ij}-{\widehat{\mu }}_{i}\right)\left({x}_{ij}-{\mu }_{i}\right)}{M\,{\widehat{\sigma }}_{i}\,{\sigma }_{i}}\tag{18}$$

where ${\widehat{x}}_{ij}$ and ${x}_{ij}$ are the predicted and true expression of gene $i$ in cell or spot $j$, $M$ is the total number of cells or spots, and $({\widehat{\mu }}_{i},{\widehat{\sigma }}_{i})$ and $({\mu }_{i},{\sigma }_{i})$ are the mean and (population) standard deviation of the predicted and true values of gene $i$, respectively.

Structural similarity index measure (SSIM)

SSIM evaluates the structural similarity between the predicted and the original data. Unlike PCC, it jointly compares the means, variances, and covariance of the two expression vectors. SSIM49 is calculated as

$${\rm{SSIM}}\left({x}_{i},{\widehat{x}}_{i}\right)=\frac{\left(2{\widehat{\mu }}_{i}{\mu }_{i}+{c}_{1}^{2}\right)\left(2\,\mathrm{cov}\left({x}_{i}^{\prime},{\widehat{x}}_{i}^{\prime}\right)+{c}_{2}^{2}\right)}{\left({\mu }_{i}^{2}+{\widehat{\mu }}_{i}^{2}+{c}_{1}^{2}\right)\left({\sigma }_{i}^{2}+{\widehat{\sigma }}_{i}^{2}+{c}_{2}^{2}\right)}\tag{19}$$

$${x}_{ij}^{\prime}=\frac{{x}_{ij}}{\max \left(\left\{{x}_{i1},{x}_{i2},\ldots,{x}_{iM}\right\}\right)}\tag{20}$$

where ${x}_{ij}$ denotes the ground-truth expression of gene $i$ in cell or spot $j$, and ${x}_{i}^{\prime}$ and ${\widehat{x}}_{i}^{\prime}$ are the max-normalized vectors of ground-truth and predicted expression of gene $i$ across all cells or spots (the predicted values are normalized in the same way). The means and variances in Eq. (19) are those of these normalized vectors, and ${c}_{1}$ and ${c}_{2}$ are small positive constants that stabilize the division.

Root mean square error (RMSE)

RMSE quantifies the average prediction error as the square root of the mean squared difference between predicted and actual values; a lower RMSE indicates higher accuracy. For gene $i$,

$${\rm{RMSE}}=\sqrt{\frac{1}{M}\mathop{\sum }\nolimits_{j=1}^{M}{\left({\widehat{z}}_{ij}-{z}_{ij}\right)}^{2}}\tag{21}$$

where ${z}_{ij}$ and ${\widehat{z}}_{ij}$ are the z-score normalized true and predicted expression values of gene $i$ in cell or spot $j$.
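A per-gene sketch of Eqs. (18)–(21) follows. The PCC uses the algebraically equivalent form with sums of squared deviations. The SSIM constants `c1=0.01` and `c2=0.03` are assumed values borrowed from common SSIM practice, not taken from this paper; population statistics are assumed throughout, and constant expression vectors (zero variance) are not handled.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def pcc(x, x_hat):
    """Eq. (18), written with sums of squared deviations (equivalent to M * sigma * sigma_hat)."""
    mu, mu_h = _mean(x), _mean(x_hat)
    cov = sum((a - mu) * (b - mu_h) for a, b in zip(x, x_hat))
    sx = math.sqrt(sum((a - mu) ** 2 for a in x))
    sy = math.sqrt(sum((b - mu_h) ** 2 for b in x_hat))
    return cov / (sx * sy)


def ssim(x, x_hat, c1=0.01, c2=0.03):
    """Eqs. (19)-(20): SSIM computed on max-normalized vectors (c1, c2 are assumed)."""
    xn = [v / max(x) for v in x]
    yn = [v / max(x_hat) for v in x_hat]
    mu_x, mu_y = _mean(xn), _mean(yn)
    var_x = _mean([(v - mu_x) ** 2 for v in xn])
    var_y = _mean([(v - mu_y) ** 2 for v in yn])
    cov = _mean([(a - mu_x) * (b - mu_y) for a, b in zip(xn, yn)])
    num = (2 * mu_x * mu_y + c1 ** 2) * (2 * cov + c2 ** 2)
    den = (mu_x ** 2 + mu_y ** 2 + c1 ** 2) * (var_x + var_y + c2 ** 2)
    return num / den


def zscore(v):
    mu = _mean(v)
    sd = math.sqrt(_mean([(a - mu) ** 2 for a in v]))
    return [(a - mu) / sd for a in v]


def rmse(x, x_hat):
    """Eq. (21): RMSE between z-score normalized true and predicted values."""
    z, z_hat = zscore(x), zscore(x_hat)
    return math.sqrt(_mean([(a - b) ** 2 for a, b in zip(z, z_hat)]))


# One gene across M = 4 cells; the prediction is a scaled copy of the truth.
truth = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]
```

Because all three metrics normalize away overall scale (PCC and RMSE via centering and z-scoring, SSIM via max normalization), the scaled prediction above scores perfectly on each, which fits an imputation task where absolute count levels differ between SC and ST platforms.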
Jensen–Shannon divergence (JS)

JS measures the dissimilarity between the spatial expression patterns of the predicted and the original data; a lower JS indicates more similar patterns. First, the spatial distribution probability of each gene is calculated as

$${P}_{ij}=\frac{{x}_{ij}}{\mathop{\sum }\nolimits_{j=1}^{M}{x}_{ij}}\tag{22}$$

Then JS is computed as the Jensen–Shannon divergence:

$${JS}=\frac{1}{2}{KL}\left({\widehat{P}}_{i}\,\|\,\frac{{\widehat{P}}_{i}+{P}_{i}}{2}\right)+\frac{1}{2}{KL}\left({P}_{i}\,\|\,\frac{{\widehat{P}}_{i}+{P}_{i}}{2}\right)\tag{23}$$

$${KL}\left({a}_{i}\,\|\,{b}_{i}\right)=\mathop{\sum }\limits_{j=1}^{M}{a}_{ij}\log \left(\frac{{a}_{ij}}{{b}_{ij}}\right)\tag{24}$$

where ${P}_{i}$ and ${\widehat{P}}_{i}$ are the spatial distributions of the true and predicted expression of gene $i$, and ${KL}(\cdot\,\|\,\cdot)$ is the Kullback–Leibler divergence between two probability distributions ${a}_{i}$ and ${b}_{i}$.

Accuracy score (ACC)

ACC provides an overall performance score by combining the relative rankings of all models on the four main metrics: PCC, SSIM, RMSE, and JS. For each dataset, we rank PCC and SSIM in ascending order and RMSE and JS in descending order. For example, the lowest PCC or SSIM receives a rank of 1, as does the highest RMSE or JS, so the best-performing model receives the highest rank on every metric. ACC is then calculated as

$${\rm{ACC}}=\frac{1}{4}\left({\rm{RANK}}_{PCC}+{\rm{RANK}}_{SSIM}+{\rm{RANK}}_{RMSE}+{\rm{RANK}}_{JS}\right)\tag{25}$$

Ranked metrics

We also report the individual rankings: Ranked PCC, Ranked SSIM, Ranked RMSE, and Ranked JS. For all of these ranked metrics, higher values indicate better model performance.

Implementation details

During training, the model uses a learning rate of 0.001 and runs for a maximum of 300 epochs.
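The JS divergence of Eqs. (22)–(24) and the rank-based ACC of Eq. (25) can be sketched as below. Natural logarithms are assumed, zero-probability terms are skipped in the KL sum (the usual $0\log 0=0$ convention), and tied metric values simply receive consecutive ranks in input order; the paper specifies none of these details. The three-method metric values are made-up illustration data.

```python
import math


def to_distribution(v):
    """Eq. (22): P_ij = x_ij / sum_j x_ij."""
    s = sum(v)
    return [a / s for a in v]


def kl(a, b):
    """Eq. (24): sum_j a_j log(a_j / b_j); terms with a_j = 0 contribute 0."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def js(x, x_hat):
    """Eq. (23): Jensen-Shannon divergence between the spatial distributions of one gene."""
    p, q = to_distribution(x), to_distribution(x_hat)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(q, m) + 0.5 * kl(p, m)


def ranks(values, higher_is_better):
    """Rank methods 1..N so the best method gets rank N; ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def acc(pcc_vals, ssim_vals, rmse_vals, js_vals):
    """Eq. (25): mean of the four per-metric ranks for each method."""
    rp, rs = ranks(pcc_vals, True), ranks(ssim_vals, True)
    rr, rj = ranks(rmse_vals, False), ranks(js_vals, False)
    return [(a + b + c + d) / 4 for a, b, c, d in zip(rp, rs, rr, rj)]


# Hypothetical metric values for three methods on one dataset.
scores = acc([0.9, 0.5, 0.7], [0.8, 0.4, 0.6], [0.2, 0.9, 0.5], [0.1, 0.6, 0.3])
```

Since ACC averages ranks rather than raw values, it is comparable across metrics with different scales, but it only reflects the ordering of methods, not the size of the gaps between them.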
The experiments are performed on an Ubuntu 20.04 system equipped with 128 GB of RAM and an NVIDIA GeForce RTX 3090 Ti GPU with 24 GB of memory.

Statistics and reproducibility

Statistical tests were unpaired unless explicitly stated and were performed on independent biological replicates (as detailed in each figure legend).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets used in this study are publicly available in Zenodo (https://doi.org/10.5281/zenodo.16684835). Data sources and details are provided in Supplementary Data 1. Source data are provided with this paper. Correspondence and requests for materials should be addressed to B.Y., J.S., or Q.S.

Code availability

All source code and data in our study have been deposited at https://github.com/QSong-github/SpaIM.

References

1. Giacomello, S. et al. Spatially resolved transcriptome profiling in model plant species. Nat. Plants 3, 17061 (2017).
2. Berglund, E. et al. Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat. Commun. 9, 1–13 (2018).
3. Thrane, K., Eriksson, H., Maaskola, J., Hansson, J. & Lundeberg, J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 78, 5970–5979 (2018).
4. Asp, M. et al. A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart. Cell 179, e1619 (2019).
5. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. bioRxiv https://doi.org/10.1101/2020.02.28.969931 (2020).
6. Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
7. Maniatis, S. et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science 364, 89–93 (2019).
8. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
9. Stickels, R. R. et al. Sensitive spatial genome wide expression profiling at cellular resolution. bioRxiv https://doi.org/10.1101/2020.03.12.989806 (2020).
10. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
11. He, S. et al. High-plex multiomic analysis in FFPE tissue at single-cellular and subcellular resolution by spatial molecular imaging. bioRxiv https://doi.org/10.1101/2021.11.03.467020 (2021).
12. Fang, R. et al. Conservation and divergence of cortical cell organization in human and mouse revealed by MERFISH. Science 377, 56–62 (2022).
13. Song, Q., Su, J., Miller, L. D. & Zhang, W. scLM: automatic detection of consensus gene clusters across multiple single-cell datasets. bioRxiv https://doi.org/10.1101/2020.04.22.055822 (2020).
14. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
15. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
16. Papalexi, E. & Satija, R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat. Rev. Immunol. 18, 35–45 (2018).
17. Song, Q. et al. Dissecting intratumoral myeloid cell plasticity by single cell RNA-seq. Cancer Med. 8, 3072–3085 (2019).
18. Bendall, S. C. et al. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 157, 714–725 (2014).
19. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
20. Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
21. Abdelaal, T., Mourragui, S. M. C., Mahfouz, A. & Reinders, M. J. T. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 48, e107–e107 (2020).
22. Lopez, R. et al. A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements. ICML Workshop on Computational Biology (2019).
23. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, e1821 (2018).
24. Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).
25. Nitzan, M., Karaiskos, N., Friedman, N. & Rajewsky, N. Gene expression cartography. Nature 576, 132–137 (2019).
26. Welch, J. D., Kozareva, V., Ferreira, A. N., Vanderburg, C. & Macosko, E. Z. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, e1817 (2019).
27. Chen, S., Zhang, B., Chen, X., Zhang, X. & Jiang, R. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 37, i299–i307 (2021).
28. Li, K., Li, J., Tao, Y. & Wang, F. stDiff: a diffusion model for imputing spatial transcriptomics through single-cell transcriptomics. Briefings Bioinform. 25, 3 (2024).
29. Wan, X. et al. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nat. Commun. 14, 7848 (2023).
30. Sun, E. D., Ma, R., Navarro Negredo, P., Brunet, A. & Zou, J. TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nat. Methods 21, 444–454 (2024).
31. Sun, E. D., Ma, R. & Zou, J. SPRITE: improving spatial gene expression imputation with gene and cell networks. Bioinformatics 40, i521–i528 (2024).
32. Gatys, L. A., Ecker, A. S. & Bethge, M. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423 (2016).
33. Gatys, L. A., Bethge, M., Hertzmann, A. & Shechtman, E. Preserving color in neural artistic style transfer. Preprint at arXiv https://doi.org/10.48550/arXiv.1606.05897 (2016).
34. Tang, Z. et al. SiGra: single-cell spatial elucidation through an image-augmented graph transformer. Nat. Commun. 14, 5618 (2023).
35. Guinn, S. et al. Transfer learning reveals cancer-associated fibroblasts are associated with epithelial–mesenchymal transition and inflammation in cancer cells in pancreatic ductal adenocarcinoma. Cancer Res. 84, 1517–1533 (2024).
36. Chen, X., Yu, C., Kang, R. & Tang, D. Iron metabolism in ferroptosis. Front. Cell Dev. Biol. 8, 590226 (2020).
37. Tang, D., Chen, X., Kang, R. & Kroemer, G. Ferroptosis: molecular mechanisms and health implications. Cell Res. 31, 107–125 (2021).
38. He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
39. Guo, C. C. et al. Dysregulation of EMT drives the progression to clinically aggressive sarcomatoid bladder cancer. Cell Rep. 27, e1784 (2019).
40. Hinz, S. et al. Foxp3 expression in pancreatic carcinoma cells as a novel mechanism of immune evasion in cancer. Cancer Res. 67, 8344–8350 (2007).
41. Duch, P. et al. Aberrant TIMP-1 overexpression in tumor-associated fibroblasts drives tumor progression through CD63 in lung adenocarcinoma. Matrix Biol. 111, 207–225 (2022).
42. Chou, J. et al. MALAT1 induced migration and invasion of human breast cancer cells by competitively binding miR-1 with cdc42. Biochem. Biophys. Res. Commun. 472, 262–269 (2016).
43. Itoh, T. et al. Experimental metastasis is suppressed in MMP-9-deficient mice. Clin. Exp. Metastasis 17, 177–181 (1999).
44. Weiß, E. et al. Maternal overweight downregulates MME (neprilysin) in feto-placental endothelial cells and in cord blood. Int. J. Mol. Sci. 21, 834 (2020).
45. Nizzoli, G. et al. Human CD1c+ dendritic cells secrete high levels of IL-12 and potently prime cytotoxic T-cell responses. Blood 122, 932–942 (2013).
46. Breton, G. et al. Circulating precursors of human CD1c+ and CD141+ dendritic cells. J. Exp. Med. 212, 401–413 (2015).
47. Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
48. Combaret, V. R. et al. Circulating MYCN DNA as a tumor-specific marker in neuroblastoma patients. Cancer Res. 62, 3646–3648 (2002).
49. Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).
50. Li, X., Zhu, F. & Min, W. SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq. Briefings Bioinform. 25, bbae571 (2024).
51. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4766–4777 (2017).
52. Khan, S. et al. Transformers in vision: a survey. ACM Comput. Surv. 54, 1–41 (2022).
53. Gu, A. et al. Mamba: linear-time sequence modeling with selective state spaces. Proc. Conf. on Language Modeling (COLM) (2025).
54. Bao, F. et al. Integrative spatial analysis of cell morphologies and transcriptional states with MUSE. Nat. Biotechnol. 40, 1200–1209 (2022).
55. Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
56. Gatys, L. A., Ecker, A. S. & Bethge, M. Image style transfer using convolutional neural networks. Proc. IEEE Conf. Comput. Vis. 2414–2423 (2016).

Acknowledgements

This work partially used Jetstream2 through allocation CIS230237 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Author information

Author notes

These authors contributed equally: Bo Li, Ziyang Tang.

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Taipa, Macau, China: Bo Li
Department of Computer and Information Technology, Purdue University, West Lafayette, IN, USA: Ziyang Tang & Baijian Yang
Department of Biostatistics and Health Data Science, Indiana University School of Medicine, West Lafayette, IN, USA: Aishwarya Budhkar, Xiang Liu & Jing Su
Department of Statistics, Purdue University, West Lafayette, IN, USA: Tonglin Zhang
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA: Qianqian Song

Contributions

Q.S., J.S., and B.Y. supervised the overall project. Q.S. and B.L. drafted the paper and led the revision process. Z.T. and B.L. were responsible for data collection, model implementation and optimization, and performance benchmarking. A.B. conducted downstream analyses and prepared the figures. X.L. and T.Z. contributed to project discussions and assisted in refining the paper and visualizations. All authors reviewed and approved the final version of the paper.

Corresponding authors

Correspondence to Baijian Yang, Jing Su or Qianqian Song.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Communications thanks Wenwen Min and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information; Reporting Summary; Description of additional supplementary files; Supplementary Data 1; Transparent Peer Review file; Source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material.
You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.