Integrative cross-sample alignment and spatially differential gene analysis for spatial transcriptomics

Wait 5 sec.

IntroductionIn recent years, spatial transcriptomics (ST) technologies have rapidly advanced, enabling large-scale and high-resolution profiling of gene expression while preserving spatial context within tissues1. Image-based methods such as seqFISH2, MERFISH3, and SMI4, along with sequencing-based approaches including 10x Visium5, Slide-seq6, and Stereo-seq7, have made it possible to capture spatial gene expression at cellular or even subcellular resolution. These technologies provide powerful tools for revealing spatial heterogeneity in tissue architecture and exploring gene functions within local microenvironments, thereby expanding the application of ST in developmental biology, cancer research, and disease mechanism studies.Despite these advances, integrated analysis of multi-sample ST data remains challenging. First, non-overlapping sampling regions are common across experiments, making spatial correspondence difficult to establish. Second, substantial spatial shifts and nonlinear distortions introduced during tissue processing render direct coordinate-based comparison infeasible. Third, conventional analysis workflows typically identify differentially expressed genes (DEGs) based on statistical differences in expression levels8, rather than cross-sample variation in space. As a result, genes that maintain similar overall expression but exhibit distinct spatial distributions across different conditions may be overlooked, leading to missed spatially informative signals. Accurate alignment of spatial slices across samples therefore requires jointly capturing both molecular similarity and spatial consistency.Several computational methods have been developed to address spatial alignment in ST data from different perspectives. PASTE9 employs a Gromov-Wasserstein optimal transport framework to align adjacent slices; SANTO10 applies a coarse-to-fine strategy using dynamic graph neural networks to identify overlapping regions; STAligner11 integrates and aligns multi-sample ST data via a graph-attention framework to obtain a spatially aware joint embedding; SPACEL12 is a deep-learning framework that identifies spatial domains across slices and supports 3D tissue reconstruction; STalign13 and ST-GEARS14 leverage large deformation diffeomorphic metric mapping (LDDMM)15 and elastic registration to better accommodate nonlinear deformations. Among these methods, PASTE, SANTO, STAligner and SPACEL primarily focus on rigid alignment and are less effective in handling complex tissue distortions. ST-GEARS, although capable of nonlinear alignment, suffers from high computational cost, limiting its scalability. More importantly, most existing methods focus solely on spatial alignment, without supporting downstream spatial analysis, thereby limiting their ability to characterize spatial expression patterns or condition-specific spatial variability across multiple ST slices. While CAST16 has begun to jointly consider alignment and spatially differential gene analysis, there remains a lack of consistent and scalable approaches for analyzing spatial heterogeneity across different conditions in multi-sample ST data. A recent benchmark study systematically evaluated these approaches17.In this study, we present CODA, short for “integrative CrOss-sample alignment and spatially Differential gene Analysis for spatial transcriptomics”. CODA addresses three key challenges—common domain identification, fast spatial alignment and spatially differential gene detection—by embedding multiple ST slices into a shared low-dimensional latent feature space. CODA first performs global alignment by jointly clustering the pooled feature space to define shared communities and identifying the nearest neighbors between slices within each community. Then from these cluster-restricted correspondences, CODA estimates an affine spatial transform. To improve the consistency of spatial region analysis, CODA incorporates a common domain identification module. It renders the low-dimensional representations as RGB images and extracts shared spatial regions by matching visual features between source and target slices. Cells within the common domain are reselected for subsequent alignment and analysis. It then performs local alignment at the coordinate level by applying multi-channel LDDMM to construct velocity fields in the latent space, learning smooth and invertible nonlinear transformations. Based on the aligned structure, we introduce a spatial cross-correlation metric to identify spatially differential genes (SDGs) across conditions. We systematically evaluate CODA on datasets generated from multiple ST platforms and show that it outperforms existing methods in terms of alignment accuracy, computational efficiency, and memory usage. Furthermore, in applications to vascular tissue and mouse brain data, CODA identifies key genes associated with disease states and sleep regulation, demonstrating its broad utility and biological relevance in ST research.ResultsOverview of CODACODA consists of two core components: spatial alignment and spatial analysis. The alignment module establishes a unified coordinate system across samples, providing a robust foundation for subsequent spatial comparisons. Specifically, CODA decomposes the alignment task into three stages: global alignment, common domain identification, and local alignment. By jointly transforming multiple ST datasets into multi-channel feature representations, these tasks are naturally reformulated as point cloud registration, image feature matching, and image registration, respectively. Once spatial alignment is completed, CODA introduces the concept of spatial cross-correlation to enable cross-sample spatial analysis of gene expression patterns based on the aligned coordinates.Specifically, CODA uses ST slices generated from either the same or different experimental platforms as inputs, including both sequencing-based (e.g., 10x Visium) and imaging-based (e.g., MERFISH) technologies (Fig. 1a). This cross-platform compatibility enables CODA to align and compare datasets with heterogeneous spatial resolutions. To eliminate batch effects between datasets, CODA first applies existing integration tools such as Seurat18, ComBat19 or STAligner11 to integrate gene expression across samples. The integrated expression matrix is then projected into a shared $d$-dimensional feature space (Fig. 1b). At this stage, each ST dataset can be interpreted as a 2D point cloud, where each point is embedded with a $d$-dimensional gene expression feature. Based on this representation, CODA first performs joint clustering in the common feature space across slices to obtain shared clusters (Fig. 1c). For each source point, the nearest neighbor in the target slice is identified within the same cluster, yielding the cluster-restricted correspondences. From these correspondences, CODA estimates a spatial transform under three modes: Mode I (rigid)—rotation and translation only, with the physical scale (μm-per-pixel) fixed, for serial-section reconstruction where absolute scale must be preserved; Mode II (similarity)—rotation, translation, and isotropic scaling, to compensate for uniform magnification differences across experiments or platforms; and Mode III (affine)—rotation, translation, anisotropic scaling, and shear—to absorb residual global distortions and improve cross-slice comparability for downstream spatial gene analyses (Fig. 1c; Supplementary Fig. 1).Fig. 1: Pipeline of CODA.Full size imagea Input data. CODA accepts two ST slices—source and target slices—from the same or different experimental platforms, which could be obtained by different sequencing technology (e.g., sequencing- or imaging-based) with diverse spatial resolutions. b Preprocessing. The gene expression matrices are batch-corrected and projected into a shared low-dimensional feature space. c Global alignment. Shared communities are obtained by joint clustering in the common feature space; within each community, the cross-slice nearest neighbors define cluster-restricted correspondences. From these correspondences, CODA estimates a spatial transformation in three modes—Mode I: rigid, Mode II: similarity, and Mode III: affine. d Resolution integration. Spatial locations from both slices are embedded into a shared coordinate grid to produce multi-channel low-dimensional representations. e Common domain identification. A transformer-based keypoint matching model is used to identify correspondences between slices, enabling extraction of a common spatial domain. f Local alignment. Nonlinear geometric transformations are estimated using multi-channel large deformation diffeomorphic metric mapping (LDDMM). g Spatial analysis. Based on aligned spatial coordinates, CODA computes a spatial cross-correlation index to identify spatially consistent genes (SCGs) and spatially differential genes (SDGs) across conditions.To identify a common spatial domain across slices, CODA renders the low-dimensional feature representations as multi-channel images, where each pixel encodes gene expression at a specific spatial location (Fig. 1d). When the feature dimension is 3, these representations can be interpreted as RGB images, allowing CODA to leverage image-based matching strategies. CODA adopts a transformer-based key-point matching strategy similar to LightGlue20, a lightweight attention model for robust correspondence detection. This enables common domain identification between slices via key point matching in the RGB-like representations (Fig. 1e). CODA then performs local alignment using multi-channel LDDMM on the constructed spatial representations, allowing for the alignment of nonlinear deformations and the unification of spatial scales across samples (Fig. 1f). To mitigate potential overcorrection, we apply a gain-based QC gate and perform the local alignment only when it provides a non-trivial improvement; otherwise, we fall back to global-only alignment.To make cross-sample analysis of gene expression more interpretable, CODA introduces a spatial cross-correlation metric that quantifies the similarity of each gene’s spatial distribution across aligned slices. Based on this metric, genes with high spatial cross-correlation are defined as spatially consistent genes (SCGs), often reflecting conserved localization patterns and known cell-type markers. In contrast, genes with low or negative spatial cross-correlation are defined as spatially differential genes (SDGs), capturing condition-specific spatial shifts that may reflect underlying biological variation. This framework enables the systematic identification of genes whose spatial organization is preserved or altered between conditions, facilitating downstream biological interpretation (Fig. 1g; see Methods for details).BenchmarkTo comprehensively evaluate CODA, we structured the benchmark into two parts: (i) rigid alignment and (ii) non-rigid distortion. For the rigid-alignment, we compared PASTE, SANTO, STalign, ST-GEARS, STAligner, SPACEL, and CODA on biologically adjacent slice pairs across 10x Visium, MERFISH, and Stereo-seq, and included a scale-sensitivity analysis on 10x Visium dataset. For fair comparison, all methods were run in rigid mode. Because STalign lacks a purely rigid option, we used its default configuration. For the non-rigid-distortion, we applied progressively stronger affine perturbations and evaluated the recovery under a capability-aligned protocol using rigid-only methods in rigid mode and nonrigid-capable methods with their recommended nonrigid module. Alignment accuracy, runtime, and memory were evaluated (Supplementary Tables 1–5; preprocessing in Supplementary Note 1).First, we compared the performance of different methods on the 10x Visium human dorsolateral prefrontal cortex (DLPFC) dataset21, a sequencing-based technology with spot-level resolution. We evaluated a pair of slices (ID151673 and ID151676) from an adult donor. We considered both the normalized coordinate system (each axis rescaled to [−1, 1]) and the original scale. In both conventions, we manually rotated ID151673 by ${180}^{\circ }$ around the origin to simulate rotational deviation. To quantify alignment quality, we calculated the expression correlation score (ECS), the spatial consistency index (SCI), the rotation recovery score (RRS), and spot-to-spot mapping metrics (one-to-one coverage and crowding ratio). In addition, for the DLPFC dataset where the laminar order is available, we computed the layer-wise alignment accuracy (LAA), which generalizes SCI by allowing small layer offsets (with SCI equal to ${LA}{A}_{\le 0}$).Under normalized coordinates (each axis rescaled to [−1,1]), pre-alignment baselines were found to be low: ECS = 0.30, SCI = 0.18, RRS = 0.00, ${LA}{A}_{\le 1}=0.383$, ${LA}{A}_{\le 2}=0.588$ (Supplementary Table 1). Among all methods, STalign failed to improve alignment quality, likely due to its reliance on single-cell–resolution data for estimating cell density fields (Fig. 2a). The overall performance rarely improved across different metrics (Fig. 2d and Supplementary Table 1). PASTE and SANTO partially corrected the misalignment (Fig. 2a) but still deviated from the true orientation, yielding moderate ECS (0.32, 0.34), moderate SCI (0.35, 0.37), suboptimal RRS (0.732, 0.758), and mid-range layer-tolerant accuracies (${{{{\rm{LAA}}}}}_{\le 1}=0.699,\,0.705$). STAligner further improved orientation (RRS = 0.94) and expression (ECS = 0.37), with ${{{{\rm{LAA}}}}}_{\le 1}=0.943$. By contrast, CODA, ST-GEARS, and SPACEL recovered the applied rotation (RRS = 0.99) and achieved the highest ECS among the compared methods (all 0.37). Their SCI values were 0.88, 0.88, and 0.87, respectively, and their layer-tolerant accuracies were ${{{{\rm{LAA}}}}}_{\le 1}=1.000,\,0.998{{{\rm{and}}}}0.999$.Fig. 2: Global alignment benchmarking of CODA, PASTE, STalign, SANTO, ST-GEARS, STAligner and SPACEL on various ST datasets.Full size imagea Alignment results on the Visium DLPFC dataset. The two slices differ by 180 degrees prior to alignment. CODA and ST-GEARS achieved the highest SCI (0.88), with SPACEL close behind (SCI = 0.87). STAligner yields partial alignment (SCI = 0.66), while PASTE (SCI = 0.35) and SANTO (SCI = 0.37) leave clear angular bias. STalign, not designed for spot-level data, failed to align the slices. Colors represent cortical layers. Throughout all experiments, slice 2 consistently served as the source slice and slice 1 as the target. b Alignment results on the MERFISH mouse brain dataset. The two slices also differ by 180 degrees. Due to memory constraints, PASTE, SANTO, ST-GEARS and STAligner were run on 10,000-cell subsets, whereas CODA, STalign and SPACEL processed the full 100,000 cells. SPACEL achieves the highest SCI (0.33), followed by CODA (0.30) and ST-GEARS (0.27). PASTE (0.20) and STAligner (0.18) retain slight residual angular error; SANTO introduced a translation shift, and STalign performed poorly without landmarks. Red and blue indicate cell positions from slice 1 and slice 2. c Alignment results on the Stereo-seq maize dataset. SPACEL has the highest SCI (0.42); CODA (0.41), SANTO (0.40) and STAligner (0.39) are close. PASTE and STalign made minimal adjustments, and ST-GEARS incorrectly applied a 90-degree rotation. Colors represent different cell types. d Alignment on 10x Visium DLPFC under slight affine distortion. CODA is the only method that reliably corrects this non-rigid distortion. Δ is the absolute difference between mapped and reference cortical-layer indices (Δ = 0 exact match; Δ = 1 off by one layer; Δ > 1 mismatch of two or more layers). e Alignment accuracy metrics, spot-to-spot mapping metrices and complexity metrics for all methods of different dataset. DLPFC dorsolateral prefrontal cortex, SCI spatial consistency index, ECS expression correlation score, RRS rotation recovery score. Source data are provided as a Source Data file.On the original pixel scale, the results were qualitatively consistent with the normalized case. Notably, PASTE aligned substantially better on the native scale, reflecting its distance-weighted objective’s sensitivity to coordinate magnitude, whereas other methods performed similarly across scales (Supplementary Fig. 2a).We next evaluated CODA on a large-scale MERFISH dataset of the mouse brain22, an image-based ST technology with single-cell resolution. Two spatially adjacent slices (C57BL6J-638850.38/.40) had coordinates normalized to [−1,1]. The untransformed baseline was ECS = 0.44 and SCI = 0.31. We then applied a ${180}^{\circ }$ rotation to C57BL6J-638850.38 (with the ECS and SCI dropped to 0.32 and 0.18) and benchmarked methods. Given the > 100,000 cells per slice, PASTE, SANTO, ST-GEARS and STAligner exceeded memory limits (> 32 GB RAM). To ensure comparability, we uniformly subsampled 10,000 cells per slice for these methods.Surprisingly, STalign failed to correct the rotation and instead applied local warping (ECS = 0.32; SCI = 0.17; RRS = 0). These results suggest that STalign might strongly depend on external landmarks, which were not provided in our setup. PASTE and STAligner partially corrected orientation (RRS = 0.85/0.70) but left ~28°/54° residuals and lower expression/label consistency (ECS = 0.36/0.33; SCI = 0.20/0.18). SANTO and ST-GEARS recovered orientation (RRS = 0.998/0.985); SANTO incurred a small translation (ECS = 0.30; SCI = 0.14), while ST-GEARS scored higher (ECS = 0.41; SCI = 0.27) but introduced a horizontal flip. SPACEL performed best (RRS = 1.00; ECS = 0.45; SCI = 0.33) but required manual labels. Using expression only, CODA corrected rotation and approached SPACEL (RRS = 0.974; ECS = 0.43; SCI = 0.30) (Fig. 2b; Supplementary Table 2).We evaluated CODA on a subcellular-resolution maize dataset23 (Stereo-seq) using two replicates (rep1 and rep2). These slices exhibited natural, unannotated spatial misalignment; therefore, RRS was not computable.Among the tested methods, ST-GEARS applied an incorrect ~90° rotation, indicating difficulty in rotational calibration on high-resolution, plant-derived samples (Fig. 2c). PASTE and STalign made only minimal adjustments, resulting in poor alignment quality. In contrast, CODA, SANTO, STAligner and SPACEL produced the strongest alignments, with high expression/label consistency (ECS = 0.70, 0.70, 0.70, 0.71; SCI = 0.41, 0.40, 0.39, 0.42) and near one-to-one correspondences (one-to-one coverage = 0.896, 0.885, 0.886, 0.884; crowding ratio = 1.069, 1.084, 1.081, 1.079), indicating robust performance on high-resolution, plant-derived tissue (Fig. 2c and Supplementary Table 3).To extend the benchmark beyond rigid motion, we introduced controlled nonrigid perturbations—first global affine changes and then smooth elastic warps. Only CODA, STalign, and ST-GEARS natively support nonrigid alignment, and the remaining methods were run in rigid mode for reference.Under a slight affine perturbation, CODA was the only method that reliably restored both global orientation and local correspondences, achieving SCI = 0.886 and one-to-one = 0.805 (Fig. 2d and Supplementary Table 4). The next-best methods reached at most SCI = 0.776 (SPACEL) and one-to-one = 0.731 (ST-GEARS). Under a stronger affine perturbation, this advantage became more significant (CODA’s SCI = 0.887, while the best SCI of other methods is 0.592, Supplementary Fig. 2b and Supplementary Table 5). CODA’s global alignment module consistently corrected affine distortions but could not resolve the nonlinear deformations, as expected. We then used CODA’s local alignment module to recover the elastic warps (Supplementary Fig. 3).We also tested a combined workflow in which CODA served as the alignment backend within the integration framework STAligner, showing additional performance gains (Supplementary Figs. 4 and 5).On the uniformly subsampled 10,000-cell MERFISH set, CODA finished in 12.4 s with only +113 MB peak CPU memory, outperforming PASTE (950 s, GPU), SANTO (1225 s), ST-GEARS (4930 s), and STAligner (9613 s). On the full dataset, CODA completed in 510 s versus 1058 s for SPACEL—even though the other competitors were evaluated only on the 10,000-cell subset. STalign was also light on time/memory but failed to correct rotation, underscoring CODA’s image-based design that balances accuracy, speed, and scalability (see Supplementary Tables 1–3 and “Implementation Environment” section of the Methods).CODA enables robust intra- and cross-technology alignment and 3D reconstruction of serial ST sectionsTo comprehensively evaluate CODA’s performance in both robust alignment across technologies and 3D reconstruction from serial sections, we applied it to ST data under homogeneous (intra-technology) and heterogeneous (cross-technology) settings. For alignment, we tested five conditions: (1) two mouse hemisphere slices from 10x Visium24, (2) two from STARmap25, (3) two from RIBOmap26, (4) a cross-technology comparison between 10x Visium24 and MERFISH22, and (5) a multi-slice alignment task involving five Visium slices from the same dataset as (1). In addition, we reconstructed serial 3D tissue stacks for a mouse primary motor cortex27, a low-plex mouse brain block28, and an early mouse embryo29. These datasets span broad ranges of spatial resolution, throughput, and molecular modalities, providing a comprehensive benchmark for assessing CODA’s robustness and generalizability (Supplementary Figs. 6 and 7).Because local diffeomorphic alignment can, in principle, overfit inter-slice differences, we introduced a pseudo-slice negative control and corresponding QC assessment before running the main experiments. Specifically, from a single real slice we generated pseudo-slice pairs by overlapping subsampling two partially overlapping subsets at sampling fractions $p\in \{0.8,0.6,0.3,0.15\}$ (Supplementary Fig. 8). Under this setting, we report deformation readouts of the LDDMM-estimated diffeomorphism including displacement magnitude, Jacobian determinant, and stretch/compression severity to characterize conservative deformation ranges in a scenario with no true inter-slice non-rigid warp (Supplementary Note 2 and Supplementary Fig. 9). We further evaluated the pseudo-slice negative control using four pseudo-slice pairs generated with independently sampled unequal fractions. These additional controls showed the same overall pattern of mild and regular deformation, supporting the robustness of the local alignment procedure under heterogeneous coverage conditions (Supplementary Figs. 10 and 11). In addition, we provide a scale-reference example: at $p=0.6$, we apply a rigid rotation to one pseudo-slice and run local-only alignment without global initialization, which yields substantially larger deformation readouts and thus calibrates the magnitude of “mild” deformations observed in the negative control (Supplementary Fig. 12). Building on this, we implement a gain-based QC gate that accepts local alignment only when it provides a non-trivial net improvement in nearest-neighbor label consistency ($\Delta r\ge \tau$, with default $\tau=0.02$); otherwise, CODA falls back to global-only alignment (Supplementary Fig. 13). We report this gating outcome throughout the main analyses and retain the local alignment for downstream analysis only when the QC gate is satisfied.We first examined alignment within the same technology. In the 10x Visium slices, which displayed a strong angular discrepancy, CODA applied global rotation correction followed by nonlinear refinement (Fig. 3a). The SCI increased from 0.30 (unaligned) to 0.44 after global alignment, and further to 0.53 after local alignment.Fig. 3: Local alignment of ST slices across technologies using CODA.Full size imagea Alignment of two 10x Visium slices. Prior to alignment, the slices exhibited substantial angular and structural differences. CODA corrected the rotation and refined local deformations, resulting in improved spatial consistency. Colors represent different cell types. b Alignment of two STARmap slices. Despite minimal initial misalignment, local registration slightly improved spatial consistency across brain regions (e.g., Alveus, Corpus Callosum, Cortex). Colors denote annotated regions. c Alignment of two RIBOmap slices. The slices displayed significant rotational and shape discrepancies. CODA resolved both via global and local alignment, substantially improving spatial overlap. d Cross-technology alignment between 10x Visium and MERFISH slices. These datasets differ in resolution, cell count, and spatial sampling. CODA progressively aligned the Visium slice to match MERFISH in global shape, enabling downstream spatial comparison. e Evolution of spatial alignment for the gene Lamp5. Using the MERFISH slice as reference, cells in Visium were iteratively shifted based on expression similarity, yielding shape convergence. f Spatial alignment of five Visium slices. Prior to alignment, the slices exhibited substantial spatial misalignment. After applying CODA, the slices were brought into consistent spatial registration. g 3D stacks before alignment (“Original”, top; no inter-slice registration) and after CODA (bottom) for STARmap serial sections. Left: whole stack colored by cell type; right: representative types shown separately (L2/3, L4, L5 excitatory neurons; Olig1, Olig2). h Multi-view renderings of the CODA-aligned STARmap stack. Left: two viewpoints of the cell-type stack. Right: representative genes (Cux2, Flt1, Fos, Plxcd2, Slc17a7) colored by values of expression. Source data are provided as a Source Data file.In the STARmap dataset, which provides single-cell-resolution spatial transcriptomic data, the two slices exhibited minimal angular discrepancy, resulting in negligible changes after global alignment. However, following local alignment, the spatial consistency index (SCI) improved from 0.32 to 0.43, demonstrating CODA’s capacity to refine fine-scale spatial correspondence (Fig. 3b).In contrast, the RIBOmap dataset exhibited severe rotational distortion. The initial SCI was just 0.02, which increased to 0.19 after rotation correction and to 0.24 after nonlinear alignment (Fig. 3c), confirming CODA’s ability to resolve large deformations in high-resolution spatial data.Cross-technology alignment presents greater challenges due to differences in resolution and expression modality. To test CODA’s robustness under such conditions, we aligned mouse hemisphere slices from 10x Visium and MERFISH (Fig. 3d). Using the 490 shared genes for co-embedding, CODA progressively aligned the spatial structure of the Visium slice to match that of MERFISH. Although direct region-to-region matching was infeasible due to inconsistent annotations—especially the widespread presence of OPC-Oligo cells in MERFISH—we observed that post-alignment shapes were visually concordant. To validate the effect of refined alignment at the expression level, we computed Moran’s I scores for each gene in the Visium dataset and visualized the top-ranked spatially variable genes before and after alignment (Supplementary Figs. 14 and 15). The spatial expression patterns became more consistent across datasets after applying CODA, underscoring its ability to improve both structural and molecular correspondence.To assess CODA’s scalability and robustness in multi-slice alignment, we applied it to five consecutive slices from the Visium mouse brain dataset, sourced from the same sample as the one-to-one alignment in the first experiment. CODA jointly aligned all five slices by sequentially applying global and local transformations in a reference-consistent manner (Fig. 3f). Using the first slice (ST8059048) as the reference, SCI with the remaining four slices increased from 0.39, 0.29, 0.30, and 0.14 (unaligned) to 0.51, 0.41, 0.41, and 0.37 (after alignment), respectively.Having established CODA’s alignment performance across homogeneous and heterogeneous settings and its scalability to multi-slice alignment, we next examined whether these gains enable reconstruction of a coherent 3D tissue architecture from consecutive sections. For 3D reconstruction, we adopt an anchor-based serial stacking strategy by composing adjacent global rigid transforms into a common reference frame. We implemented two anchor choices (the first slice or the middle slice) and selected the final anchor based on the SCI/ECS drift profiles that compare slice 1 against slice $j$ as a function of slice distance $\Delta$ (Supplementary Fig. 16), with the terminal pair (slice 1 vs. slice $N$) included as an end-to-end check. For the longer MERFISH stack, agreement decays more noticeably with increasing distance, and the middle-anchor provides slightly higher long-range SCI/ECS; whereas for STARmap and Stereo-seq (shorter stacks), the two anchors show similar trends with only modest differences, and we use the first-anchor as the default when it yields slightly higher values. We first evaluated this on STARmap. We built a baseline stack by centering each slice (no inter-slice alignment) and compared it with the CODA-aligned stack using the previously introduced SCI and ECS. Across the nine adjacent slice pairs, SCI increased from a mean of 0.284 (median 0.276) in the centered baseline to 0.932 (median 0.950) after CODA (mean ΔSCI = 0.648, Supplementary Fig. 17). In parallel, ECS rose from a mean of 0.330 (median 0.324) to 0.663 (median 0.673) (mean ΔECS = 0.332). These pairwise, monotonic gains indicate that CODA enhances both structural continuity (types align slice-to-slice) and molecular coherence (expression profiles remain correlated across depth). Consistent with these metrics, layers such as L4 and L5, which appear fragmented or offset in the centered baseline, form continuous, properly stratified sheets after CODA (Fig. 3g). Likewise, genes such as Cux2 and Flt1 display smoother, colinear contours and preserved banding across adjacent slices in the CODA stack (Fig. 3h).We next extended the 3D reconstruction test to consecutive sections from MERFISH (mouse brain) and Stereo-seq (mouse embryo). Using the same centered-stack baseline for comparison, CODA again produced coherent 3D stacks with clear gains in SCI and ECS (Supplementary Fig. 17), recovering expected layer-level continuity in MERFISH and smooth, anatomically aligned embryonic domains in Stereo-seq (Supplementary Fig. 18).CODA identifies spatially consistent genes (SCGs) and spatially differential genes (SDGs) across different ST slicesTraditional single-cell analysis often overlooks genes that exhibit distinct spatial patterns despite comparable global expression levels (Fig. 4a). To address this, we introduced a spatial cross-correlation metric to quantify the similarity of gene expression patterns across ST samples. We first applied CODA to a 10x Visium human artery dataset30 (Fig. 4b), comprising slices from normal and atherosclerotic tissues. Nearest-neighbor analysis showed that smooth muscle cells in the diseased artery were most frequently matched to fibroblasts in the normal artery before alignment (430 cell pairs), but to smooth muscle cells after alignment (631 cell pairs; Fig. 4g). Using the aligned coordinates, we then identified two distinct gene sets: spatially consistent genes (SCGs) that maintained conserved localization across conditions, and spatially differential genes (SDGs) that exhibited marked spatial redistribution (Fig. 4c, d). Notably, SCGs often corresponded to canonical cell-type markers, whereas SDGs revealed spatial reorganization associated with disease progression.Fig. 4: Spatial alignment and identification of spatially differential genes in arterial tissue.Full size imagea Schematic illustration of spatially differential gene expression. The top row shows vessel structures from a normal artery and an atherosclerotic artery, where wall damage and immune cell infiltration cause visible distortion in the latter. The middle row shows artificially aligned slices, emphasizing shape similarity. The bottom row highlights a spatially differential gene: although expression levels are comparable across slices, its spatial distribution differs—accumulating near the damaged wall in the diseased sample but remaining diffuse in the healthy tissue. b Overlay of arterial slices before and after alignment. Misalignment between the atherosclerotic and normal tissues is visible before alignment (left) and corrected by CODA (right). c Spatial expression of genes with low spatial cross-correlation (SDGs), reflecting condition-specific distribution patterns. Notable examples include CCL2, ACE, QKI, and HCLS1. d Spatial expression of genes with high spatial cross-correlation (SCGs), which reflects conserved localization patterns across slices. Examples include TAGLN (rank 2, smooth muscle cells), DCN (rank 11, fibroblasts) and S100B (rank 39, Schwann cells). e Spatial distributions of smooth muscle cells, fibroblasts, and Schwann cells, corresponding to marker patterns shown in Fig.4d. f CAD and Normal sections stained for α-SMA (green), GSN/HSPB1 (red), and DAPI (blue). GSN co-localizes with α-SMA and is confined to the smooth-muscle layer in both conditions; HSPB1 is enriched in the inner smooth-muscle layer near the endothelium and within/around α-SMA cells. Scale bars, 50 µm. g Sankey diagram showing nearest-neighbor cell-type correspondences between CAD and normal artery slices before and after alignment. Before alignment, many SMCs were closest to fibroblasts in the other slice. After alignment, SMCs were more consistently matched. h Functional enrichment analysis of SCGs in atherosclerotic arterial tissue, including GO (BP, CC, MF) and KEGG pathway analysis. SDGs spatially differential genes, SCGs spatially consistent genes, CAD coronary artery disease, APC adipocyte progenitor cells, EC endothelial cells, SMCs smooth muscle cells, GO Gene Ontology, BP Biological Process, CC Cellular Component, MF Molecular Function, KEGG Kyoto Encyclopedia of Genes and Genomes. Source data are provided as a Source Data file.To assess the biological relevance of SCGs, we examined the top-ranked genes by spatial cross-correlation index and observed that many corresponded to canonical markers of major cell types in arterial tissue (top 2%, full list in Supplementary Table 6). For example, TAGLN for smooth muscle cells (ranked 2), DCN for fibroblasts (ranked 11), APOD for adipocytes (ranked 4), CD36 for macrophages (ranked 36), and S100B for Schwann cells (ranked 39) all exhibited high spatial cross-correlation, indicating conserved spatial localization across samples (Fig. 4d and Supplementary Figs. 19 and 20). Consistent with these exemplars, SCGs were strongly enriched for curated marker sets of the major arterial cell types, including endothelial cells, smooth muscle cells (and related subclasses), adipocytes, fibroblasts, and myofibroblasts. As an orthogonal validation, we performed dual-color immunofluorescence. Gelsolin (GSN) co-localized with α-smooth muscle actin (α-SMA) in both atherosclerotic and control arteries, remaining confined to the smooth muscle cells, while HSPB1 expression was mainly observed in the endothelial cell layer adjacent to the α-SMA–positive smooth muscle layer.To evaluate functional coherence, we applied complementary enrichment analyses to SCGs. GSEA yielded early-peaking, positive running-score curves, with top signals in cytoplasmic translation and contractile/cytoskeletal programs (muscle system process, muscle contraction, actin filament–based process, actin cytoskeleton organization). Cellular Component terms emphasized extracellular vesicle/exosome, extracellular matrix, focal adhesion, and cell–substrate junction; Molecular Function highlighted structural molecule activity, protein-complex binding, and actin filament binding. Consistently, KEGG enriched vascular smooth muscle contraction, regulation of actin cytoskeleton, tight junction, and focal adhesion, with additional enrichment in ribosome-related and PPAR signaling pathways. Over-representation analysis (ORA) recapitulated these themes (Supplementary Fig. 21).In contrast, SDGs—those with low spatial cross-correlation—exhibited significant differences in spatial distribution across slices (Supplementary Table 7). Notably, this divergence may not always be accompanied by large changes in gene expression levels, making such genes difficult to detect using traditional DEG analysis. Among the lowest-ranked genes by spatial cross-correlation (Fig. 4c, Supplementary Fig. 22), we identified several with known links to atherosclerosis. For instance, CCL2 (ranked 2999) is a chemokine involved in monocyte/macrophage recruitment and immune-inflammatory responses31; ACE (ranked 2947) is highly expressed in lipid-rich macrophages and regulates both vascular tone and lipid metabolism via PPARα signaling32. Other low-ranking genes include MKI67 (ranked 3000), LY96 (ranked 2955), and CTSD (ranked 2977), all of which have been implicated in atherosclerosis-related processes33,34,35.To further assess whether these genes are captured by standard expression-based methods, we performed Wilcoxon rank-sum differential gene expression analysis comparing atherosclerotic (n = 1123 cells) and normal (n = 818 cells) samples, and computed z-scores, approximate log2 fold-changes and Benjamini-Hochberg-adjusted p-values. Although ACE had the highest log2 fold-change among the SDGs examined (0.47; adjusted $p=0.89$), it did not reach significance. For the other highlighted SDGs, the largest statistically significant absolute log2 fold-change was 0.14 for CCL2 (adjusted $p=2.83\times {10}^{-47}$), which remains well below thresholds typically used in DEG analyses. These results underscore CODA’s ability to uncover spatially distinct gene regulation patterns that are invisible to expression-level comparisons alone.Spatial gene analysis is robust to grid resolution and expression rescalingTo assess whether CODA’s spatial gene identification is driven by spatial organization rather than absolute expression, and whether it remains stable under parameter changes, we designed three experiments: (i) a spatial sampling/representation sensitivity test, including a grid-resolution sweep (varying the discretization) and a within-slice local sampling-density adjustment, and comparing genome-wide rankings and set concordance; (ii) a quintile-based expression-dependence check, comparing SCG/SDG rates to the 5% null expectation; and (iii) structure-controlled perturbations of baseline SCGs.Across seven grid discretizations (30 × 30–60 × 60, Fig. 5a), pairwise overlaps showed high cross-scale consistency for SCGs (median Jaccard 0.714, IQR 0.667–0.724, range 0.622–0.786, Fig. 5b). For SDGs, overlaps were reproducible yet more resolution-sensitive (median 0.186; IQR 0.158–0.220; range 0.115–0.240). A consensus analysis was carried out to evaluate robustness. The SCG curve exceeded the 95% random envelope from $t\ge 2$ and still retained 98 genes across all seven grids, despite selecting only 150 SCGs per grid (Fig. 5c). SDGs likewise rose above random for $t\ge 2$ but decayed faster—an expected behavior for spatial differential signals, which hinge on scale-dependent contrasts and boundary effects across samples—while still maintaining a nontrivial shared core (Fig. 5c). Consistently, regressing out a kNN-based local sampling-density covariate within each slice had negligible impact on SCC rankings (Pearson ${r}=\,0.995$; Spearman $\rho \,=\,0.995$) and preserved SCG/SDG calls (SCG 54/60; SDG 55/60; Supplementary Note 3 and Supplementary Fig. 23).Fig. 5: Spatial gene analysis is robust to grid resolution and insensitive to expression magnitude.Full size imagea Cross-scale test. CODA alignment and spatial cross-correlation were computed across seven grid discretizations, with representative maps shown for three resolutions (30 × 30, 45 × 45, 60 × 60). Example maps for TAGLN and DCN show preserved spatial patterns across grids. b Pairwise Jaccard overlap of SCG sets across grids shows high cross-scale concordance (median 0.714; IQR 0.667–0.724). c Consensus analysis of set stability. The number of genes present in at least t grids is plotted for SCGs and SDGs against a 95% random envelope. d, e Distribution of cross-correlation index across pooled expression quintiles. Shared genes passing the expression/detection filtering steps were ranked by pooled expression and divided into five quintiles. In (d), the dashed horizontal line indicates the SCG threshold; in (e), the dashed horizontal line indicates the SDG threshold. Percentages above each box indicate the fraction of genes in each quintile that fall above the SCG threshold (d) or below the SDG threshold (e). Each dot represents one gene. Center lines indicate medians, boxes indicate the 25th and 75th percentiles, and whiskers indicate the minimum and maximum values. f Structure-preserving downscaling (binomial thinning) reduces counts while keeping spatial organization. g Structure-agnostic upscaling (location-independent Poisson resampling) raises counts but destroys spatial organization. h Circular Sankey diagrams of rank transitions for baseline SCGs (top: downscaling; bottom: upscaling). In each diagram, the upper semicircle shows the original spatial cross-correlation rank bands (Top 5%, 5–20%, 20–50%, last 50%); the lower semicircle shows the new ranks of the same genes after perturbation. Link widths are proportional to the number of genes moving between bands. Abbreviations: SCGs, spatially consistent genes; SDGs, spatially differential genes; Q1–Q5, the five pooled-expression quintiles from low to high pooled expression. Source data are provided as a Source Data file.To examine whether SCGs and SDGs exhibit expression-level bias, we performed a quintile analysis on the artery dataset. In this analysis, SCGs showed a clear high-expression skew. Across pooled expression quintiles, SCG rates were strongly depleted in Q1–Q4 (0.5–1.5%; Fig.5d) and markedly enriched in Q5 (20.7%; Fig. 5d). Wilson 95% CIs were well below/above the 5% expectation accordingly. By contrast, SDG rates hovered around the 5% baseline with CIs overlapping it across bins, aside from a mild depletion in the highest-expression bin (≈3.7%; Fig. 5e), indicating no material dependence on expression level.To test whether this apparent high-expression skew reflects an intrinsic expression-driven bias in SCG calling, we performed two complementary, structure-controlled perturbations on the baseline SCG set. In the structure-preserving downscaling experiment, we applied independent binomial thinning per spot for each SCG (see Methods for details); coordinates and all non-SCG columns were unchanged (Fig. 5f). Despite substantial reductions in counts, the SCG membership (top 5% by $I(g)$) remained identical to baseline (Fig. 5h). In contrast, structure-agnostic upscaling replaced each SCG by increasing mean expression while destroying spatial organization. Under this perturbation, $I(g)$ collapsed toward the null and none of these genes remained in the SCG top 5% (Fig. 5h).Taken together, CODA’s SCG calls are governed by spatial organization rather than absolute expression. The high-expression skew observed in the quintile check therefore does not itself imply a metric preference for highly expressed genes; instead, it likely arises in practice from differential noise robustness. Under standard count models, ${{{\rm{Var}}}}\left[X\right]=\mu$(Poisson) or $\mu+\phi {\mu }^{2}$ (NB), giving coefficient of variation ${{{\rm{C}}}}{{{{\rm{V}}}}}^{2}=1/\mu \,$(Poisson) or ${{{\rm{C}}}}{{{{\rm{V}}}}}^{2}=1/\mu+\phi \,({{{\rm{NB}}}})$, so low-expression genes carry higher relative stochastic noise, thereby degrading the detectability of their spatial patterns in real measurements.CODA extracts common domains and identifies spatially consistent genes across slicesTo further demonstrate CODA’s ability to resolve spatial variation and identify meaningful biological patterns, we applied it to mouse brain coronal slices generated by two ST platforms: CosMx Spatial Molecular Imager (SMI)4 and 10x Genomics Visium36. The SMI dataset included two slices—one spanning a full hemisphere and the other focused on the hippocampus and cortex—while the Visium dataset contained coronal slices covering the cortex, hippocampus, and hypothalamus.Accurate slice alignment requires sufficient overlap in both gene expression and spatial coverage. However, in practice, technical variation and experimental inconsistencies often lead to substantial differences in the detected tissue areas, particularly across platforms or batches. This issue is evident in the SMI dataset, where the two slices covered markedly different areas (17.1 mm2 and 33.68 mm2), with only partial spatial overlap (Fig. 6a). Unlike many existing methods that require fully overlapping tissue areas, CODA is able to robustly extract and align a common domain even under partial spatial overlap. By visualizing both slices as RGB-encoded spatial representations and applying a transformer-based feature matching strategy, CODA detects and aligns corresponding keypoints across slices (Fig. 6b). This process enables CODA to define a common domain between the two slices (Fig. 6b), which was further confirmed through visual inspection (Fig. 6c). Extracting this shared region led to marked improvements in downstream alignment: the ECS and SCI increased from 0.41 and 0.05 to 0.51 and 0.16, respectively.Fig. 6: Identification of common domains and spatial analysis in mouse brain datasets.Full size imagea Two ST slices generated using the CosMx Spatial Molecular Imager (SMI) platform. The coverage areas of the two slices partially overlap. b Identification of the common domain between the two SMI slices using spatial feature matching. c Extracted common domain from the two SMI slices. d Original mouse brain coronal slices from the 10x Visium dataset. e Identification of the common domain between the two Visium slices. f Extracted common domain from the Visium slices. g Common spatial domain identified after alignment. Compared to the unaligned slices, the extracted domains exhibit stronger overlap across samples. h Genes with high spatial cross-correlation index, showing consistent spatial expression patterns across slices. i Functional enrichment analysis of SCGs in mouse brain dataset, including GO (BP, CC, MF) and KEGG pathway analysis. SMI Spatial Molecular Imager, SCGs spatially consistent genes, GO Gene Ontology, BP Biological Process, CC Cellular Component, MF Molecular Function, KEGG Kyoto Encyclopedia of Genes and Genomes. Source data are provided as a Source Data file.To further evaluate the role of spatial cross-correlation in identifying genes with distinct spatial patterns across conditions, we applied CODA to a Visium-based mouse brain dataset designed to study the effects of Rhynchophylline (RHY) treatment. This dataset includes multiple coronal slices sampled under different doses (50 mg/kg and 100 mg/kg), time points, and biological sexes (details in the Supporting Information). In the saline group at ZT4, we observed a substantial region missing from one of the slices, resulting in a spatial mismatch with the corresponding RHY-treated slice (Fig. 6d). CODA successfully extracted and aligned the common domain between the two slices (Fig. 6e–g), enabling a meaningful spatial comparison.After alignment, we computed spatial cross-correlation and ranked all genes (Supplementary Table 8). High–cross-correlation genes (e.g., Baiap3, Nrgn) showed conserved spatial patterns across slices and included known markers such as Lamp5 and Itpka (Fig. 6h). Because RHY does not disrupt brain architecture, sleep genes Hcrt37 and Pmch38 changed in expression yet retained stable localization (ranks 6 and 27). By contrast, low–cross-correlation genes often displayed noisy or nonspecific spatial patterns despite expression shifts (Supplementary Figs. 24 and 25). Marker-set enrichment supported a cell-type basis: SCGs were over-represented for neuronal markers—particularly interneuron subclasses (including Lamp5) and excitatory neuron programs. Consistent with this, functional enrichment highlighted neuropeptide hormone activity, neuroactive ligand–receptor interaction, and hormone signaling—pathways that encompass core sleep–wake modulators (e.g., orexin/Hcrt and MCH/Pmch)—in line with the preserved localization of Hcrt and Pmch (Supplementary Fig. 26).CODA enables robust alignment across diverse species and organsTo further evaluate the generalizability of CODA across different biological contexts, we applied CODA to four datasets spanning distinct species and organs.We first applied CODA to an ST dataset of soybean root slices39. CODA significantly improved inter-slice spatial consistency with the spatial consistency index (SCI) increasing from 0.45 to 0.53, indicating enhanced alignment of gene expression structures across slices.We next applied CODA to gestational uterine tissues undergoing placentation40 and mouse embryonic tissues7, which exhibit complex organogenesis and layered tissue structure. In the uterine dataset, CODA corrected both global and local spatial discrepancies, resulting in an increase in SCI from 0.18 to 0.56. In the embryonic dataset, the two slices initially showed notable mismatches in tissue-to-tissue correspondence, as evident from the disorganized off-diagonal patterns in the nearest-neighbor heatmap (Fig. 7d, before alignment). After alignment, the slices reached a high degree of morphological concordance (Fig. 7c), with clear one-to-one correspondence across major anatomical regions (Fig. 7d, post-alignment), and the SCI improved substantially from 0.20 to 0.68.Fig. 7: CODA generalizes across diverse biological systems and species.Full size imagea Spatial alignment of soybean root slices before and after applying CODA. b Alignment of gestational uterine tissues undergoing placentation before and after CODA alignment. c Alignment of mouse embryonic slices before and after applying CODA. d Heatmaps showing nearest-neighbor cell-type correspondences between two embryonic slices before and after alignment. Prior to alignment, mismatches across anatomical regions resulted in off-diagonal patterns. After alignment, the heatmap approaches a diagonal structure, indicating improved anatomical correspondence. e Alignment of injured axolotl telencephalon slices before and after applying CODA. f Spatial expression patterns of top-ranked genes by spatial cross-correlation. These genes include known tissue-specific markers and genes previously implicated in regeneration. Source data are provided as a Source Data file.We further applied CODA to regenerating axolotl telencephalon tissue at 2 days post injury (2 DPI)41. CODA successfully aligned slices from the injured telencephalon, improving SCI from 0.36 to 0.46, and enabling robust identification of spatially consistent genes across sections. Among the top-ranked spatially consistent genes were GFAP, FABP7, and VIM, well-established markers of radial glia and reactive astrocytes, as well as GAD1 and GAD2, which label inhibitory interneurons such as sstIN and scgnIN in the annotated cell types (Fig. 7f, Supplementary Figs. 27, 28, and Supplementary Table 9). In addition, we observed high spatial concordance for TRH, EDNRB, SFRP1, RGCC, and PTN—genes previously implicated in neuroendocrine signaling, neural crest–derived cell lineages, and injury response pathways (Fig. 7f, Supplementary Figs. 27, 28, and Supplementary Table 9). These findings suggest that CODA is capable of capturing conserved spatial programs associated with both cell-type-specific architecture and regenerative molecular cues, even in highly dynamic and heterogeneous post-injury contexts.DiscussionIn this study, we propose CODA, a scalable and computationally efficient framework for spatial alignment and cross–sample analysis in spatial transcriptomics. CODA factorizes alignment into a global stage (rigid/similarity/affine) followed by local diffeomorphic refinement (LDDMM), improving slice-to-slice structural continuity and molecular coherence in serial sections and thereby supporting 3D reconstruction. CODA also identifies shared spatial domains across samples, allowing simultaneous alignment and downstream analyses within biologically comparable regions. In addition, CODA introduces a spatial cross-correlation index, enabling the detection of condition-specific spatial expression patterns beyond traditional expression–level comparisons.Although several methods also include an “alignment” step, many (e.g., STAligner) primarily focus on integration/joint embeddings; their registration typically relies on mutual nearest neighbors in the embedding followed by rigid ICP, and is not designed for broader non-rigid deformations. CODA is complementary rather than exclusive to such methods: when helpful, it can ingest external embeddings/co–clusters to stabilize correspondences while retaining a geometry–centric global–to–local alignment backbone. To interface seamlessly with CV keypoint/domain matching and LDDMM, we apply a data–driven rasterization to the learned features, enabling LightGlue-like matchers and LDDMM on multi–channel images; we also provide, in Supplementary Note 4, a theoretical statement of first–order insensitivity of LDDMM to convex interpolation on zero-valued pixels, constraining interpolation error and supporting this design choice.Despite its strengths, CODA has several limitations. First, for SCGs, the spatial cross-correlation index is theoretically invariant to expression scale, but low–expression genes are practically more susceptible to sequencing/capture noise, which can erode spatial structure. Second, regarding reproducibility, we confirm cross–platform non–determinism in UMAP: even with identical versions and fixed seeds, approximate nearest-neighbor search and multithreaded numeric libraries can introduce small differences in the neighbor graph, which in turn can slightly affect global–alignment initialization and downstream SCG/SDG ranks. Although reproducibility tests indicate that SCG/SDG identification is overall robust, this caveat should still be borne in mind in practical analyses. Third, while CODA supports spatial differential analysis, its integration with other layers of spatial omics—such as spatial proteomics or chromatin accessibility profiling (e.g., spatial ATAC–seq)—has yet to be explored. Recent advances in multimodal spatial data integration42,43,44 offer promising directions for extending CODA’s analytical capabilities across molecular layers.Moreover, an increasing number of computational frameworks have been developed recently for inferring cell–cell communication (CCC) in spatial transcriptomics45,46,47. Integrating CODA with such CCC inference methods may enable deeper insights into spatial signaling dynamics, tissue organization, and pathology. In addition, combining CODA with gene regulatory network inference or spatial epigenomic profiling may allow for a more comprehensive understanding of spatial gene regulation. Integrating CODA with generative models could enable estimation of missing intermediate tissue sections, and fully unsupervised group–wise alignment and automatic anchor selection will improve automation and robustness. In summary, CODA serves as a generalizable and extensible framework for spatial data integration and analysis, helping to unlock the full potential of spatial omics in complex biological systems.MethodsAlignment detailsNotationsWe denote a ST slice as $({{{\bf{X}}}},\,{{{\bf{Y}}}})$, where ${{{\bf{X}}}}\in {{\mathbb{R}}}^{G\times N}$ represents the gene expression matrix of cells or spots, and ${{{\bf{Y}}}}\in {{\mathbb{R}}}^{D\times N}$ represents the spatial coordinate matrix. The constants $G$, $N$ and $D$ represent the number of genes, the number of cells/spots and the dimensionality of spatial coordinates (usually $D=2\,{{{\rm{or}}}}\,3$ in physical space), respectively. In the alignment task, we have a source slice $({{{{\bf{X}}}}}_{{{{\rm{S}}}}},\,{{{{\bf{Y}}}}}_{{{{\rm{S}}}}})$ and a target slice $({{{{\bf{X}}}}}_{{{{\rm{T}}}}},\,{{{{\bf{Y}}}}}_{{{{\rm{T}}}}})$, where ${{{{\bf{X}}}}}_{{{{\rm{S}}}}}\in {{\mathbb{R}}}^{{G}_{{{{\rm{S}}}}}\times {N}_{{{{\rm{S}}}}}}$ has ${G}_{{{{\rm{S}}}}}$ genes and ${N}_{{{{\rm{S}}}}}$ cells/spots while ${{{{\bf{X}}}}}_{{{{\rm{T}}}}}\in {{\mathbb{R}}}^{{G}_{{{{\rm{T}}}}}\times {N}_{{{{\rm{T}}}}}}$ has ${G}_{{{{\rm{T}}}}}$ genes and ${N}_{{{{\rm{T}}}}}$ cells/spots. When multiple slices require alignment, we typically designate one slice as the target and sequentially align the remaining slices to this target slice in a pairwise manner.Data preprocessingWe first select the common genes between the source and target slices and integrate the gene expression matrices ${{{{\bf{X}}}}}_{{{{\rm{S}}}}}$ and ${{{{\bf{X}}}}}_{{{{\rm{T}}}}}$ using methods such as Seurat, ComBat, BBKNN or STAligner. We recommend Seurat or ComBat, which can remove batch effect and adjust gene expression levels across different datasets effectively, and STAligner for its strong integrative performance by explicitly incorporating spatial information. The modified gene expression matrices after integration are denoted as ${\widetilde{{{{\bf{X}}}}}}_{{{{\rm{S}}}}}$ and ${\widetilde{{{{\bf{X}}}}}}_{{{{\rm{T}}}}}$. Then, we concatenate the gene expression matrices and use UMAP to reduce the dimensionality to $d$ dimensions:$$\left({{{{\bf{z}}}}}_{1},{{{{\bf{z}}}}}_{2}, \ldots ,{{{{\bf{z}}}}}_{{{{\rm{d}}}}}\right)={{{\rm{UMA}}}}{{{{\rm{P}}}}}_{d}\left({{{\rm{concat}}}}\left({\widetilde{{{{\bf{X}}}}}}_{{{{\rm{S}}}}},{\widetilde{{{{\bf{X}}}}}}_{{{{\rm{T}}}}}\right)\right).$$(1)Subsequently, we normalize the $d$-dimensional data to [0,255]:$$\left({{{{\bf{z}}}}}_{{{{\rm{S}}}}}^{{{{\rm{i}}}}},{{{{\bf{z}}}}}_{{{{\rm{T}}}}}^{{{{\rm{i}}}}}\right)=255\times \frac{{{{{\bf{z}}}}}_{{{{\rm{i}}}}}-\min \left({{{{\bf{z}}}}}_{{{{\rm{i}}}}}\right)}{\max \left({{{{\bf{z}}}}}_{{{{\rm{i}}}}}\right)-\min \left({{{{\bf{z}}}}}_{{{{\rm{i}}}}}\right)},$$(2)where $\min \left({{{{\bf{z}}}}}_{{{{\rm{i}}}}}\right)$ and $\max ({{{{\bf{z}}}}}_{{{{\rm{i}}}}})$ $(i={1,2} , \ldots , {{{\rm{d}}}})$ are the minimum and maximum values of the ${ith}$ dimension, and subscripts ${{{\rm{S}}}}$ and ${{{\rm{T}}}}$ represent the “source slice” and the “target slice”, respectively. As a result, we obtain low-dimension representations for each cell/spot in the ST data.Global alignmentAfter preprocessing, each cell/spot in source or target slice has $d$ features ${{{{\bf{z}}}}}_{{{{\rm{i}}}}}$ and two spatial coordinates ${x}_{i}$ and$\,{y}_{i}$. We denoted the source and target point sets by ${{{\mathcal{S}}}}={\left\{{{{{\bf{p}}}}}_{i}\right\}}_{i=1}^{{N}_{{{{\rm{S}}}}}}={\left\{\left({x}_{i}^{{{{\rm{S}}}}},{y}_{i}^{{{{\rm{S}}}}},{{{{\bf{z}}}}}_{{{{\rm{i}}}}}^{{{{\rm{S}}}}}\right)\right\}}_{i=1}^{{N}_{{{{\rm{S}}}}}}$ and ${{{\mathcal{T}}}}={\left\{{{{{\bf{q}}}}}_{j}\right\}}_{j=1}^{{N}_{{{{\rm{T}}}}}}={\left\{\left({x}_{j}^{{{{\rm{T}}}}},{y}_{j}^{{{{\rm{T}}}}},{{{{\bf{z}}}}}_{{{{\rm{j}}}}}^{{{{\rm{T}}}}}\right)\right\}}_{j=1}^{{N}_{{{{\rm{T}}}}}}$, respectively. Pool the shared features $\left\{{{{{\bf{z}}}}}_{i}^{{{{\rm{S}}}}}\right\}\,\bigcup \,\left\{{{{{\bf{z}}}}}_{j}^{{{{\rm{T}}}}}\right\}$ and assign cluster labels by Louvain community detection on the pooled feature space (used by default; the procedure is modular and compatible with alternative clustering methods), yielding $c\left({{{{\bf{p}}}}}_{i}\right),{c}\left({{{{\bf{q}}}}}_{j}\right)\in \left\{{{{{\mathcal{C}}}}}_{1} , \ldots , {{{{\mathcal{C}}}}}_{L}\right\}$. For each sample ${{{{\bf{p}}}}}_{i} \, {{\rm{with}}} \, {{\rm{c}}} \, ({\boldsymbol{\ p}}_i) \in {{{{\mathcal{C}}}}}_{l}$ in the source slice, we find its nearest neighbor ${{{\bf{q}}}}(i)$ within the same cluster ${{{{\mathcal{C}}}}}_{l}$ in the feature space ${{{\mathcal{S}}}}$ and ${{{\mathcal{T}}}}$:$${{{\bf{q}}}}(i)={{{{\rm{argmin}}}}}_{{{{{\bf{q}}}}}_{j}}\left({\sum}_{k=1}^{d}{\left({z}_{{ik}}^{{{{\rm{S}}}}}-{z}_{{jk}}^{{{{\rm{T}}}}}\right)}^{2}\right),\,c\left({{{{\bf{q}}}}}_{j}\right)\in {{{{\mathcal{C}}}}}_{l}.$$(3)Global alignment has three modes. Mode I seeks a rigid transformation between the source and target by minimizing the overall distance between samples in the source slice and their nearest neighbors in the target slice in the feature space:$$\hat{{\boldsymbol{{\mathcal{R}}}}},\hat{{{\mathcal{t}}}}={{{\rm{argmin}}}}_{{\boldsymbol{{\mathcal{R}}}},{{{\mathcal{t}}}}} {\sum }_{i}{{\left|\left|{\left({x}_{q\left(i\right)}^{{{\rm{T}}}},{y}_{q\left(i\right)}^{{{\rm{T}}}}\right)}^{\top }-\left({{\boldsymbol{{\mathcal{R}}}}}\cdot {\left({x}_{i}^{{\rm{S}}},{y}_{i}^{{\rm{S}}}\right)}^{\top }+{{{\mathcal{t}}}}\right)\right|\right|}}^{2},{{\rm{s}}}.{{\rm{t}}}.\,{{\boldsymbol{{\mathcal{R}}}}}^{\top }{{\boldsymbol{{\mathcal{R}}}}}={{\bf{I}}},\det \left({{\boldsymbol{{\mathcal{R}}}}}\right)=1,$$(4)where $I$ denotes the identity matrix and ${(\cdot)}^{\top }$ the transpose. Center the source points and their matches in physical space ${\widetilde{{{{\bf{p}}}}}}_{i}={{{{\bf{p}}}}}_{i}-\bar{{{{\bf{p}}}}},\,{\widetilde{{{{\bf{q}}}}}}_{i}={{{\bf{q}}}}(i)-\bar{{{{\bf{q}}}}},$ with $\bar{{{{\bf{p}}}}}=\frac{1}{n}{\sum }_{i}{{{{\bf{p}}}}}_{i}$ and $\bar{{{{\bf{q}}}}}=\frac{1}{n}{\sum }_{i}{{{\bf{q}}}}(i)$. Form the cross-covariance matrix ${{{\bf{W}}}}=\mathop{\sum }_{i}{\widetilde{{{{\bf{p}}}}}}_{i}{\left({\widetilde{{{{\bf{q}}}}}}_{i}\right)}^{\top }$, and compute its SVD decomposition ${{{\bf{W}}}}={{{\bf{U}}}}{{{\mathbf{\Sigma }}}}{{{{\bf{V}}}}}^{\top }$, the final rotational transformation $\hat{{{\boldsymbol{{\mathcal{R}}}}}}$ and a translational transformation $\hat{{{\mathcal{t}}}}$ are represented by $\hat{{{\boldsymbol{{\mathcal{R}}}}}}={{{\bf{V}}}}{{{{\bf{U}}}}}^{\top }$ and $\hat{{{\mathcal{t}}}}=\bar{{{{\bf{q}}}}}-\hat{{{\boldsymbol{{\mathcal{R}}}}}}\bar{{{{\bf{p}}}}}$, respectively.Mode II seeks a similarity transformation $T({{{\bf{x}}}})=s{{{\boldsymbol{{\mathcal{R}}}}}}{{{\bf{x}}}}+{{{\mathcal{t}}}}$ by augmenting the rigid model with an isotropic scale $s$:$$\hat{{\boldsymbol{{\mathcal{R}}}}},\hat{{{\mathcal{t}}}},\hat{s}=\mathop{{{\rm{argmin}}}}_{{\boldsymbol{\mathcal {R}}},{{{\mathcal{t}}}},s}{\sum }_{i}{{\bigg|\bigg|}{\left({x}_{q\left(i\right)}^{{{\rm{T}}}},{y}_{q\left(i\right)}^{{{\rm{T}}}}\right)}^{\top }-\left(s{{\boldsymbol{{\mathcal{R}}}}}\cdot {\left({x}_{i}^{{{\rm{S}}}},{y}_{i}^{{{\rm{S}}}}\right)}^{\top }+{{{\mathcal{t}}}}\right){\bigg|\bigg|}}^{2},{{\rm{s}}}.{{\rm{t}}}.\,{{{\boldsymbol{{\mathcal{R}}}}}}^{\top }{{{\boldsymbol{{\mathcal{R}}}}}}={{\bf{I}}},\det \left({{{\boldsymbol{{\mathcal{R}}}}}}\right)=1,$$(5)By Umeyama48, the rotation is obtained by the same step as in Mode I, the scale and the translation are $\hat{s}=\frac{{tr}({{{\mathbf{\Sigma }}}})}{{\sum }_{i}{{||}\,{\widetilde{{{{\bf{p}}}}}}_{i}{||}}_{2}^{2}}$ and $\hat{{{\mathcal{t}}}}=\bar{{{\bf{q}}}}-\hat{s}\hat{{{{\boldsymbol{{\mathcal{R}}}}}}}\bar{{{\bf{p}}}}$, respectively.Mode III seeks an affine transformation by ridge-regularized least squares on the label-restricted correspondences. Let ${{{\bf{X}}}}=\left[{\left({x}_{i}^{{{{\rm{S}}}}},{y}_{i}^{{{{\rm{S}}}}}\right)}^{\top }\right]\in {{\mathbb{R}}}^{{N}_{{{{\rm{S}}}}}\times 2}$ and ${{{\bf{Y}}}}=\left[{\left({x}_{q(i)}^{{{{\rm{T}}}}},{y}_{q(i)}^{{{{\rm{T}}}}}\right)}^{\top }\right]\in {{\mathbb{R}}}^{{N}_{{{{\rm{S}}}}}\times 2}$, define the augmented design ${{{{\bf{X}}}}}_{{{{\rm{aug}}}}}=\left[{{{\bf{X}}}}\,{{{\mathbf{1}}}}\right]\in {{\mathbb{R}}}^{{N}_{{{{\rm{S}}}}}\times 3}$ and the parameter matrix ${{\mathbf{\Theta }}}=\left[\begin{array}{c}{{{\boldsymbol{{\mathcal{A}}}}}}^{\top }\,\\ {{{\boldsymbol{{\mathcal{B}}}}}}^{\top }\end{array}\right]\in {{\mathbb{R}}}^{3\times 2}$ for $T({{{\bf{x}}}})={{\boldsymbol{{\mathcal{A}}}}}{{{\bf{x}}}}+{{\boldsymbol{{\mathcal{B}}}}}$. With a Tikhonov term that does not penalize translation and anchors the linear part at the similarity solution, we estimate:$$\hat{{{\mathbf{\Theta }}}}={{{{\rm{argmin}}}}_{{{\mathbf{\Theta }}}}{||}{{{\bf{X}}}}_{{{\rm{aug}}}}{{\mathbf{\Theta }}}-{{\bf{Y}}}{||}}_{F}^{2}+\lambda {{||}{{\bf{P}}}\left({{\mathbf{\Theta }}}-{{{\mathbf{\Theta }}}}_{0}\right){||}}_{F}^{2},\,{{\bf{P}}}={{\rm{diag}}}\left({\mathrm{1,1,0}}\right),{{{\mathbf{\Theta }}}}_{0}=\left[\begin{array}{c}{\left(\hat{s}\,\hat{{{\boldsymbol{{\mathcal{R}}}}}}\right)}^{\top }\\ {0}^{\top }\,\end{array}\right],$$(6)which admits the closed form:$$\hat{{{\mathbf{\Theta }}}}={\left({{{\bf{X}}}}_{{{\rm{aug}}}}^{{{\top }}}{{{\bf{X}}}}_{{{\rm{aug}}}}+\lambda {{{\bf{P}}}}^{{{\top }}}{{\bf{P}}}\right)}^{-1}\left({{{\bf{X}}}}_{{{\rm{aug}}}}^{{{\top }}}{{\bf{Y}}}+\lambda {{{\bf{P}}}}^{{{\top }}}{{\bf{P}}}{{{\mathbf{\Theta }}}}_{0}\right),$$(7)yielding the aligned coordinates ${\left\{T\left({{{{\bf{p}}}}}_{i}\right)\right\}}_{i}$.3D reconstruction of serial sectionsFor serial-section datasets, we reconstruct a 3D stack by applying the global rigid alignment (Mode I) between adjacent slices and expressing all slices in a common reference frame. Given an ordered list of slices $\{1 , \ldots , N\}$, we estimate a rigid transform ${g}_{s\to s+1}({{{\bf{x}}}})={{{{\boldsymbol{{\mathcal{R}}}}}}}_{s\to s+1}{{{\bf{x}}}}+{{{{\mathcal{t}}}}}_{s\to s+1}$ for each adjacent pair $(s,s+1)$ using Mode I, and obtain the transform from slice s to an anchor slice a by composing the adjacent transforms along the chain. CODA provides two anchor choices: (i) first-anchor ($a=1$), where all slices are mapped to slice 1; and (ii) middle-anchor $(a={{{\rm{\lfloor }}}}N/2{{{\rm{\rfloor }}}})$, where slices on both sides are mapped to the middle slice by composing transforms inward. We recommend using the middle-anchor strategy when the number of slices is large, as it reduces the average chain length and can mitigate drift accumulation in long stacks; for short stacks, both anchors typically give similar reconstructions. After mapping each slice into the anchor frame, we assign a depth coordinate ${z}_{s}=(s-1)\Delta z$ (with $\Delta z$ being the inter-slice spacing when available, otherwise a constant step for visualization) and obtain 3D coordinates for downstream visualization and analysis.Integration of resolutionAfter the global alignment is done, we aim to generate the representation matrices ${{{{\bf{I}}}}}_{{{{\rm{S}}}}}$ and ${{{{\bf{I}}}}}_{{{{\rm{T}}}}}$ for the source and target slice in the same spatial resolution. We first choose a joint canvas of size $n\times m$ for both slices with a light occupancy heuristic: given the two coordinate sets ${\left\{\left({x}_{i}^{{{{\rm{S}}}}},{y}_{i}^{{{{\rm{S}}}}}\right)\right\}}_{i=1}^{{N}_{{{{\rm{S}}}}}}$ and ${\left\{\left({x}_{j}^{{{{\rm{T}}}}},{y}_{j}^{{{{\rm{T}}}}}\right)\right\}}_{j=1}^{{N}_{{{{\rm{T}}}}}}$, we estimate, for each slice, a minimal height-width pair $({n}^{\left(\cdot \right)},{m}^{(\cdot )})$ that keeps the expected number of points per pixel below a target level (default 1.0), with an optional safety gap $g$ between neighboring occupied pixels. We then set $n=\max \left({n}^{\left({{{\rm{S}}}}\right)},{n}^{\left({{{\rm{T}}}}\right)}\right)$ and $m=\max \left({m}^{\left({{{\rm{S}}}}\right)},{m}^{\left({{{\rm{T}}}}\right)}\right)$ and initialize ${{{{\bf{I}}}}}_{{{{\rm{S}}}}}$ and ${{{{\bf{I}}}}}_{{{{\rm{T}}}}}$ as $n\times m\times d$ zero matrices. Given a set of spatial coordinates ${\mathbb{C}}{\mathbb{=}}{\left\{\left({x}_{i},{y}_{i}\right)\right\}}_{i=1}^{N}$ and corresponding $d$-dimensional feature values ${{{\mathcal{R}}}}={\left\{({z}_{i}^{1},{z}_{i}^{2} , \ldots , {z}_{i}^{d})\right\}}_{i=1}^{N}$, we normalize and scale the coordinates $\left({x}_{i},{y}_{i}\right)$ to the image size $n\times m$ by:$${x}_{i}^{{\prime} }={{{\mathrm{int}}}}\left(\frac{{{{\rm{x}}}}_{i}-\min \left({{{\boldsymbol{x}}}}\right)}{\max \left({{{\boldsymbol{x}}}}\right)-\min \left({{{\boldsymbol{x}}}}\right)}\times \left(n-1\right)\right),\,{y}_{i}^{{\prime} }={{{\mathrm{int}}}}\left(\frac{{y}_{i}-\min \left({{{\boldsymbol{y}}}}\right)}{\max \left({{{\boldsymbol{y}}}}\right)-\min \left({{{\boldsymbol{y}}}}\right)}\times \left(m-1\right)\right),$$(8)where ${{{\boldsymbol{x}}}}$ and ${{{\boldsymbol{y}}}}$ are the set of all ${x}_{i}$ and ${y}_{i}$, respectively, and ${{\mathrm{int}}}(\cdot )$ denotes rounding to the nearest integer. At the scaled coordinates $\left({x}_{i}^{{\prime} },{y}_{i}^{{\prime} }\right)$, set the corresponding pixel in the presentation matrix ${{{\bf{I}}}}$ to $({z}_{i}^{1},{z}_{i}^{2} , \ldots , {z}_{i}^{d})$. There could be many zero pixels surrounding each cell/spot in ${{{\bf{I}}}}$ resulting in visual discontinuities. Then, we perform interpolation on the representation to improve the resolution for the subsequent refined alignment. We set a neighborhood radius size $r$ (default size is 1, which can be adjusted according to the specific requirements of different ST technologies used in alignment tasks), and the neighborhood ${{{\mathcal{N}}}}$ of a pixel point ${{{\bf{I}}}}(a,b)$ is defined as ${{{\mathcal{N}}}}({{{\bf{I}}}}(a,b))=\{(i,j ),i,j\in N|i\in [a-r,a+r],j\in [b-r,b+r]\}$. We retrieve all zero pixels, and for zero pixels where the number of non-zero pixels in the neighborhood is three or more, we perform interpolation on that pixel. The interpolation is calculated as the weighted average of the non-zero pixels within the neighborhood:$${{{\bf{I}}}}\left(i,j\right)={w}_{{ij}}{\sum}_{\left(k,l\right){{{\mathscr{\in }}}}{{{\mathcal{K}}}}}{{{\bf{I}}}}\left(k,l\right),$$(9)where ${{{\mathcal{K}}}}{{{\mathscr{\subseteq }}}}{{{\mathcal{N}}}}\left({{{\bf{I}}}}\left(i,j\right)\right)$ represents the set of all non-zero pixels within the neighborhood of pixel ${{{\bf{I}}}}\left(i,j\right)$, and ${w}_{{ij}}$ are the weights corresponding to each non-zero pixel. By default, ${w}_{{ij}}=\frac{1}{\left|{{{\mathcal{K}}}}\right|}$, where $\left|{{{\mathcal{K}}}}\right|$ is the number of elements in the set ${{{\mathcal{K}}}}$. The reasonableness of integration can be found in Supplementary Fig. 29.Identification of common domainsCODA employs a similar strategy as LightGlue. First, CODA extracts feature points from the source and target representations ${{{{\bf{I}}}}}_{{{{\rm{S}}}}}$ and ${{{{\bf{I}}}}}_{{{{\rm{T}}}}}$, and establishes initial correspondences between them. Suppose the source and target representations contain $M$ and $N$ candidate points, indexed by ${{{{\mathcal{P}}}}}_{S}=\{1 , \ldots , M\}$ and ${{{{\mathcal{P}}}}}_{T}=\{1 , \ldots , N\}$, respectively. To enhance the representation of each feature point, we employ a transformer-based architecture consisting of $L$ alternating self-attention and cross attention layers. Each layer updates the feature representation ${{{{\bf{z}}}}}_{i}$ as follows:$${{{{\bf{z}}}}}_{i}^{l+\frac{1}{2}}\leftarrow {{{{\bf{z}}}}}_{i}^{l}+{{{\rm{MLP}}}}\left({{{\rm{concat}}}}\left({{{{\bf{z}}}}}_{i}^{l},{{{{\bf{m}}}}}_{i}^{{self}}\right)\right),$$(10)$${{{{\bf{z}}}}}_{i}^{l+1}\leftarrow {{{{\bf{z}}}}}_{i}^{l+\frac{1}{2}}+{{{\rm{MLP}}}}\left({{{\rm{concat}}}}\left({{{{\bf{z}}}}}_{i}^{l+\frac{1}{2}},{{{{\bf{m}}}}}_{i}^{{cross}}\right)\right),$$(11)where ${{{{\bf{m}}}}}_{i}$ is the aggregated message computed from attention.In the self-attention module, each point attends to all other points within the same representation. The key, query and value vectors are computed as:$${{{{\bf{k}}}}}_{i}={{{{\bf{W}}}}}^{K}{{{{\bf{z}}}}}_{i},{{{{\bf{q}}}}}_{i}={{{{\bf{W}}}}}^{Q}{{{{\bf{z}}}}}_{i},\,{{{{\bf{v}}}}}_{i}={{{{\bf{W}}}}}^{V}{{{{\bf{z}}}}}_{i}.$$(12)A rotation-based positional encoding ${{{\bf{R}}}}$ is added to incorporate spatial information. The attention score between points i and j is calculated as:$${a}_{{ij}}={{{{\bf{q}}}}}_{i}^{\top }{{{\bf{R}}}}\left(\left({x}_{i}^{{\prime} },{y}_{i}^{{\prime} }\right)-\left({x}_{j}^{{\prime} },{y}_{j}^{{\prime} }\right)\right){{{{\bf{k}}}}}_{j},$$(13)where ${{{\bf{R}}}}$ is a rotation encoding:$${{{\bf{R}}}}\left({{{\bf{x}}}}\right)=\left(\begin{array}{ccc}\hat{{{{\bf{R}}}}}\left({{{{\bf{b}}}}}_{1}^{\top }{{{\bf{x}}}}\right) & 0 & 0\\ 0 & \ddots & 0\\ 0 & 0 & \hat{{{{\bf{R}}}}}\left({{{{\bf{b}}}}}_{l}^{\top }{{{\bf{x}}}}\right)\end{array}\right),\,\hat{{{{\bf{R}}}}}\left(\theta \right)=\left(\begin{array}{cc}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{array}\right),$$(14)where ${{{{\bf{b}}}}}_{1},{{{{\bf{b}}}}}_{2} , \ldots , {{{{\bf{b}}}}}_{l}\in {{\mathbb{R}}}^{2}$ are learnable parameters.The output message is:$${{{\bf{m}}}}_{i}^{{{\rm{self}}}}={\sum }_{j\in {{{\bf{I}}}}_{{{\rm{S}}}}\,}{{{\rm{Softmax}}}}_{k{{\mathscr{\in }}}{{\mathcal{P}}}}{\left({a}_{{ik}}\right)}_{j}{{{\bf{v}}}}_{j},$$(15)where ${{{\mathcal{P}}}}$ denotes the point set of the current representation (either ${{{{\mathcal{P}}}}}_{{{{\rm{S}}}}}{{{\rm{or}}}}{{{{\mathcal{P}}}}}_{{{{\rm{T}}}}}$)In the cross-attention module, the key and value vectors are similarly computed:$${{{{\bf{k}}}}}_{i}={{{{\bf{W}}}}}^{K}{{{{\bf{z}}}}}_{i},\,{{{{\bf{v}}}}}_{i}={{{{\bf{W}}}}}^{V}{{{{\bf{z}}}}}_{i}.$$(16)The attention score between points i and j is calculated as:$${a}_{{ij}}={a}_{{ji}}={\left({k}_{i}^{{{{\rm{S}}}}}\right)}^{{{\top }}}{k}_{j}^{{{{\rm{T}}}}}.$$(17)And the output message is:$${{{\bf{m}}}}_{i}^{{{\rm{cross}}}}={\sum }_{j}{{{\rm{Softmax}}}}_{k\in {{{\mathcal{P}}}}_{{{\rm{opp}}}}}{\left({a}_{{ik}}\right)}_{j}{{{\bf{v}}}}_{j},$$(18)where ${{{{\mathcal{P}}}}}_{{{{\rm{opp}}}}}$ denotes the point set of the opposite representation.The similarity score matrix between source and target points is computed as:$${S}_{{ij}}=\phi {\left({z}_{i}^{{{{\rm{S}}}}}\right)}^{{{\top }}}\phi \left({z}_{j}^{{{{\rm{T}}}}}\right),\,\forall \left(i,j\right)\in {{{{\mathcal{P}}}}}_{{{{\rm{S}}}}}\times {{{{\mathcal{P}}}}}_{{{{\rm{T}}}}},$$(19)where $\phi$ is a learnable linear projection. ${S}_{{ij}}$ is normalized as follows:$${P}_{{ij}}={\sigma }_{i}^{{{{\rm{S}}}}}{\sigma }_{j}^{{{{\rm{T}}}}}{{{{\rm{Softmax}}}}}_{k\in {{{{\mathcal{P}}}}}_{{{{\rm{S}}}}}}{\left({S}_{{kj}}\right)}_{i}{{{{\rm{Softmax}}}}}_{k\in {{{{\mathcal{P}}}}}_{{{{\rm{T}}}}}}{\left({S}_{{ik}}\right)}_{j},$$(20)where ${\sigma }_{i}={{{\rm{Sigmoid}}}}\left(\phi \left({z}_{i}\right)\right)\in \left[0,1\right]$. A pair of $\left(i,j\right)$ is considered a valid correspondence if both points are predicted as matchable, ${S}_{{ij}}$ is greater than a threshold $\delta$, and is larger than all other entries in its row and column.Let ${{{\mathcal{M}}}}{{{\mathscr{\subseteq }}}}{{{{\mathcal{P}}}}}_{{{{\rm{S}}}}}\times {{{{\mathcal{P}}}}}_{{{{\rm{T}}}}}$ denote the set of matched point pairs, and let ${\bar{{{{\mathcal{P}}}}}}_{{{{\rm{S}}}}}\subseteq {{{{\mathcal{P}}}}}_{{{{\rm{S}}}}}$ and ${\bar{{{{\mathcal{P}}}}}}_{{{{\rm{T}}}}}\subseteq {{{{\mathcal{P}}}}}_{{{{\rm{T}}}}}$ denote the sets of unmatched points in the source and target representations, respectively. The Loss function is$${{{\rm{Loss}}}}=-\frac{1}{L}{\sum}_{\ell }\left(\frac{1}{\left|{{{\mathcal{M}}}}\right|}{\sum}_{\left(i,j\right)\in {{{\mathcal{M}}}}}\log {P}_{{ij}}^{(\ell )}+\frac{1}{2\left|{\bar{{{{\mathcal{P}}}}}}_{{{{\rm{S}}}}}\right|}{\sum}_{i\in {\bar{{{{\mathcal{P}}}}}}_{{{{\rm{S}}}}}}\log \left(1-{\sigma }_{i}^{{{{\rm{S}}}},(\ell )}\right)+\frac{1}{2\left|{\bar{{{{\mathcal{P}}}}}}_{{{{\rm{T}}}}}\right|}{\sum}_{j\in {\bar{{{{\mathcal{P}}}}}}_{{{{\rm{T}}}}}}\log \left(1-{\sigma }_{j}^{{{{\rm{T}}}},(\ell )}\right)\right).$$(21)where $\ell$ indexes the layer. After the matched points are identified, CODA supplies two strategies to define the common domain from the set of correspondences (see Supplementary Note 5 for details).Local alignmentAfter the global alignment and imaging of the slices, ensuring that both slices are within the same area, CODA performs a local alignment of the source and target slices based on nonlinear velocity fields using multi-channel LDDMM.For two representations ${{{{\bf{I}}}}}_{0}$ and ${{{{\bf{I}}}}}_{1}$, LDDMM performs alignment by finding a diffeomorphism $\varphi$ that satisfies ${I}_{0}\circ {\varphi }^{-1}={I}_{1}$, where $\varphi={\phi }_{1}$ is the endpoint of the flow associated with a smooth time-dependent vector field. The evolution equation ${\dot{\phi }}_{t}={v}_{t}\left({\phi }_{t}^{v}\right),t\in \left[{{\mathrm{0,1}}}\right]$ is solved for $\varphi$, where ${v}_{t}$ is a time-varying velocity field. For given representations ${I}_{{{{\rm{S}}}}}=\left({{{{\boldsymbol{I}}}}}_{{{{\rm{S}}}}1},{{{{\boldsymbol{I}}}}}_{{{{\rm{S}}}}2} , \ldots , {{{{\boldsymbol{I}}}}}_{{{{\rm{S}}}}d}\right)$ and ${I}_{{{{\rm{T}}}}}=\left({{{{\boldsymbol{I}}}}}_{{{{\rm{T}}}}1},{{{{\boldsymbol{I}}}}}_{{{{\rm{T}}}}2} , \ldots , {{{{\boldsymbol{I}}}}}_{{{{\rm{T}}}}d}\right)$, where ${{{{\boldsymbol{I}}}}}_{{{{\rm{S}}}}c},{{{{\boldsymbol{I}}}}}_{{{{\rm{T}}}}c}:\Omega \subset {{\mathbb{R}}}^{2}{\mathbb{\to }}{\mathbb{R}}$, $c={{\mathrm{1,2}}} , \ldots , d$, the velocity field ${v}_{t}$ is determined by solving the following variational problem:$$\hat{{{\boldsymbol{v}}}}={{{\rm{argmin}}}}_{v_{t}\in {R}^{d}\,}{E}_{t}={{{\rm{argmin}}}}_{v_{t} \in {R}^{d}\,}\left({\int }_{0}^{1}{{{\rm{||}}}{v}_{t}{{\rm{||}}}}_{{{\rm{V}}}}^{2}{dt}+{\sum }_{c=1}^{d}\left\{\frac{1}{{\sigma }_{c}^{2}}{{{\rm{||}}}{I}_{{{\rm{S}}}c}\circ {\varphi }^{-1}-{I}_{{{\rm{T}}}c}{{\rm{||}}}}_{{L}^{2}}^{2}\right\}\right).$$(22)The gradient of the above loss function is:$${\nabla }_{v}{E}_{t}=2{v}_{t}-{\sum}_{c=1}^{d}\left(K\left(\frac{2}{{\sigma }_{c}^{2}}\left|D{\phi }_{t,1}^{v}\right|\nabla {J}_{t}^{{{{\rm{S}}}}c}\left({J}_{t}^{{{{\rm{S}}}}c}-{J}_{t}^{{{{\rm{T}}}}c}\right)\right)\right).$$(23)where ${J}_{t}^{{{{\rm{S}}}}c}={I}_{{{{{\rm{S}}}}}_{c}}{\circ }\;({\phi }_{0} \;{\circ }\;{({\phi }_{t})}^{-1}),\, {J}_{t}^{{{{\rm{T}}}}c}={I}_{{{{{\rm{T}}}}}_{c}}{\circ }\;({\phi }_{1}\;{\circ }\;{({\phi }_{t})}^{-1}),\, D{\phi }_{t,1}^{v}$ is the Jacobian of mapping ${\phi }_{t,1}^{v}$. By using the gradient descent algorithm, ${v}_{t}$ can be obtained. Then, by solving the ODE ${\dot{\phi }}_{t}={v}_{t}\left({\phi }_{t}^{v}\right)$, ${\phi }_{t}$ can be determined, resulting in the final $\varphi={\phi }_{1}$.After aligning ${I}_{{{{\rm{S}}}}}$ and ${I}_{{{{\rm{T}}}}}$ using multi-channel LDDMM, we determine the final positions of cells/spots in the source slice by identifying the ultimate pixel locations of their original pixels after transformation. Simultaneously, we update the positions of cells/spots in the target slice to the pixel locations corresponding to their generated presentations.To mitigate potential overcorrection, we quantify the net alignment gain induced by the local step and accept local alignment only when the gain exceeds a preset threshold ($\tau=0.02$ by default). For each source spot/cell $i$, we compare the label agreement with its nearest-neighbor match in the target slice before vs. after local alignment, and define the alignment rate ${r}_{k}=\frac{1}{N}{\sum }_{i=1}^{N}{\mathbb{I}}[{y}_{{{{\rm{S}}}}}(i)={y}_{{{{\rm{T}}}}}({j}_{k}(i))]$, where ${j}_{k}(i)$ denotes the nearest-neighbor index in the target slice under the coordinates before ($k=0$) or after ($k=1$) local alignment. The local step is applied only when $\Delta r={r}_{1}-{r}_{0}\ge \tau$. The corresponding gain-based QC results for all main-analysis scenarios involving local alignment are summarized in Supplementary Table 10.Identification of spatially differential genes (SDGs) via spatial cross-correlationTo identify spatially differential genes (SDGs), we employed a spatial cross-correlation analysis, which modifies Moran’s $I$ index to assess the correlation between aligned spatial gene expression datasets. The cross-correlation index is defined as:$$I=\frac{n}{{\sum }_{i}{\sum }_{j}{w}_{{ij}}}\times \frac{\mathop{\sum }_{i}\mathop{\sum }_{j}{w}_{{ij}}\left({{{{\bf{x}}}}}_{i}-\bar{x}\right)\left({{{{\bf{y}}}}}_{j}-\bar{y}\right)}{\sqrt{\left(\mathop{\sum }_{i}{\left({{{{\bf{x}}}}}_{i}-\bar{x}\right)}^{2}\right)\left(\mathop{\sum }_{j}{\left({{{{\bf{y}}}}}_{j}-\bar{y}\right)}^{2}\right)}},$$(24)where $n$ is the sample size of slice A, ${{{{\bf{x}}}}}_{i}$ and ${{{{\bf{y}}}}}_{j}$ represent the gene expression levels of gene $g$ in cells $i$ and $j$, respectively. $\bar{x}$ and $\bar{y}$ are the average expression levels of gene $g$ in slices A and B, respectively. ${w}_{{ij}}$ is the weight between cell $i$ in slice A and cell $j$ in slice B, defined as ${w}_{{ij}}=\exp \left(-\alpha {\left|{c}_{i}-{c}_{j}\right|}^{2}\right)$, where ${c}_{i}$ and ${c}_{j}$ represent the aligned positions of cells $i$ and $j$ in their corresponding slices. With this measure, SDGs were identified as genes exhibiting low spatial cross-correlation values, indicating notable differences in spatial expression patterns across slices.BenchmarkEvaluation metricsIn the Benchmark analysis, we employed the Expression Correlation Score (ECS), Spatial Consistency Index (SCI), Rotation Recovery Score (RRS), and spot-to-spot mapping metrics to evaluate the accuracy of alignment. These measures assess different aspects of ST alignment, including expression similarity, cell type/region consistency, spatial orientation fidelity, and pairing crowding of spot correspondences. Additionally, for the DLPFC dataset where layer annotations are available, we report the Layer-wise Alignment Accuracy (LAA), which generalizes SCI by allowing small layer offsets.For each cell/spot ${p}_{i}$ in the source slice, we identify its closest spatial neighbor in the target slice, denoted ${q}_{\pi (i)}$, using Euclidean distance in the aligned coordinate system. Let ${E}_{{p}_{i}}$ and ${E}_{{q}_{\pi (i)}}$ be the corresponding gene-expression vectors, and ${C}_{{p}_{i}}$, ${C}_{{q}_{\pi (i)}}$ their cell-type (or layer/region) annotations.Expression Correlation Score (ECS)The ECS is calculated as the mean Pearson correlation coefficient across all paired points:$${{{\rm{ECS}}}}=\frac{1}{{N}_{S}}\mathop{\sum }_{i=1}^{{N}_{S}}\rho \left({E}_{{p}_{i}},{E}_{{q}_{\pi \left(i\right)}}\right),$$(25)where $\rho$ denotes the Pearson correlation coefficient. A higher ECS indicates better correspondence of gene expression.Spatial Consistency Index (SCI) and Layer-wise Alignment Accuracy (LAA)The SCI is defined as the proportion of matched cell types/regions:$${{{\rm{SCI}}}}=\frac{1}{{N}_{S}}\mathop{\sum}_{i=1}^{{N}_{S}}\delta \left({C}_{{p}_{i}},{C}_{{q}_{\pi \left(i\right)}}\right),\delta \left(a,b\right)=\left\{\begin{array}{c}1,{if}\,a=b\\ 0,{if}\,a\,\ne\, b\end{array}\right..$$(26)A higher SCI indicates a greater alignment of biological identities between slices. For datasets with an inherent laminar order (e.g., DLPFC), we exploit the known ordering of layers. Let$${{{\mathcal{L}}}}=\left({L}_{1}\prec {L}_{2}\prec \ldots \prec {L}_{6}\prec {{{\rm{WM}}}}\right)$$(27)be the totally ordered set of layer/region labels, and let $r{{:}} \ell {{\to }}\{{{\mathrm{0,1}}} , \ldots , m\}$ be the order-preserving rank map (e.g., $r\left({L}_{k}\right)=k-1 \, {for \; k}={{\mathrm{1,2}}} , \ldots , 6$ and $r\left({{{\rm{WM}}}}\right)=6$). The layer distance for a paired spot (${p}_{i},\,{q}_{\pi (i)}$) is:$${\Delta }_{i}=\left|r\left({c}_{{p}_{i}}\right)-r\left({c}_{{q}_{\pi \left(i\right)}}\right)\right|,$$(28)Denote the index set of pairs by ${{{\mathcal{J}}}}=\{ i:{c}_{{p}_{i}},\,{c}_{{q}_{ \pi \left(i\right)}}{{\in }} {{{\mathcal{L}}}} \}$ and the Layer-wise Alignment Accuracy (LAA) with tolerance $k$ is:$${{{\rm{LA}}}}{{{{\rm{A}}}}}_{\le k}=\frac{1}{\left|{{{\mathcal{J}}}}\right|}{\sum}_{i{{{\mathscr{\in }}}}{{{\mathcal{J}}}}}{{{\bf{1}}}}\left({\Delta }_{i}\le k\right),\,k\in \left\{1,2\right\},$$(29)where $1\left(\cdot \right)$ denotes the indicatsor function. The special case k = 0 reduces to SCI.Spot-to-spot mapping metricsTo quantify the uniqueness of pairings, we compute the one-to-one coverage (O2O),$${{{\rm{O}}}}2{{{\rm{O}}}}=\frac{1}{{N}_{{{{\rm{S}}}}}}{\sum}_{i=1}^{{N}_{{{{\rm{S}}}}}}{{{\bf{1}}}}\left(\left|{\pi }^{-1}\left(\pi \left(i\right)\right)\right|=1\right),$$(30)i.e., the fraction of source spots whose assigned target is not shared by any other source. To summarize many-to-one crowding, we report the crowding ratio (CR),$${{{\rm{CR}}}}=\frac{{N}_{{{{\rm{S}}}}}}{\left|{{{\rm{Im}}}}\left(\pi \right)\right|},$$(31)where ${{{\rm{Im}}}}\left(\pi \right)=\{\pi \left(i\right):i={{\mathrm{1,2}}} , \ldots , {N}_{{{{\rm{S}}}}}\}$ is the set of distinct targets used. Values of O2O closer to 1 indicate more unique (nearly bijective) pairings, while ${{{\rm{CR}}}}=1$ corresponds to perfect one-to-one usage of targets and increases as crowding grows.Rotation Recovery Score (RRS)Let ${\theta }_{{{{\rm{pred}}}}}$ and ${\theta }_{{{{\rm{true}}}}}$ denote the predicted and the ground-truth rotation angles, respectively. Define $\Delta=|{{{{\rm{\theta }}}}}_{{{{\rm{pred}}}}}-{{{{\rm{\theta }}}}}_{{{{\rm{true}}}}}|$. The RRS is computed as:$${{{\rm{RRS}}}}=1-\frac{\Delta \theta }{{180}^{\circ }},$$(32)where $\Delta \theta=\min (\Delta,{360}^{\circ }-\Delta )$. A RRS closer to 1 indicates a more accurate recovery of the applied rotation.Synthetic nonrigid perturbationsAffine transformationsEach affine case is defined by a matrix ${{{\bf{A}}}}\in {{\mathbb{R}}}^{2\times 2}$ and translation ${{{\bf{t}}}}\in {{\mathbb{R}}}^{2}$ via$${x}^{{\prime} }={{{\bf{A}}}} \, x+{{{\bf{t}}}}.$$(33)Let$$R\left(\theta \right)=\left[\begin{array}{cc}cos \theta & -sin \theta \\ sin \theta & cos \theta \end{array}\right].$$(34)The specific instances used were:Slight affine:$${{{\bf{A}}}}=\left[\begin{array}{cc}0.9 & -0.15\\ 0 & 1\end{array}\right],\,{{{\bf{t}}}}=\left[\begin{array}{c}-0.1\\ 0\end{array}\right].$$(35)Strong affine:$${{{\bf{A}}}}=0.6R\left({30}^{\circ }\right)\left[\begin{array}{cc}1 & 1\\ 0 & 1\end{array}\right]R\left({30}^{\circ }\right),\,{{{\bf{t}}}}=\left[\begin{array}{c}0\\ 0\end{array}\right].$$(36)Skew:$${{{\bf{A}}}}=\left[\begin{array}{cc}1 & -1\\ 0 & 1\end{array}\right],\,{{{\bf{t}}}}=\left[\begin{array}{c}0\\ 0\end{array}\right].$$(37)Squeeze:$${{{\bf{A}}}}=R\left({28}^{\circ }\right)\left[\begin{array}{cc}1 & 1\\ 0 & 1\end{array}\right]R\left({28}^{\circ }\right),\,{{{\bf{t}}}}=\left[\begin{array}{c}0.02\\ -0.03\end{array}\right].$$(38)Uniform scaling:$${{{\bf{A}}}}=\frac{1}{2}{{{{\bf{I}}}}}_{{{{\bf{2}}}}},{{{\bf{t}}}}=\left[\begin{array}{c}0\\ 0\end{array}\right].$$(39)Elastic transformationA sinusoidal warp was applied along ${x}_{1}$ as a function of ${x}_{2}$:$${x}_{1}^{{\prime} }={x}_{1}+{{{\rm{asin}}}}\left(\omega {x}_{2}+\varphi \right),\,{x}_{2}^{{\prime} }={x}_{2}.$$(40)with amplitude $a=0.10$, angular frequency $\omega=4\pi$, and phase $\varphi=0$.Consensus curve across grid resolutionsAt each grid resolution, genes were ranked by the cross-sample spatial cross-correlation index $I$, and the top (5%) were designated as SCGs (consistent with the perturbation analyses). For the consensus analysis, we repeated this ranking at each of the $k$ grid resolutions and, for each gene $g$, counted the number of grids in which it appeared among the top 5%, denoted ${c}_{g}\in \{0 , \ldots , k\}$. We then plotted the cumulative curve $O(t)=\# \{g: {c}_{g} \ge t\},{t}=1 , \ldots , k.$To provide a random baseline, we assumed independent uniform selection across grids: at each grid, every gene is selected with probability $p=0.05$, independently across grids (for a fixed $g$). Under this null,$${c}_{g}\sim {{{\rm{Binomial}}}}\left(k,p\right),\,{P}_{t}=\Pr \left({c}_{g}\ge t\right)={\sum}_{i=t}^{k}\left(\begin{array}{c}k\\ i\end{array}\right){p}^{i}{\left(1-p\right)}^{k-i}.$$(41)The expected baseline curve and its variance are$${\mathbb{E}}\left[O\left(t\right)\right]={N}_{{{{\rm{total}}}}}{P}_{t},{{{\rm{Var}}}}\left[O\left(t\right)\right]\approx {N}_{{{{\rm{total}}}}}{P}_{t}\left(1-{P}_{t}\right),$$(42)and we display a pointwise (95%) normal-approximation band$${\mathbb{E}}\left[O\left(t\right)\right]\pm 1.96\sqrt{{{{\rm{Var}}}}\left[O\left(t\right)\right]}.$$(43)Observed curves rising above the band over a range of (t) indicate cross-resolution consensus stronger than chance.Perturbation analyses for SCG robustness in the artery datasetTo assess the stability and specificity of CODA’s spatial cross-correlation metric, we first diagnosed potential expression-level bias by binning common genes into quintiles of pooled log-mean expression and comparing the bin-wise SCG/SDG rates against the 5% expectation using the cross-correlation index $I$ on the artery dataset. After this bias check, we designed two controlled gene up-/down-regulation experiments for SCGs. In the baseline analysis, after alignment, genes were ranked by the cross-sample spatial cross-correlation index $I$, and the top/bottom 5% were designated as SCGs/SDGs for subsequent perturbation and evaluation (recomputing $I$ and comparing rank changes).Consider two aligned samples $s\in \{{{{\rm{CAD}}}},{{{\rm{NOR}}}}\}$ with count matrices ${X}^{(s)}\in {{\mathbb{N}}}^{{n}_{{{{\rm{s}}}}}\times G}$. Let $\Gamma \subset \{1 , \ldots , G\}$ be the SCG set from the baseline analysis; only columns indexed by $\Gamma$ are perturbed, while all other columns and all coordinates remain unchanged.Expression-level biasIn addition to the perturbation analyses, we assessed whether SCG/SDG calls are confounded by absolute expression on the artery dataset. For each gene ${g}$, we computed the log-mean expression ${\mu }_{s}\left(g\right)={n}_{s}^{-1}{\sum }_{i=1}^{{n}_{s}}\log \left(1+{X}_{{ig}}^{\left(s\right)}\right)$ and detection rate ${\rho }_{s}\left(g\right)={n}_{s}^{-1}{\sum }_{i=1}^{{n}_{s}}{{{\mathbf{1}}}}\{{X}_{{ig}}^{\left(s\right)} > 0\}$, retaining genes with $\min \left\{{\rho }_{{{{\rm{CAD}}}}}\left(g\right),{\rho }_{{{{\rm{NOR}}}}}\left(g\right)\right\}\ge \tau \,(\tau=0.10)$. To remove between-sample scale effects, we ranked ${\mu }_{{{{\rm{CAD}}}}}(g)$ and ${\mu }_{{{{\rm{NOR}}}}}(g)$ within each sample and averaged the ranks to obtain a pooled rank $R\left(g\right)=\left({rank}\left({\mu }_{{{{\rm{CAD}}}}}\left(g\right)\right)+{rank}\left({\mu }_{{{{\rm{NOR}}}}}\left(g\right)\right)\right)/2\,$; genes were then partitioned into five equal-frequency expression bins ${Q}_{1} , \ldots , {Q}_{5}\,$(low→high) by $R(g)$. For each bin ${Q}_{b}$ we estimated ${\hat{p}}_{b}=\left|{{{\mathcal{S}}}}{{{\mathscr{\cap }}}}{Q}_{b}\right|/\left|{Q}_{b}\right|$ for ${{{\mathcal{S}}}}{{\in }}\{{{{{\mathcal{S}}}}}_{{{{\rm{SCG}}}}},{{{{\mathcal{S}}}}}_{{{{\rm{SDG}}}}}\}$ and compared it to the independence baseline ${p}_{0}=0.05$; 95% binomial Wilson intervals were reported.Upregulation (structure-agnostic amplification)Let $\alpha > 1$ be the amplification factor (here we use $\alpha=8$). For each sample $s$ and each $g\in \Gamma$, denote the column mean$${\mu }_{g}^{\left(s\right)}=\frac{1}{{n}_{{{{\rm{s}}}}}}{\sum}_{i=1}^{{n}_{{{{\rm{s}}}}}}{X}_{{ig}}^{\left(s\right)}.$$(44)Replace the entire column by i.i.d. Poisson draws independent of location:$${X}_{{ig}}^{{\prime} \left(s\right)}{ \sim }^{{\mbox{i.i.d.}}}{{\rm{Poisson}}}\left(\alpha {\mu }_{g}^{\left(s\right)}\right),i=1,\ldots,{n}_{{{\rm{s}}}}.$$(45)For $g \, \notin \, \Gamma$, set ${X}_{{ig}}^{{\prime} \left(s\right)}={X}_{{ig}}^{(s)}$.Downregulation (structure-preserving thinning). Let $\tau \in ({{\mathrm{0,1}}})$ be a fixed thinning rate. For each gene $g\in \Gamma$, apply independent binomial thinning entry-wise:$${X}_{{ig}}^{{\prime} \left(s\right)}\sim {{{\rm{Binom}}}}\left({X}_{{ig}}^{\left(s\right)},\tau \right),i=1 , \ldots , {n}_{{{{\rm{s}}}}}.$$(46)For $g \, \notin \, \Gamma$, set ${X}_{{ig}}^{{\prime} \left(s\right)}={X}_{{ig}}^{(s)}$.Pseudo-slice negative controlWe constructed pseudo-slices from a single real slice by overlapping subsampling. Given one slice with $N$ spots/cells, we independently sampled two subsets $A$ and $B$ without replacement, each containing ${{{\rm{\lfloor }}}}{pN}{{{\rm{\rfloor }}}}$ spots/cells ($p\in (0,1]$), allowing partial overlap between $A$ and $B$.Enrichment analysisFunctional enrichment analysisWe performed Gene Ontology (GO) and KEGG enrichment with clusterProfiler using organism-specific annotations (org.Hs.eg.db for the arterial data; org.Mm.eg.db for the mouse-brain data; KEGG organism codes hsa and mmu, respectively). Input gene symbols were mapped to ENTREZ identifiers; unmapped entries and duplicates were removed. The background (universe) for each analysis comprised all genes successfully mapped within that dataset.For over-representation analysis (ORA), we derived two score-based gene sets from each ranked table: the top 5% as SCG and the bottom 5% as SDG. GO-ORA was run separately for BP, MF, and CC; KEGG-ORA was performed with Benjamini–Hochberg correction (adjusted ${P}\le \,0.05$, ${q}\le \,0.10$).For gene set enrichment analysis (GSEA), the full ranked list (descending by the dataset’s score) was used for GO (BP/MF/CC) and KEGG with minGSSize = 10, maxGSSize = 500, numerical tolerance eps = 1 × 10−10, BH adjustment, and significance threshold adjusted ${P}\le \,0.05$. Results were summarized as dot plots (top 20 terms per collection) and running-score curves for representative pathways.Marker enrichment analysisWe performed gene-set enrichment with Enrichr against two curated cell-marker libraries—PanglaoDB_Augmented_2021 and CellMarker_Augmented_2021—for both the arterial dataset and the mouse-brain dataset. For each analysis, Enrichr’s adjusted $P$-values were retrieved per term and harmonized to canonical cell classes when unambiguous. Evidence across libraries was aggregated by the minimum adjusted $P$-value for each term $t$,$${{{\rm{FD}}}}{{{{\rm{R}}}}}_{\min }\left(t\right)={\min }_{d}{{{\rm{FD}}}}{{{{\rm{R}}}}}_{t,d},$$(47)and summarized as $S\left(t\right)=-{\log }_{10}\left({{{\rm{FD}}}}{{{{\rm{R}}}}}_{\min }\left(t\right)\right)$. We also recorded the number of libraries supporting each term (“hits”). Terms were ranked by $S(t)$ (ties broken by hits), and the top 15 were visualised as horizontal bar plots with a reference line at ${{{\rm{FDR}}}}=0.05$.1. Arterial dataset. The SCG top-300 gene set was queried against both libraries. Top terms had hits = 1 throughout, indicating that leading annotations were contributed by a single resource; we therefore interpret arterial results primarily via ${{{{\rm{FDR}}}}}_{\min }$ rather than cross-library concordance.2. Mouse brain dataset. The SCG (mouse) gene set was assessed with the same procedure, but results were restricted to CNS-relevant annotations using a predefined inclusion mask (brain/cortex/hippocampus, neuron/glia, astrocyte, oligodendrocyte, microglia, pyramidal, GABA, glutamatergic), and labels were harmonized to canonical CNS classes (e.g., glutamatergic neurons, GABAergic neurons, pyramidal neurons, interneurons, oligodendrocytes, astrocytes, microglia, glial cells, neurons). Ranking and visualization followed the scheme above.All enrichment analyses were performed in R 4.3.0 with Bioconductor 3.17, using clusterProfiler 4.8.1, enrichplot 1.20.0, DOSE 3.26.1, org.Hs.eg.db 3.17.0, org.Mm.eg.db 3.17.0, ggplot2 3.4.2, and tidyverse 2.0.0.Compared methodsWe benchmarked against PASTE (v1.4.0; alpha=0.1), SANTO (epochs=20, lr=0.01, k = 5, alpha=0.1), STalign (v1.0; niter=1000, diffeo_start=100, a = 250, epV=1000, sigmaB=0.1, μ_B = (0,0,0)), ST-GEARS (v1.0.0; alpha=0.8, numItermax=150), STAligner (v1.0.0; rad_cutoff=15, knn_neigh=100, n_epochs=600), and SPACEL (v1.1.8; n_neighbors=15, n_threads=10, p = 2). Unless otherwise noted, all remaining hyperparameters followed the authors’ defaults. Parameter sweep experiments can be found in Supplementary Figs. 30, 31, and 32.Implementation environmentAll experiments were conducted on a Win11 system equipped with Intel i5−13490F and Crucial 16GB*2 DDR4-3200MHz. GPU acceleration was enabled using an NVIDIA GeForce RTX 4070 SUPER with 12 GB memory, running NVIDIA driver version 560.94. CODA was implemented in Python 3.9. The analyses were performed using numpy (1.26.3), pandas (2.2.2), scipy (1.13.1), matplotlib (3.9.4), Pillow (10.2.0), scikit-image (0.24.0), opencv-python (4.10.0.84), scanpy (1.10.2), anndata (0.10.8), bbknn (1.6.0), kornia (0.7.3), shapely (2.0.6), and louvain (0.8.2). CODA was developed and tested with PyTorch 2.4.0 and CUDA 11.8 for GPU acceleration, while CPU-only execution is also supported. The package can be installed via pip (icoda = 2.0.0).Statistics & reproducibilityThis study is based on computational analyses of existing publicly available spatial transcriptomics datasets. No statistical method was used to predetermine sample size. Sample sizes were determined by the original datasets and by the benchmark design described in this study. No biological samples or observations were excluded from CODA analyses. For computational benchmarking, when certain baseline methods exceeded memory limits on the full MERFISH dataset, a predefined protocol of uniform subsampling to 10,000 cells per slice was used for those methods, as described in the Results and Methods. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment, because this study did not involve investigator-controlled experimental allocation or outcome assessment.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.