The Atlas-Learn Approach to the Manifold Hypothesis

[This article was first published on R Works, and kindly contributed to R-bloggers.]

The manifold hypothesis, the idea that real-world high-dimensional data concentrates near a low-dimensional curved subspace, is foundational to modern machine learning. Many popular manifold learning methods such as UMAP, t-SNE, Isomap, and diffusion maps achieve dimensionality reduction by embedding data into a flat Euclidean space, but they do not attempt to learn the underlying manifold directly. In contrast, the 2025 paper by Robinett et al., Atlas-based manifold representations for interpretable Riemannian machine learning, offers a proof of concept for tackling the manifold hypothesis directly, based on fundamental ideas from differential geometry. It provides an algorithm for learning a low-dimensional manifold from point data by constructing an atlas of charts. The paper is also notable for its design of an efficient data structure for working with the learned atlas and for its extensive supplementary materials, which include a GitHub repository containing several practical Python algorithms for doing calculations on manifolds and an extraordinary amount of implementation detail.

Reading through Robinett et al., however, requires a fairly deep background in the theory of differential geometry. This post is an attempt to provide an on-ramp to Robinett et al. by discussing the relatively simple example of the two-dimensional sphere $S^2$ embedded in $\mathbb{R}^3$.
It implements the Atlas-Learn data structures and algorithms in R, uses them to learn $S^2$, and then goes on to validate the Atlas-Learn algorithm for the sphere via three independent methods: 1) using numerical integration along the manifold to trace a great circle on the sphere, 2) recovering the radius of curvature of the sphere from the atlas, and 3) verifying the Gauss-Bonnet Theorem for the sphere.

The R code was mostly worked out by Claude Sonnet 4.3 in the context of participating in the Posit beta test for its AI Assistant. I found the integration of the AI engine into the RStudio IDE an effective means of communicating with Claude and managing the project workflow.

Atlas-Learn: Theory and Algorithm

This section provides some minimal theoretical background for understanding the Atlas-Learn algorithm. A smooth manifold $\mathcal{M}$ of intrinsic dimension $d$ embedded in $\mathbb{R}^D$ can be described by an atlas, a finite collection of charts $\{(U_j, \varphi_j)\}_{j=1}^{k}$ such that the open sets $U_j$ cover $\mathcal{M}$ and each chart map $\varphi_j : U_j \to \mathbb{R}^d$ is a smooth bijection onto its image.

Normally, the definition of a smooth manifold also requires that any two charts be smoothly compatible, where two charts $(U_j, \varphi_j)$ and $(U_l, \varphi_l)$ are said to be smoothly compatible iff $\varphi_j(U_j \cap U_l)$ and $\varphi_l(U_j \cap U_l)$ are both open in $\mathbb{R}^d$ and the transition map $\varphi_l \circ \varphi_j^{-1}$ is a diffeomorphism (e.g. see [2]). Robinett et al. relax the smooth-compatibility requirement and define transition maps separately from coordinate chart images. They then approximate a differentiable atlas by ensuring that the discrepancy between coordinate charts and transition maps goes to 0 as the number of charts and the number of sampled points go to infinity.

In the Atlas-Learn algorithm as applied here, the manifold is a surface ($d = 2$) embedded in $\mathbb{R}^3$, and both the covering sets $U_j$ and the chart maps $\varphi_j$ are learned from a finite point cloud $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^3$.
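To make the smooth-compatibility condition concrete, here is the classical two-chart atlas of the unit sphere built from stereographic projection. This is a standard textbook example and not part of Atlas-Learn; the names $\sigma_N$, $\sigma_S$, $U_N$, and $U_S$ are introduced here only for illustration.

$$
\begin{aligned}
U_N &= S^2 \setminus \{(0,0,1)\}, & \sigma_N(x,y,z) &= \frac{(x,\,y)}{1-z},\\
U_S &= S^2 \setminus \{(0,0,-1)\}, & \sigma_S(x,y,z) &= \frac{(x,\,y)}{1+z}.
\end{aligned}
$$

On the sphere, $\lVert \sigma_N \rVert^2 = (1 - z^2)/(1-z)^2 = (1+z)/(1-z)$, so the transition map on the overlap is

$$
\sigma_S \circ \sigma_N^{-1}(\xi) = \frac{\xi}{\lVert \xi \rVert^2}, \qquad \xi \in \mathbb{R}^2 \setminus \{0\},
$$

which is smooth and is its own inverse, hence a diffeomorphism. The two charts are therefore smoothly compatible in the sense defined above. Atlas-Learn, by contrast, uses many small learned charts rather than two global ones.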
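The Atlas-Learn algorithm constructs an atlas in four basic steps:

1. The point cloud comprising the data, the sphere in our case, is partitioned into clusters using k-medoids.
2. Local PCA is used to find the tangent plane and the normal direction for each cluster.
3. Quadratic regression is performed to find the curvature coefficients, K.
4. The minimum ellipsoidal region enclosing the chart is estimated.

Step 1: Partitioning via k-medoids

The point cloud is partitioned into $k$ clusters using the $k$-medoids algorithm (PAM). Unlike $k$-means, PAM selects actual data points as cluster centers (medoids), which makes the partition robust to outliers and avoids projection artefacts. Each point $x_i$ receives a chart label $c_i \in \{1, \dots, k\}$, and the points belonging to chart $j$ together with their centroid are

$$X_j = \{x_i : c_i = j\}, \qquad \mu_j = \frac{1}{n_j} \sum_{x_i \in X_j} x_i, \qquad n_j = |X_j|.$$

Step 2: Local PCA and tangent-plane estimation

For each cluster $j$, the centered data matrix $\tilde{X}_j \in \mathbb{R}^{n_j \times 3}$, whose rows are $(x_i - \mu_j)^\top$, is decomposed via the thin SVD:

$$\tilde{X}_j = Q_j\, S_j\, V_j^\top, \qquad V_j = [\,v_1 \;\; v_2 \;\; v_3\,].$$

The first two right singular vectors span the local tangent plane:

$$T_j = [\,v_1 \;\; v_2\,] \in \mathbb{R}^{3 \times 2},$$

while the third singular vector $n_j = v_3$ estimates the local surface normal (the direction of least variance). Each centered point $\tilde{x} = x - \mu_j$ is then decomposed into tangent and normal components:

$$\xi = T_j^\top \tilde{x} \in \mathbb{R}^2, \qquad h = n_j^\top \tilde{x} \in \mathbb{R}.$$

Step 3: Quadratic chart map

On a smooth surface the normal offset $h$ is a smooth function of the tangent coordinates $\xi = (\xi_1, \xi_2)$. Atlas-Learn approximates this function by a degree-2 polynomial (capturing local curvature):

$$h \approx K_j^\top \phi(\xi), \qquad \phi(\xi) = (\xi_1^2,\; \xi_1 \xi_2,\; \xi_2^2)^\top,$$

where $K_j \in \mathbb{R}^3$ is estimated by least squares with a small ridge penalty $\lambda$:

$$\hat{K}_j = (\Phi_j^\top \Phi_j + \lambda I_3)^{-1} \Phi_j^\top h_j.$$

Here $\Phi_j$ is the $n_j \times 3$ matrix whose rows are $\phi(\xi_i)^\top$ and $h_j$ is the vector of normal offsets for the points in chart $j$. The resulting inverse chart map reconstructs an ambient point from local coordinates $\xi$:

$$\varphi_j^{-1}(\xi) = \mu_j + T_j\, \xi + n_j\, K_j^\top \phi(\xi).$$

Its Jacobian $J_j(\xi) \in \mathbb{R}^{3 \times 2}$, required for geodesic integration, is:

$$J_j(\xi) = T_j + n_j\, K_j^\top D\phi(\xi), \qquad D\phi(\xi) = \begin{pmatrix} 2\xi_1 & 0 \\ \xi_2 & \xi_1 \\ 0 & 2\xi_2 \end{pmatrix}.$$

Step 4: Ellipsoidal chart domains

Each chart is assigned an ellipsoidal domain defined by

$$\mathcal{D}_j = \{\xi \in \mathbb{R}^2 : \xi^\top A_j\, \xi \le 1\},$$

where $A_j$ is a rescaled inverse covariance of the projected points:

$$A_j \propto \hat{C}_j^{-1}, \qquad \hat{C}_j = \operatorname{Cov}(\xi_i : x_i \in X_j).$$

The scale factor is chosen to inflate each domain slightly beyond the convex hull of its own cluster, so that neighboring charts overlap and transitions are always possible.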
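As a rough illustration of Steps 1 to 3, the following is a minimal R sketch and not the article's implementation. The sample size, `k = 20`, `lambda = 1e-8`, and the helper names `quad_feats`, `fit_chart_sketch`, `chart_inverse`, and `chart_jacobian` are all assumptions made for this example. Step 4 is left out because the exact rescaling of the covariance is not reproduced here.

```r
library(cluster)  # pam() for k-medoids, as in the article's package list

set.seed(42)
# Illustrative data (assumed sizes): 800 points uniform on the unit sphere
z <- matrix(rnorm(3 * 800), ncol = 3)
X <- z / sqrt(rowSums(z^2))

# Step 1: k-medoids partition; k = 20 is an arbitrary choice for this sketch
labels <- pam(X, k = 20, cluster.only = TRUE)

# Quadratic monomials [xi1^2, xi1*xi2, xi2^2] for an n x 2 coordinate matrix
quad_feats <- function(xi) cbind(xi[, 1]^2, xi[, 1] * xi[, 2], xi[, 2]^2)

# Steps 2-3 for one cluster: local PCA via SVD, then a ridge-regularized fit
fit_chart_sketch <- function(Xj, lambda = 1e-8) {
  mu  <- colMeans(Xj)
  Xc  <- sweep(Xj, 2, mu)            # center on the cluster centroid
  V   <- svd(Xc)$v                   # right singular vectors (3 x 3)
  Tm  <- V[, 1:2]                    # tangent basis
  nrm <- V[, 3]                      # normal estimate (least variance)
  xi  <- Xc %*% Tm                   # tangent coordinates (n_j x 2)
  h   <- as.vector(Xc %*% nrm)       # normal offsets
  Phi <- quad_feats(xi)
  K   <- solve(crossprod(Phi) + lambda * diag(3), crossprod(Phi, h))
  list(mu = mu, Tm = Tm, nrm = nrm, K = as.vector(K), xi = xi)
}

# Inverse chart map: rows of local coordinates -> rows of ambient points
chart_inverse <- function(ch, xi) {
  h_hat <- as.vector(quad_feats(xi) %*% ch$K)
  sweep(xi %*% t(ch$Tm) + h_hat %o% ch$nrm, 2, ch$mu, "+")
}

# Jacobian (3 x 2) of the inverse chart map at a single point xi = c(xi1, xi2)
chart_jacobian <- function(ch, xi) {
  Dphi <- rbind(c(2 * xi[1], 0), c(xi[2], xi[1]), c(0, 2 * xi[2]))
  ch$Tm + ch$nrm %o% as.vector(ch$K %*% Dphi)
}

# Fit one chart per cluster
charts <- lapply(split(seq_len(nrow(X)), labels),
                 function(idx) fit_chart_sketch(X[idx, , drop = FALSE]))

# How far do the reconstructed points sit from the unit sphere?
recon_err <- sapply(charts, function(ch) {
  Xhat <- chart_inverse(ch, ch$xi)
  max(abs(sqrt(rowSums(Xhat^2)) - 1))
})
max(recon_err)
```

The sketch follows the description literally: the cluster centroid is the chart origin and the fit uses only the three quadratic monomials, with no constant term. The printed error therefore reflects the quality of that local model for the chosen `k`, not a tuned implementation.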
On $S^2$ specifically, because the sphere is isotropic and the $k$-medoids partition tends to produce roughly equal-area, near-circular patches, the learned ellipsoids are close to circles ($A_j \approx c\,I$ for some scalar $c$).

The Atlas-Learn Implementation

Here are the required R packages.

```r
library(tidyverse)
library(cluster)  # pam() for k-medoids partitioning
library(RANN)     # nn2() for fast k-nearest-neighbor queries
library(plotly)   # interactive 3D visualization
library(purrr)    # map() / imap() for list operations
```

This block of code contains all of the functions for the Atlas-Learn implementation.

```r
#| label: atlas-functions
# ===========================================================================
# PART 1: Quadratic feature helpers
# ---------------------------------------------------------------------------
# These functions implement the d=2 specialization of the general quadratic
# feature map. For general d, phi(xi) would have choose(d+1, 2) components.
# For d=2 it has exactly 3: [xi1^2, xi1*xi2, xi2^2].
# ===========================================================================
# Maps xi in R^2 to the three quadratic monomials used to model surface curvature.
# General d would give choose(d+1,2) monomials; for d=2 this is exactly 3.
quad_features
```