Published on August 22, 2025 9:46 PM GMT

Our posts on natural latents have involved two distinct definitions, which we call "stochastic" and "deterministic" natural latents. We conjectured that, whenever there exists a stochastic natural latent (to within some approximation), there also exists a deterministic natural latent (to within a comparable approximation). Four months ago, we put up a bounty to prove this conjecture.

We've been bottlenecked pretty hard on this problem, and spent most of the last four months attacking it. At long last, we have a proof. As hoped, the proof comes with some qualitative new insights about natural latents, and we expect it will unbottleneck a bunch of future work. The main purpose of this post is to present the proof.

This post officially closes the corresponding bounty.

## Recap: What Was The Problem Again?

(This section is mostly copied from the bounty post.)

### Some Intuition From The Exact Case

In the exact case, in order for a natural latent to exist over random variables $X_1, X_2$, the distribution has to look roughly like this:

[diagram]

Each value of $X_1$ and each value of $X_2$ occurs in only one "block", and within the "blocks", $X_1$ and $X_2$ are independent. In that case, we can take the (exact) natural latent to be a block label.

Notably, that block label is a deterministic function of $X$.

However, we can also construct other natural latents for this system: we simply append some independent random noise to the block label. That natural latent is not a deterministic function of $X$; it's a "stochastic" natural latent.

In the exact case, if a stochastic natural latent exists, then the distribution must have the form pictured above, and therefore the block label is a deterministic natural latent. In other words: in the exact case, if a stochastic natural latent exists, then a deterministic natural latent also exists.

Our goal here is to prove that this still holds in the approximate case, using the same information theoretic approximation methods used in our other posts on natural latents (and explained here).

## The Problem

### "Stochastic" Natural Latents

Stochastic natural latents were introduced in the original Natural Latents post. Any latent $\Lambda$ over random variables $X_1, X_2$ is defined to be a stochastic natural latent when it satisfies these diagrams:

[diagram]
and $\Lambda$ is an approximate stochastic natural latent (with error $\epsilon$) when it satisfies the approximate versions of those diagrams to within $\epsilon$, i.e.

$$\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[\Lambda]P[X_1|\Lambda]P[X_2|\Lambda])$$
$$\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[X_2]P[X_1|X_2]P[\Lambda|X_1])$$
$$\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[X_1]P[X_2|X_1]P[\Lambda|X_2])$$

Key thing to note: if $\Lambda$ satisfies these conditions, then we can create another stochastic natural latent $\Lambda'$ by simply appending some random noise to $\Lambda$, independent of $X$. This shows that $\Lambda$ can, in general, contain arbitrary amounts of irrelevant noise while still satisfying the stochastic natural latent conditions.

### "Deterministic" Natural Latents

Deterministic natural latents were introduced in a post by the same name. Any latent $\Lambda$ over random variables $X_1, X_2$ is defined to be a deterministic natural latent when it satisfies these diagrams:

[diagram]

and $\Lambda$ is an approximate deterministic natural latent (with error $\epsilon$) when it satisfies the approximate versions of those diagrams to within $\epsilon$, i.e.

$$\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[\Lambda]P[X_1|\Lambda]P[X_2|\Lambda])$$
$$\epsilon \ge H(\Lambda|X_1)$$
$$\epsilon \ge H(\Lambda|X_2)$$

See the linked post for an explanation of a variable appearing multiple times in a diagram, and how the approximation conditions for those diagrams simplify to entropy bounds.

Note that the deterministic natural latent conditions, either with or without approximation, imply the stochastic natural latent conditions; a deterministic natural latent is also a stochastic natural latent.

### The Goal

We'd like a proof that, if a stochastic natural latent exists over two variables $X_1, X_2$ to within approximation $\epsilon$, then a deterministic natural latent exists over those two variables to within approximation roughly $\epsilon$ - specifically, $9\epsilon$.

## The Proof

### Key Ideas

There are two key ideas to the proof.

The first key idea is to use resampling to obtain a latent which satisfies one of the natural latent conditions exactly, and the others approximately.

The second key idea is to consider pareto optimal stochastic natural latents - i.e.
latents with pareto minimal error on the three natural latent conditions.

It turns out that stochastic natural latents which exactly satisfy one of the natural latent conditions and are pareto optimal work like the exact case, even when no exact natural latent exists. Specifically: a pareto optimal stochastic natural latent $\Lambda'$ over $X_1, X_2$ which satisfies one redundancy condition exactly can always be coarse grained into a deterministic natural latent - i.e. $f_\Lambda(\Lambda') = f_X(X)$ with probability 1 for some $f_\Lambda$ and $f_X$, and $\Lambda', X$ are independent conditional on $f_\Lambda(\Lambda')$. So, $f_\Lambda(\Lambda')$ is itself a natural latent with the same errors as $\Lambda'$, and it's exactly a deterministic function of $X$.

This was a big and very welcome surprise to us!

### Math

#### Assumptions & Preconditions

We will assume $P[X,\Lambda] > 0$ for the original stochastic natural latent $\Lambda$. This implies that there are no nontrivial exact natural latents.[1] We make this assumption mainly for mathematical convenience; because we're interested in practical approximation, it is a reasonable assumption to use. But we do expect the assumption can be dropped, at the cost of more details to handle in the proof.

The main preconditions for our proof are that three random variables $X_1$, $X_2$, $\Lambda$ approximately satisfy the three natural latent conditions, i.e.

[diagrams - left to right: the mediation condition, the first redundancy condition, the second redundancy condition]

or, written out (and simplified a little):

First redundancy condition: $\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[X]P[\Lambda|X_1])$
Second redundancy condition: $\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[X]P[\Lambda|X_2])$
Mediation condition: $\epsilon \ge D_{KL}(P[X,\Lambda] \;||\; P[\Lambda]P[X_1|\Lambda]P[X_2|\Lambda])$

#### Resampling Conserves Naturality

A previous post showed that resampling conserves redundancy.
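To make these objects concrete, here is a minimal numpy sketch (the function names and toy distribution are ours, purely for illustration) which computes the three condition errors from a joint table, and implements the resampling construction $\Lambda' \sim P[\Lambda|X_2]$, whose output satisfies the second redundancy condition exactly:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for joint tables of the same shape (0 log 0 := 0)."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def natural_latent_errors(P):
    """Errors on the three natural latent conditions, given the full
    joint table P[x1, x2, lam] over (X1, X2, Lambda)."""
    PX = P.sum(axis=2)            # P[x1, x2]
    PL = P.sum(axis=(0, 1))       # P[lam]
    PX1L = P.sum(axis=1)          # P[x1, lam]
    PX2L = P.sum(axis=0)          # P[x2, lam]
    L_given_X1 = PX1L / PX1L.sum(axis=1, keepdims=True)  # P[lam | x1]
    L_given_X2 = PX2L / PX2L.sum(axis=1, keepdims=True)  # P[lam | x2]
    med = PX1L[:, None, :] * PX2L[None, :, :] / PL       # P[lam]P[x1|lam]P[x2|lam]
    red1 = PX[:, :, None] * L_given_X1[:, None, :]       # P[x]P[lam|x1]
    red2 = PX[:, :, None] * L_given_X2[None, :, :]       # P[x]P[lam|x2]
    return kl(P, med), kl(P, red1), kl(P, red2)

def resample(P):
    """The resampling construction: a new latent drawn from P[Lambda | X2],
    giving the joint table P[x1, x2] * P[lam | x2]."""
    PX = P.sum(axis=2)
    PX2L = P.sum(axis=0)
    L_given_X2 = PX2L / PX2L.sum(axis=1, keepdims=True)
    return PX[:, :, None] * L_given_X2[None, :, :]

# Toy distribution: Lambda is a noisy "block label", all entries positive.
A = np.array([[0.9, 0.1], [0.1, 0.9]])       # P[x | lam], same for X1 and X2
P = 0.5 * A[:, None, :] * A[None, :, :]      # P[lam] = 1/2; X1, X2 indep. given lam
med_err, red1_err, red2_err = natural_latent_errors(P)
P2 = resample(P)
# natural_latent_errors(P2)[2] is ~0: second redundancy holds by construction
```

In this toy, mediation holds exactly (the table was built as $P[\Lambda]P[X_1|\Lambda]P[X_2|\Lambda]$), the redundancy errors are small but nonzero, and the resampled latent's second redundancy error vanishes, as the construction below guarantees in general.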
Specifically, we can construct a new latent $\Lambda'$ by sampling from $P[\Lambda|X_2]$; the new joint distribution is then

$$P[\Lambda'=\lambda', X] = P[X]\,P[\Lambda=\lambda'|X_2]$$

Given the two redundancy conditions and $P[X,\Lambda] > 0$, the new latent $\Lambda'$ satisfies the second redundancy condition perfectly (by construction), and satisfies the first redundancy condition to within $9\epsilon$. (Leveraging all three natural latent conditions, empirical tests also strongly suggest that that bound can be improved from $9\epsilon$ to $3\epsilon$, but that is not yet proven.)

Now, imagine that instead of constructing $\Lambda'$ by sampling from $P[\Lambda|X_2]$, we instead construct $X_1'$ by sampling from $P[X_1|X_2]$. Key insight: this results in exactly the same joint distribution as sampling $\Lambda'$ from $P[\Lambda|X_2]$, i.e.

$$P[X_1'=x_1, X_2=x_2, \Lambda=\lambda] = P[X_1=x_1, X_2=x_2, \Lambda'=\lambda] = P[x_2]\,P[x_1|x_2]\,P[\lambda|x_2]$$

By the same "resampling conserves redundancy" theorem, the $X_1'$ construction satisfies the mediation condition to within $9\epsilon$ (rather than the first redundancy condition). But since the $X_1'$ construction and the $\Lambda'$ construction yield the same joint distribution, with one of them implying the first redundancy condition is satisfied to within $9\epsilon$ and the other implying the mediation condition is satisfied to within $9\epsilon$, that same joint distribution must satisfy both conditions to within $9\epsilon$.

Putting that all together: starting from a latent $\Lambda$ over $X_1, X_2$ which satisfies all three natural latent conditions to within $\epsilon$, we can construct a new latent $\Lambda'$ which satisfies the second redundancy condition perfectly, and satisfies the other two conditions to within $9\epsilon$.

We'll use that new latent as our starting point for the second half of the proof, in which we look for pareto improvements upon $\Lambda'$.

#### Pareto Minimization -> Single Objective Minimization

In the second half of the proof, we'll consider taking pareto improvements upon $\Lambda'$, i.e. looking for a latent $\Lambda^*$ with pareto-optimal error on the three natural latent conditions. Since we're pareto improving on $\Lambda'$, $\Lambda^*$ will have zero error on the second redundancy condition (i.e.
$P[X,\Lambda^*] = P[X]P[\Lambda^*|X_2]$), and at most $9\epsilon$ error on the other two conditions.

First, we convert our pareto minimization problem into a single objective minimization problem in the standard way, in order to use the standard optimization toolset.

A latent $\Lambda^*$ is defined by $P[\Lambda^*|X]$. We want to characterize latents for which the errors on the three natural latent conditions are pareto minimal, holding $P[X_1,X_2]$ constant. The three errors are:

$$D_{KL}(X_1 \leftarrow \Lambda^* \rightarrow X_2) := D_{KL}(P[X,\Lambda^*] \;||\; P[X_1|\Lambda^*]P[\Lambda^*]P[X_2|\Lambda^*])$$
$$D_{KL}(\Lambda^* \rightarrow X_1 \rightarrow X_2) := D_{KL}(P[X,\Lambda^*] \;||\; P[\Lambda^*]P[X_1|\Lambda^*]P[X_2|X_1])$$
$$D_{KL}(\Lambda^* \rightarrow X_2 \rightarrow X_1) := D_{KL}(P[X,\Lambda^*] \;||\; P[\Lambda^*]P[X_2|\Lambda^*]P[X_1|X_2])$$

(Note that we've written these slightly differently from the previous section. They are equivalent, and these expressions will save some minor rearrangement in the proof.)

To use the usual optimization toolset, we convert the pareto minimization problem into a single objective minimization problem by assigning weights $\alpha$ to each error. Our single objective is

$$obj := \alpha_{med} D_{KL}(X_1 \leftarrow \Lambda^* \rightarrow X_2) + \alpha_1 D_{KL}(\Lambda^* \rightarrow X_1 \rightarrow X_2) + \alpha_2 D_{KL}(\Lambda^* \rightarrow X_2 \rightarrow X_1)$$

Any pareto minimum for the original problem must be a minimum of $obj$ for some $(\alpha_{med}, \alpha_1, \alpha_2)$ with $\alpha_i \ge 0$ for all indices (including $med$, which is an ordinary index). Without loss of generality, we assume $\sum_i \alpha_i = 1$. In general, different $\alpha$'s in the single objective problem pick out different pareto minima in the original problem.

#### Lagrangian & First Order Conditions

Now we turn the crank.

We (implicitly, so far) have two constraints on our optimization problem:

$$\forall X: \sum_{\Lambda^*} P[\Lambda^*|X] = 1$$
$$\forall \Lambda^*, X: P[\Lambda^*|X] \ge 0$$

We introduce Lagrange multipliers $\nu[X]$, $\mu[\Lambda^*,X]$, respectively, for these two constraints. That gives the Lagrangian

$$\mathcal{L} := obj - \sum_X \nu[X]\left(\sum_{\Lambda^*} P[\Lambda^*|X] - 1\right) - \sum_{\Lambda^*,X} \mu[\Lambda^*,X]\,P[\Lambda^*|X]$$

Differentiating $\mathcal{L}$ with respect to $P[\Lambda^*|X]$ (at constant $P[X]$), simplifying, and absorbing some terms into the Lagrange multipliers yields our first order condition:

$$0 = \ln P[X|\Lambda^*] - (\alpha_1 + \alpha_{med})\ln P[X_1|\Lambda^*] - (\alpha_2 + \alpha_{med})\ln P[X_2|\Lambda^*] - \nu'[X] - \mu[X,\Lambda^*]$$

where the Lagrange multiplier $\nu'$ has absorbed some terms which depend only on $X$.
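To see where those $X$-only terms come from, it may help to expand the objective explicitly (a sketch of the bookkeeping, using $\sum_i \alpha_i = 1$; each KL integrand is just a sum of log-ratios):

```latex
\begin{aligned}
obj = \sum_{X,\Lambda^*} P[X,\Lambda^*]\Big(&\ln P[X|\Lambda^*]
  - (\alpha_1 + \alpha_{med})\ln P[X_1|\Lambda^*]
  - (\alpha_2 + \alpha_{med})\ln P[X_2|\Lambda^*]\Big) \\
&- \sum_{X} P[X]\,\big(\alpha_1 \ln P[X_2|X_1] + \alpha_2 \ln P[X_1|X_2]\big)
\end{aligned}
```

The second line depends only on $P[X]$, which is held fixed, so it drops out of the derivative; the remaining correction terms from differentiating through $P[X|\Lambda^*]$, $P[X_1|\Lambda^*]$, and $P[X_2|\Lambda^*]$ depend only on $X$ (or are constant) and are absorbed into $\nu'[X]$.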
Note that, while the term $\mu[X,\Lambda^*]$ looks completely arbitrary, it is constrained by complementary slackness: $\mu[X,\Lambda^*]$ can be nonzero only if $P[\Lambda^*|X]$ is zero, i.e. $\mu[X,\Lambda^*]\,P[\Lambda^*|X] = 0$ for all $X, \Lambda^*$.

#### Putting The Pieces Together & Solving The Equations

Earlier, we established that a latent exists which satisfies the second redundancy condition perfectly and has error at most $9\epsilon$ on the other two conditions. So, let's use our first order condition to characterize specifically those pareto optimal latents which perfectly satisfy the second redundancy condition.

Perfect satisfaction of the second redundancy condition means $P[X|\Lambda^*] = P[X_1|X_2]P[X_2|\Lambda^*]$. Substituting that into the first order condition and simplifying gives

$$0 = -(\alpha_1 + \alpha_{med})\ln P[X_1|\Lambda^*] + \alpha_1 \ln P[X_2|\Lambda^*] - \nu''[X] - \mu[X,\Lambda^*]$$

Now, pick values $x, \lambda_0, \lambda_1$ such that $P[X=x|\Lambda^*=\lambda_0] > 0$ and $P[X=x|\Lambda^*=\lambda_1] > 0$. Then $\mu[X=x,\Lambda^*=\lambda_0] = \mu[X=x,\Lambda^*=\lambda_1] = 0$ by complementary slackness, and we can subtract the first order conditions at $(x,\lambda_1)$ and $(x,\lambda_0)$ to get

$$0 = -(\alpha_1 + \alpha_{med})\big(\ln P[X_1=x_1|\Lambda^*=\lambda_1] - \ln P[X_1=x_1|\Lambda^*=\lambda_0]\big) + \alpha_1\big(\ln P[X_2=x_2|\Lambda^*=\lambda_1] - \ln P[X_2=x_2|\Lambda^*=\lambda_0]\big)$$

Note that one of those terms depends on $X_1$ (but not $X_2$), and the other depends on $X_2$ (but not $X_1$), so the only way they can add to 0 for all $X$ values for which $P[X|\Lambda^*=\lambda_0] > 0$ and $P[X|\Lambda^*=\lambda_1] > 0$ is if both are equal to some $C(\lambda_0,\lambda_1)$ which does not depend on $X$:

$$C(\lambda_0,\lambda_1) = (\alpha_1 + \alpha_{med})\big(\ln P[X_1|\Lambda^*=\lambda_1] - \ln P[X_1|\Lambda^*=\lambda_0]\big)$$
$$C(\lambda_0,\lambda_1) = \alpha_1\big(\ln P[X_2|\Lambda^*=\lambda_1] - \ln P[X_2|\Lambda^*=\lambda_0]\big)$$

Both of those equations must hold for all $X$ such that $P[X|\Lambda^*=\lambda_0] > 0$ and $P[X|\Lambda^*=\lambda_1] > 0$.

Notably, our assumption $P[X,\Lambda] > 0$[2] implies $P[X] > 0$, which in turn implies $P[X_1|X_2] > 0$. Combined with $P[X|\Lambda^*] = P[X_1|X_2]P[X_2|\Lambda^*]$ (the perfect second redundancy condition for $\Lambda^*$), that means $P[X_1|\Lambda^*] > 0$ for all $X_1, \Lambda^*$ - or, put differently, for all $X_1, \Lambda^*$, there exists some $X_2$ such that $P[X_1,X_2,\Lambda^*] > 0$. So,

$$C(\lambda_0,\lambda_1) = (\alpha_1 + \alpha_{med})\big(\ln P[X_1|\Lambda^*=\lambda_1] - \ln P[X_1|\Lambda^*=\lambda_0]\big)$$

must hold for all $X_1$, whenever the supports of $P[X_2|\Lambda^*=\lambda_0]$ and $P[X_2|\Lambda^*=\lambda_1]$ overlap at all.
Shuffling terms around, we get

$$P[X_1|\Lambda^*=\lambda_1] = P[X_1|\Lambda^*=\lambda_0]\,e^{C(\lambda_0,\lambda_1)/(\alpha_1+\alpha_{med})}$$

Sum over $X_1$ on both sides, and we get $1 = e^{C(\lambda_0,\lambda_1)/(\alpha_1+\alpha_{med})}$, implying $C(\lambda_0,\lambda_1) = 0$ and therefore $P[X_1|\Lambda^*=\lambda_1] = P[X_1|\Lambda^*=\lambda_0]$.

In short: given two $\Lambda^*$ values $\lambda_0, \lambda_1$, if there exists any $X_2$ such that $P[X_2|\Lambda^*=\lambda_0] > 0$ and $P[X_2|\Lambda^*=\lambda_1] > 0$ (i.e. the support of $P[X_2|\Lambda^*]$ overlaps for the two $\Lambda^*$ values), then the two values yield exactly the same distribution $P[X_1|\Lambda^*]$.

Furthermore, since $C(\lambda_0,\lambda_1) = 0$ whenever $P[X_2|\Lambda^*=\lambda_0]$ and $P[X_2|\Lambda^*=\lambda_1]$ have overlapping support, we also have

$$P[X_2|\Lambda^*=\lambda_0] = P[X_2|\Lambda^*=\lambda_1]$$

anywhere that both of those quantities are nonzero.

#### A (Non-Strict) Pareto Improvement Via Coarse Graining

A quick recap of where that last section leaves us. We've established that:

- For any two $\Lambda^*$ values $\lambda_0, \lambda_1$ for which the support of $P[X_2|\Lambda^*]$ overlaps, we have $P[X_1|\Lambda^*=\lambda_0] = P[X_1|\Lambda^*=\lambda_1]$ for all $X_1$.
- Furthermore, $P[X_2|\Lambda^*=\lambda_0] = P[X_2|\Lambda^*=\lambda_1]$ wherever the two distributions overlap (i.e. wherever both quantities are nonzero).

Now, assume the supports of $P[X_2|\Lambda^*=\lambda_0]$ and $P[X_2|\Lambda^*=\lambda_1]$ overlap somewhere, and consider coarse graining those two values of $\Lambda^*$.
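Concretely, the coarse graining amounts to merging $\Lambda^*$ values that lie in the same connected component of the "overlapping $X_2$-support" relation. A minimal sketch (function and variable names are ours, for illustration), using union-find to handle the transitive merging:

```python
import numpy as np

def coarse_grain(P):
    """Merge lambda-values whose P[X2 | lam] supports overlap (transitively),
    given a joint table P[x1, x2, lam]. Returns g: old lam -> coarse label."""
    n_lam = P.shape[2]
    # X2-support of each lambda value
    supp = [set(np.flatnonzero(P[:, :, l].sum(axis=0))) for l in range(n_lam)]
    # union-find over pairwise-overlapping supports
    parent = list(range(n_lam))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n_lam):
        for j in range(i + 1, n_lam):
            if supp[i] & supp[j]:
                parent[find(i)] = find(j)
    roots = sorted({find(l) for l in range(n_lam)})
    relabel = {r: k for k, r in enumerate(roots)}
    return [relabel[find(l)] for l in range(n_lam)]

# Toy joint table: lam=0 and lam=1 both put mass on x2=0; lam=2 does not.
P = np.zeros((2, 3, 3))
P[:, 0, 0] = [0.10, 0.10]
P[:, 0, 1] = [0.05, 0.05]
P[:, 1, 1] = [0.10, 0.20]
P[:, 2, 2] = [0.20, 0.20]
g = coarse_grain(P)  # lam=0 and lam=1 merge; lam=2 stays separate
```

By construction, distinct coarse labels have disjoint $X_2$-supports, so the coarse grained latent is a deterministic function of $X_2$, matching the argument in this section.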
Compared to $\Lambda^*$ itself, how does the coarse grained variable $g(\Lambda^*)$ score on each of the natural latent conditions?

- Since $P[X_1|\Lambda^*]$ is exactly the same for both $\Lambda^*$ values, the error on $X_2 \leftarrow \Lambda^* \rightarrow X_1$ will be the same; $g(\Lambda^*)$ mediates between $X_1$ and $X_2$ exactly as well as $\Lambda^*$ does.
- Since $X \rightarrow \Lambda^* \rightarrow g(\Lambda^*)$, the coarse grained variable cannot do any worse on the redundancy conditions $X_1 \rightarrow X_2 \rightarrow \Lambda^*$ and $X_2 \rightarrow X_1 \rightarrow \Lambda^*$.

So, without making the errors on any of the three natural latent conditions any worse, we can coarse grain all $\Lambda^*$ values $\lambda_0, \lambda_1$ for which the supports of $P[X_2|\Lambda^*=\lambda_0]$ and $P[X_2|\Lambda^*=\lambda_1]$ overlap somewhere. Once all such coarse graining is performed, we have a new coarse grained latent $g(\Lambda^*)$ for which the supports of $P[X_2|g(\Lambda^*)]$ are nonoverlapping across all pairs of values.

In other words: $g(\Lambda^*)$ is exactly a deterministic function of $X_2$ (and therefore still perfectly satisfies the second redundancy condition), and satisfies the first redundancy condition and mediation condition each to within $9\epsilon$.

#### Finally, A Deterministic Natural Latent

Lastly, note that $H(g(\Lambda^*)|X_2) = 0$, and $X_2 \rightarrow X_1 \rightarrow g(\Lambda^*)$ to within $9\epsilon$ implies $9\epsilon \ge H(g(\Lambda^*)|X_1)$. Combined with the mediation condition, that implies $g(\Lambda^*)$ is approximately a deterministic natural latent, with errors at most:

- $9\epsilon$ on the first deterministic redundancy condition
- $0$ on the second deterministic redundancy condition
- $9\epsilon$ on the mediation condition.

## Can we do better?

The main room for improvement of the bounds in this proof is in the resampling step. The resampling conserves redundancy post notes where those bounds could be improved, and presents a little empirical evidence that they can be improved to $3\epsilon$ (using all three natural latent conditions) or $4\epsilon$ (using only redundancy).

## What's Next?

We've been bottlenecked pretty hard on this theorem for the past 3-4 months. Now that we finally have it, we expect to largely abandon stochastic natural latents in favor of deterministic natural latents.
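One practical upside: the two deterministic redundancy conditions are just conditional-entropy bounds, so a candidate deterministic latent is cheap to check numerically. A small sketch (names are ours) computing $H(\Lambda|X_1)$ and $H(\Lambda|X_2)$ from a joint table:

```python
import numpy as np

def det_redundancy_errors(P):
    """H(Lam | X1) and H(Lam | X2) in nats, for a joint table P[x1, x2, lam].
    Both must be ~0 for an (approximate) deterministic natural latent."""
    def cond_entropy(PJ):  # PJ[x, lam] -> H(Lam | X)
        Px = PJ.sum(axis=1, keepdims=True)
        m = PJ > 0
        return float(-np.sum(PJ[m] * np.log((PJ / Px)[m])))
    return cond_entropy(P.sum(axis=1)), cond_entropy(P.sum(axis=0))

# Exact block-structured case: Lam is the block label, determined by either X_i.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 0.5
P[1, 1, 1] = 0.5
h1, h2 = det_redundancy_errors(P)  # both ~0 in the exact case
```

Appending independent noise to the latent would drive both entropies up while leaving the stochastic conditions satisfied, which is exactly the failure mode the deterministic conditions rule out.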
For instance, one immediate next step will be to rewrite our Illiad paper from last year to work with deterministic natural latents, which will eliminate the weakest parts of that paper and give a much more compelling case. (No, we're not linking to the old paper, because the new one is going to be a lot better.)

On another front: stochastic natural latents are relatively easy to test for in datasets, by looking for three variables each of which mediates between the other two. Now we have some idea of what to do with those triples when we find them: compute the deterministic constraint between them.

Beyond those two immediate projects, we expect this result to be foundational for basically all of our work on natural latents going forward.

[1] This is because $P[X,\Lambda] > 0$ implies $P[X] > 0$, and nontrivial exact natural latents necessarily rely on some values of $X$ having probability 0.

[2] Note that that's $\Lambda$, i.e. the original latent, not $\Lambda^*$.