Distributional Autoencoders Know the Score

Figure: The encoding for the Müller–Brown potential.

Abstract

The Distributional Principal Autoencoder (DPA) combines distributionally correct reconstruction with principal-component-like interpretability of the encodings. In this work, we provide exact theoretical guarantees on both fronts. First, we derive a closed-form relation linking the geometry of each optimal encoder level set to the score of the data distribution. This result explains DPA's empirical ability to disentangle factors of variation of the data, and allows the score to be recovered directly from samples. When the data follows the Boltzmann distribution, we demonstrate that this relation yields an approximation of the minimum free-energy path for the Müller–Brown potential in a single fit. Second, we prove that if the data lies on a manifold that can be approximated by the encoder, latent components beyond the manifold dimension are conditionally independent of the data distribution (they carry no additional information) and thus reveal the intrinsic dimension. Together, these results show that a single model can simultaneously learn the data distribution and its intrinsic dimension with exact guarantees, unifying two longstanding goals of unsupervised learning.

Type: Publication
Publication: NeurIPS 2025

For a recently introduced class of autoencoders, we prove that:

  1. the encoder level sets align exactly with the data score (see the sketch below), and
  2. extra latent dimensions beyond the data manifold become completely uninformative, revealing the intrinsic dimension.

Both properties hold simultaneously, circumventing the usual reconstruction/disentanglement trade-off in unsupervised learning.
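As a rough illustration of the first point, here is a minimal sketch (mine, not the paper's code) of how such alignment can be probed numerically: on Gaussian toy data the score is available in closed form, so it can be compared against the input gradient of a latent component, which is the normal to that component's level set. The encoder below is an untrained stand-in; the theorem concerns the optimum of DPA training, so only a trained encoder would be expected to show strong alignment.

```python
# Hypothetical sanity check (not the paper's code): on Gaussian toy data,
# compare the input gradient of the first latent component -- the normal
# to its level set -- against the closed-form score s(x) = -Sigma^{-1} x.
import torch

torch.manual_seed(0)

# Toy data: x ~ N(0, Sigma).
Sigma = torch.tensor([[3.0, 1.0], [1.0, 1.0]])
Sigma_inv = torch.linalg.inv(Sigma)
L = torch.linalg.cholesky(Sigma)
x = (torch.randn(512, 2) @ L.T).requires_grad_(True)

# Untrained stand-in encoder; in practice this would be a trained DPA encoder.
enc = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2)
)

# d e_1 / d x: the normal to the first component's level set at each sample.
(grad,) = torch.autograd.grad(enc(x)[:, 0].sum(), x)

score = -x.detach() @ Sigma_inv  # Gaussian score (Sigma_inv is symmetric)

cos = torch.nn.functional.cosine_similarity(grad, score, dim=1)
print(f"mean |cos| between level-set normal and score: {cos.abs().mean():.3f}")
```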

The former also leads to remarkable performance on physical simulation data, where the method can recover the minimum free-energy path, with the potential to speed up simulations.
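For readers unfamiliar with the benchmark, below is a minimal, self-contained sketch of the Müller–Brown potential (the standard parameter values) together with the score of its Boltzmann distribution, which is the quantity the level-set relation ties the encoder to in this setting. The temperature kT is an arbitrary illustrative choice, not the paper's setting.

```python
# Standard Mueller-Brown parameters (the widely used benchmark values).
import numpy as np

A  = np.array([-200.0, -100.0, -170.0, 15.0])
a  = np.array([-1.0, -1.0, -6.5, 0.7])
b  = np.array([0.0, 0.0, 11.0, 0.6])
c  = np.array([-10.0, -10.0, -6.5, 0.7])
X0 = np.array([1.0, 0.0, -0.5, -1.0])
Y0 = np.array([0.0, 0.5, 1.5, 1.0])

def potential(x, y):
    """Mueller-Brown energy V(x, y): a sum of four anisotropic Gaussian wells."""
    dx, dy = x - X0, y - Y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2))

def boltzmann_score(x, y, kT=10.0):
    """Score of p ∝ exp(-V / kT), i.e. ∇ log p = -∇V / kT.

    kT is an arbitrary illustrative temperature, not the paper's setting.
    """
    dx, dy = x - X0, y - Y0
    e = A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)
    dV = np.array([np.sum(e * (2 * a * dx + b * dy)),
                   np.sum(e * (b * dx + 2 * c * dy))])
    return -dV / kT

# The global minimum lies near (-0.558, 1.442), where the score vanishes.
print(potential(-0.558, 1.442))        # ~ -146.7
print(boltzmann_score(-0.558, 1.442))  # ~ [0, 0], since it is a minimum
```

The landscape has three local minima separated by two saddle points; the minimum free-energy path threads through all of them, and recovering it from a single fit is the result the summary above refers to.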

Andrej Leban
Ph.D. Student