Distributional Autoencoders Know the Score

The encoding for the Müller-Brown potential

Abstract

This work presents novel and desirable properties of a recently introduced class of autoencoders - the Distributional Principal Autoencoder (DPA) - which combines distributionally correct reconstruction with principal-components-like interpretability of the encodings. First, we show formally that the level sets of the encoder orient themselves exactly with respect to the score of the data distribution. This both explains the method's often remarkable performance in disentangling the factors of variation of the data and opens up the possibility of recovering the data distribution from samples alone. In settings where the score itself has physical meaning - such as when the data obeys the Boltzmann distribution - we demonstrate that the method can recover scientifically important quantities such as the minimum free energy path. Second, we prove that if the data lies on a manifold that can be approximated by the encoder, the components of the optimal encoder beyond the dimension of the manifold carry no additional information about the data distribution. This suggests potentially new ways of determining the number of relevant dimensions of the data. The results thus demonstrate that the DPA elegantly combines two often disparate goals of unsupervised learning: learning the data distribution and learning the intrinsic dimensionality of the data.
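For context, and not as a result specific to the paper: the score referred to here is the gradient of the log-density, and under a Boltzmann distribution with potential energy $U$ it reduces to the scaled negative gradient of the potential, i.e. the physical force. This is why aligning the encoder's level sets with the score acquires physical meaning in such settings.

```latex
% Standard definitions (not taken from the paper):
% the score of a density p, and its form under a Boltzmann distribution.
\[
  s(x) \;=\; \nabla_x \log p(x),
  \qquad
  p(x) \;\propto\; e^{-U(x)/k_B T}
  \;\;\Longrightarrow\;\;
  s(x) \;=\; -\frac{\nabla_x U(x)}{k_B T}.
\]
```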

Type: Publication
On arXiv

For a recently introduced class of autoencoders, we prove that the encoder's level sets orient themselves exactly with respect to the score of the data distribution, and that if the data lies on a lower-dimensional manifold, the extra encoding dimensions carry no additional information. The former explains the method's strong performance on data from physical simulations, where it can recover the minimum free energy path; the latter opens up possibilities such as testing for the manifold dimensionality of the data.
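To make the setting of the figure above concrete, the following is a minimal sketch of the Müller-Brown potential and the score of its Boltzmann density. The parameter values are the ones commonly quoted in the literature, and the temperature kT is an arbitrary illustrative choice; none of this code is taken from the paper.

```python
import numpy as np

# Commonly quoted Mueller-Brown parameters (not taken from the paper);
# the potential is a sum of four anisotropic Gaussian wells/bumps.
A  = np.array([-200.0, -100.0, -170.0, 15.0])
a  = np.array([-1.0, -1.0, -6.5, 0.7])
b  = np.array([0.0, 0.0, 11.0, 0.6])
c  = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])

def potential(x, y):
    """Mueller-Brown potential U(x, y)."""
    dx, dy = x - x0, y - y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2))

def score(x, y, kT=15.0):
    """Score of the Boltzmann density p ~ exp(-U/kT): grad log p = -grad U / kT.
    The temperature kT is an illustrative choice, not a value from the paper."""
    dx, dy = x - x0, y - y0
    g = A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)
    dU_dx = np.sum(g * (2 * a * dx + b * dy))
    dU_dy = np.sum(g * (b * dx + 2 * c * dy))
    return np.array([-dU_dx, -dU_dy]) / kT

# The score points toward higher probability, i.e. lower potential energy.
print(potential(-0.55, 1.45))  # near one of the minima
print(score(0.0, 0.0))
```

The score field computed this way is exactly the quantity with which, by the first result above, the optimal encoder's level sets align.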

Andrej Leban
Ph.D. Student