Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
ORAL
Abstract
Training Restricted Boltzmann Machines (RBMs) has long been challenging due to
the difficulty of computing the log-likelihood gradient precisely. Over the past
decades, many works have proposed training recipes of varying success, but
without studying the crucial quantity of the problem: the mixing time. In this
work, we show that this mixing time plays a crucial role in the dynamics and
stability of the trained model, and that RBMs operate in two well-defined
regimes, namely equilibrium and out-of-equilibrium, depending on the interplay
between the mixing time of the model and the number of steps, k, used to
approximate the gradient. We further show empirically that this mixing time
increases during training, which often implies a transition from one regime to
the other as soon as k becomes smaller than this time. In particular, we show
that with the popular k-step (persistent) contrastive divergence approaches,
with k small, the dynamics of the learned model are extremely slow and often
dominated by strong out-of-equilibrium effects. In contrast, RBMs trained in
equilibrium display faster dynamics and a smooth convergence to dataset-like
configurations during sampling.
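To make the role of k concrete, here is a minimal NumPy sketch of a CD-k gradient estimate for a binary RBM, where k Gibbs steps from the data approximate the model's negative phase. This is an illustrative sketch, not the authors' implementation; all function and variable names are ours. For PCD-k, the negative chain `v` would persist across parameter updates instead of being reinitialized at the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h(v, W, b):
    # P(h=1 | v) for a binary RBM, and a Bernoulli sample from it
    p = sigmoid(v @ W + b)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_v(h, W, a):
    # P(v=1 | h), and a Bernoulli sample from it
    p = sigmoid(h @ W.T + a)
    return p, (rng.random(p.shape) < p).astype(float)

def cd_k_grad(v_data, W, a, b, k=1):
    """One CD-k log-likelihood gradient estimate from a data batch.

    If k is smaller than the model's mixing time, the negative chain
    has not equilibrated and the estimate carries out-of-equilibrium bias.
    """
    ph_data, h = sample_h(v_data, W, b)
    v = v_data
    for _ in range(k):           # k Gibbs steps starting from the data
        _, v = sample_v(h, W, a)
        ph, h = sample_h(v, W, b)
    # positive phase (data statistics) minus negative phase (chain after k steps)
    dW = v_data.T @ ph_data - v.T @ ph
    da = (v_data - v).sum(axis=0)
    db = (ph_data - ph).sum(axis=0)
    n = len(v_data)
    return dW / n, da / n, db / n
```

A training loop would then ascend these gradients, e.g. `W += lr * dW`; the abstract's point is that the quality of this estimate hinges on how k compares to the mixing time of the current model.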
–
Publication: https://arxiv.org/pdf/2105.13889.pdf, accepted at NeurIPS 2021
Presenters
-
Aurélien Decelle
Universidad Complutense de Madrid
Authors
-
Aurélien Decelle
Universidad Complutense de Madrid
-
Beatriz Seoane
Universidad Complutense de Madrid
-
Cyril Furtlehner
Université Paris-Saclay, Inria