Connecting Dynamics and Trainability in Recurrent Neural Networks
ORAL
Abstract
Recurrent neural networks (RNNs) are well-suited for complex sequential learning tasks, but are notoriously difficult to train due to the problem of exploding or vanishing gradients (EVG) of the cost function. Local switch-like multiplicative "gates" were introduced to address this issue by modulating inter-neuron interactions and selectively updating the state of the network, thereby promoting longer time scales. These gated RNNs appear to mitigate the EVG problem, making training tractable. However, the specific role of each gate type in dynamics and training remains unclear. We take a dynamical systems perspective to study these questions for two popular gated RNN architectures: the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). Using random matrix theory, we elucidate how gating enriches the repertoire of dynamical behavior expressed by these networks. Our approach furthermore sheds light on how gating overcomes the EVG problem by shaping asymptotic stability. Finally, we connect the intrinsic dynamics upon random parameter initialization to the subsequent ease of training GRUs and LSTMs.
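For reference, a minimal sketch of the standard discrete-time GRU equations makes the two gate roles described above explicit: the update gate z_t selectively interpolates the hidden state (promoting longer time scales), while the reset gate r_t modulates inter-neuron interactions. This is the conventional GRU formulation; the particular (e.g., continuous-time) parameterization analyzed in the paper may differ.

\begin{align*}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate: selective state updates)} \\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate: modulates recurrent interactions)} \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(gated interpolation of the state)}
\end{align*}

When z_t is close to zero, the state is nearly frozen, which is the mechanism by which gating can sustain long time scales and temper gradient explosion or decay.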
Presenters
-
Tankut Can
Initiative for the Theoretical Sciences, The Graduate Center, City University of New York
Authors
-
Tankut Can
Initiative for the Theoretical Sciences, The Graduate Center, City University of New York
-
Kamesh Krishnamurthy
Dept. of Physics and Princeton Neuroscience Institute, Princeton University
-
David Schwab
Initiative for the Theoretical Sciences, The Graduate Center, CUNY and Facebook AI Research