Connecting Dynamics and Trainability in Recurrent Neural Networks

ORAL

Abstract

Recurrent neural networks (RNNs) are well suited to complex sequential learning tasks, but they are notoriously difficult to train due to the problem of exploding or vanishing gradients (EVG) of the cost function. Local switch-like multiplicative "gates" were introduced to address this issue by modulating inter-neuron interactions and selectively updating the state of the network, promoting longer time scales. These "gated" RNNs appear to mitigate the EVG problem, making training tractable. However, the specific role of each gate type in dynamics and training remains unclear. We take a dynamical systems perspective to study these questions for two popular gated RNN architectures: the Gated Recurrent Unit (GRU) and the Long Short-Term Memory (LSTM) network. Using random matrix theory, we elucidate how gating enriches the repertoire of dynamical behavior expressed by these networks. Our approach furthermore sheds light on how gating is able to overcome the EVG problem by shaping asymptotic stability. Finally, we connect the intrinsic dynamics upon random parameter initialization to the subsequent ease of training GRUs and LSTMs.
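For concreteness, a standard parameterization of the GRU (one common convention; sign conventions for the update gate vary across implementations, and this notation is not taken from the abstract itself) makes the two gate types explicit. The update gate z_t controls how much of the previous hidden state is carried forward, while the reset gate r_t modulates the recurrent inter-neuron interactions:

\[
z_t = \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right), \qquad
r_t = \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right),
\]
\[
\tilde{h}_t = \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right), \qquad
h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t,
\]

where \(\sigma\) is the logistic sigmoid and \(\odot\) denotes elementwise multiplication. In this convention, when z_t is near zero the state is simply carried forward unchanged, which illustrates how multiplicative gating can promote long time scales and temper exploding or vanishing gradients.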

Presenters

  • Tankut Can

    Initiative for the Theoretical Sciences, The Graduate Center, City University of New York

Authors

  • Tankut Can

    Initiative for the Theoretical Sciences, The Graduate Center, City University of New York

  • Kamesh Krishnamurthy

    Dept. of Physics and Princeton Neuroscience Institute, Princeton University

  • David Schwab

    Initiative for the Theoretical Sciences, The Graduate Center, City University of New York and Facebook AI Research