
Beyond Backprop: Different Approaches to Credit Assignment in Neural Nets

Invited

Abstract

The backpropagation algorithm (backprop) has been the workhorse of neural net learning for several decades, and its practical effectiveness is demonstrated by the recent successes of deep learning in a wide range of applications. This approach uses chain-rule differentiation to compute the gradients used by state-of-the-art learning algorithms such as stochastic gradient descent (SGD) and its variants. However, backprop also has several drawbacks, including the vanishing and exploding gradients issue, the inability to handle non-differentiable nonlinearities or to parallelize weight updates across layers, and biological implausibility. These limitations continue to motivate the exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods that break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch) methods, which limits their applicability to extremely large datasets, as well as to online, continual, or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
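
For orientation, the sketch below illustrates the auxiliary-variable splitting commonly used by such methods: per-layer activations $a_l$ are introduced as free variables and tied to the layer outputs by a penalty, so the nested objective decomposes into local subproblems that can be minimized alternately over weights and activations. The quadratic-penalty form, the symbols $f_l$, $W_l$, $a_l$, and $\lambda$, and the loss $\ell$ are illustrative assumptions, not necessarily the exact formulation used in this work.

\[
\underbrace{\min_{W}\;\ell\bigl(f_L(W_L\, f_{L-1}(W_{L-1}\cdots f_1(W_1 x))),\, y\bigr)}_{\text{nested objective (backprop)}}
\quad\longrightarrow\quad
\min_{W,\,a}\;\ell(a_L, y) \;+\; \lambda \sum_{l=1}^{L} \bigl\| a_l - f_l(W_l\, a_{l-1}) \bigr\|_2^2,
\qquad a_0 := x .
\]

In the relaxed problem each term couples only adjacent layers, so updates to different $W_l$ (and $a_l$) can be carried out as local subproblems rather than through a single end-to-end chain of derivatives; an online variant processes one mini-batch at a time when performing these alternating updates.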

Presenters

  • Irina Rish

    Computer Science and Operations Research (Département d’informatique et recherche opérationnelle), Université de Montréal / Mila - Quebec AI Institute

Authors

  • Irina Rish

    Computer Science and Operations Research (Département d’informatique et recherche opérationnelle), Université de Montréal / Mila - Quebec AI Institute