A simple model for Grokking modular arithmetic

ORAL

Abstract

Grokking is the sudden onset of generalization following a long period of overfitting. This effect was first discovered empirically on datasets generated by a discrete rule, such as the multiplication table of a finite group.

In this talk I will present a simple neural network that groks a variety of modular arithmetic tasks. The network consists of a single hidden layer with a quadratic activation function (which can be replaced with more popular activation functions if desired). I will show that (i) the model exhibits grokking on modular arithmetic tasks under vanilla gradient descent with an MSE loss function and no regularization; (ii) grokking corresponds to learning very specific features whose structure is determined by the modular arithmetic task at hand; and (iii) the weights that solve the modular addition problem, and that are found by gradient descent, admit an analytic expression, which establishes complete interpretability of the algorithm learnt by the network.
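To make the setup concrete, here is a minimal sketch of the kind of model the abstract describes: a single hidden layer with a quadratic activation, trained on modular addition by full-batch gradient descent on an MSE loss with no regularization. Everything specific in it is an assumption made for illustration and is not taken from the talk: the concatenated one-hot input encoding, the modulus `p=5`, the hidden width, the initialization scale, the learning rate, the train fraction, and all function names. At this toy scale the sketch only shows the training loop itself and makes no claim about reproducing grokking.

```python
import random

def make_dataset(p):
    """All pairs (n, m) labelled with (n + m) mod p."""
    return [(n, m, (n + m) % p) for n in range(p) for m in range(p)]

def init_params(p, width, scale, rng):
    # W1 acts on the concatenated one-hot encodings of n and m (2p inputs).
    w1 = [[rng.gauss(0.0, scale) for _ in range(2 * p)] for _ in range(width)]
    w2 = [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(p)]
    return w1, w2

def forward(w1, w2, n, m, p):
    # With one-hot inputs, each preactivation is a sum of two entries of W1.
    h = [row[n] + row[p + m] for row in w1]
    a = [x * x for x in h]  # quadratic activation
    out = [sum(wq[k] * a[k] for k in range(len(a))) for wq in w2]
    return h, a, out

def mse_loss(w1, w2, data, p):
    total = 0.0
    for n, m, y in data:
        _, _, out = forward(w1, w2, n, m, p)
        total += sum((out[q] - (1.0 if q == y else 0.0)) ** 2 for q in range(p))
    return total / (len(data) * p)

def gd_step(w1, w2, data, p, lr):
    """One full-batch gradient descent step; no weight decay or other regularizer."""
    width = len(w1)
    g1 = [[0.0] * (2 * p) for _ in range(width)]
    g2 = [[0.0] * width for _ in range(p)]
    norm = 2.0 / (len(data) * p)
    for n, m, y in data:
        h, a, out = forward(w1, w2, n, m, p)
        err = [out[q] - (1.0 if q == y else 0.0) for q in range(p)]
        for q in range(p):
            for k in range(width):
                g2[q][k] += norm * err[q] * a[k]
        for k in range(width):
            # Chain rule through a_k = h_k ** 2.
            dh = norm * 2.0 * h[k] * sum(err[q] * w2[q][k] for q in range(p))
            g1[k][n] += dh
            g1[k][p + m] += dh
    for k in range(width):
        for j in range(2 * p):
            w1[k][j] -= lr * g1[k][j]
    for q in range(p):
        for k in range(width):
            w2[q][k] -= lr * g2[q][k]

def train(p=5, width=8, train_fraction=0.6, lr=0.02, steps=200, seed=0):
    # All default hyperparameters are illustrative assumptions.
    rng = random.Random(seed)
    data = make_dataset(p)
    rng.shuffle(data)
    split = int(train_fraction * len(data))
    train_set, test_set = data[:split], data[split:]
    w1, w2 = init_params(p, width, 0.5, rng)
    history = [mse_loss(w1, w2, train_set, p)]
    for _ in range(steps):
        gd_step(w1, w2, train_set, p, lr)
        history.append(mse_loss(w1, w2, train_set, p))
    return history, mse_loss(w1, w2, test_set, p)
```

Calling `train()` returns the training-loss history and the final loss on the held-out pairs. Studying grokking itself would presumably require a larger modulus, a wider hidden layer, and many more steps, while tracking train and test metrics side by side.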

Presenters

  • Andrey Gromov

    University of Maryland, College Park

Authors

  • Andrey Gromov

    University of Maryland, College Park