A simple model for Grokking modular arithmetic
ORAL
Abstract
Grokking is a sudden onset of generalization following a long period of overfitting. This effect was first discovered empirically on datasets generated by a discrete rule, such as the multiplication tables of finite groups.
In this talk I will present a simple neural network that groks a variety of modular arithmetic tasks. The network consists of a single hidden layer with a quadratic activation function (which can be replaced with more common activation functions if desired). I will show that (i) the model exhibits grokking on modular arithmetic tasks when trained with vanilla gradient descent on an MSE loss, in the absence of any regularization; (ii) grokking corresponds to learning very specific features whose structure is determined by the modular arithmetic task at hand; and (iii) there is an analytic expression for the weights that solve the modular addition problem and are found by gradient descent, thereby establishing complete interpretability of the algorithm learnt by the network.
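As a rough illustration of the setup described above, the following is a minimal sketch (not the author's code) of a one-hidden-layer network with a quadratic activation, trained on modular addition with plain gradient descent and an MSE loss and no regularization; the modulus, width, learning rate, and step count are illustrative assumptions, not values from the talk.

```python
import numpy as np

p = 23          # modulus (assumed, for illustration)
N = 256         # hidden width (assumed)
lr = 1e-2       # learning rate (assumed)
steps = 5000    # training steps (assumed)

rng = np.random.default_rng(0)

# Full dataset: all p^2 pairs (a, b); one-hot encode the pair as a 2p-dim input,
# and the label (a + b) mod p as a p-dim one-hot target.
pairs = np.array([(a, b) for a in range(p) for b in range(p)])
X = np.zeros((p * p, 2 * p))
X[np.arange(p * p), pairs[:, 0]] = 1.0
X[np.arange(p * p), p + pairs[:, 1]] = 1.0
Y = np.zeros((p * p, p))
Y[np.arange(p * p), (pairs[:, 0] + pairs[:, 1]) % p] = 1.0

# Random train/test split, so test accuracy can lag train accuracy (the grokking regime).
perm = rng.permutation(p * p)
train, test = perm[: p * p // 2], perm[p * p // 2:]

W1 = rng.normal(0, 1 / np.sqrt(2 * p), (2 * p, N))
W2 = rng.normal(0, 1 / np.sqrt(N), (N, p))

for step in range(steps):
    # Forward pass: single hidden layer with quadratic activation phi(z) = z^2.
    Z = X[train] @ W1
    H = Z ** 2
    out = H @ W2
    err = out - Y[train]                      # gradient of 0.5 * MSE
    # Backward pass: vanilla gradient descent, no weight decay or other regularization.
    gW2 = H.T @ err / len(train)
    gH = err @ W2.T
    gW1 = X[train].T @ (gH * 2 * Z) / len(train)
    W1 -= lr * gW1
    W2 -= lr * gW2

# Test accuracy: argmax over the p outputs.
pred = ((X[test] @ W1) ** 2 @ W2).argmax(1)
print("test accuracy:", (pred == Y[test].argmax(1)).mean())
```

Whether and when this toy run groks depends on the (assumed) hyperparameters and the train fraction; it is meant only to make the architecture and training protocol concrete.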
Presenters
- Andrey Gromov, University of Maryland, College Park
Authors
- Andrey Gromov, University of Maryland, College Park