APS

Same features, different encodings: three case studies of path dependence in grokking and learning.

ORAL

Abstract

Neural network training is a complicated dynamical process. Whether or not the outcome of training depends on the learning path has deep implications for how we can understand and use neural networks. Two extremes are grokking, where a network generalizes only after a long period of overtraining, and "steady" learning, where the training and test loss improve together. We investigate three simple tasks in which we induce both learning paths: classifying phases of the Ising model from snapshots, the modular addition problem on which grokking was first discovered, and the benchmark MNIST task.
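For readers unfamiliar with the modular addition task, the following is a minimal Python sketch of how such a dataset is commonly built. Every pair (a, b) with a, b in {0, ..., p-1} is labelled by (a + b) mod p, and a random subset of pairs is held out for testing. The function name `modular_addition_dataset`, the modulus `p`, the training fraction and the seed are hypothetical choices for illustration, not the settings used in this work.

```python
import itertools
import random


def modular_addition_dataset(p=97, train_frac=0.3, seed=0):
    """Build ((a, b), (a + b) mod p) examples and split them into train/test.

    Hypothetical helper: p, train_frac and seed are illustrative assumptions,
    not values taken from the abstract.
    """
    # Enumerate all p * p input pairs and label each one.
    pairs = itertools.product(range(p), repeat=2)
    examples = [((a, b), (a + b) % p) for a, b in pairs]
    # Shuffle deterministically, then cut off the training fraction.
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = modular_addition_dataset(p=7, train_frac=0.5)
print(len(train), len(test))  # 24 25
```

With p = 7 there are 49 pairs, so a training fraction of 0.5 keeps 24 of them for training and leaves 25 for testing.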

Using techniques from interpretability and information geometry, we systematically contrast the features, encodings, and trajectories of grokking and "steady" learning. First, we find that the features learned in our example problems are the same along both paths. The features of the network trained on Ising phases are especially clear: the model learns to calculate the energy of a snapshot. Second, although the features are the same for grokking and steady learning, the efficiency of their encodings can differ dramatically, by up to an order of magnitude. Finally, we show that the accuracy plateau in grokking is typically associated with exponential decay of the weights with the number of epochs, and that the grokking time appears to exhibit power-law scaling across more than four decades of weight decay.
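To make "the energy of a snapshot" concrete, here is a minimal NumPy sketch of the quantity E = -J Σ⟨ij⟩ s_i s_j. It assumes the standard two-dimensional nearest-neighbour Ising model on a square lattice of ±1 spins, with periodic boundaries, coupling J = 1 and no external field. The abstract does not specify these details, and the function name `ising_energy` is hypothetical.

```python
import numpy as np


def ising_energy(spins, J=1.0):
    """Nearest-neighbour energy E = -J * sum_<ij> s_i s_j of a 2D snapshot.

    Assumes (not stated in the abstract) a square lattice of +/-1 spins,
    periodic boundary conditions and zero external field.
    """
    s = np.asarray(spins)
    # Pair each site with its right and down neighbours (wrapping around the
    # edges) so that every bond of the periodic lattice is counted exactly once.
    bonds = s * np.roll(s, -1, axis=1) + s * np.roll(s, -1, axis=0)
    return -J * bonds.sum()


ordered = np.ones((4, 4), dtype=int)
print(ising_energy(ordered))  # -32.0 (2 bonds per site * 16 sites)
```

A fully ordered snapshot minimizes the energy, while a checkerboard (antiferromagnetic) snapshot on the same lattice gives +32. This contrast is why the energy is a natural feature for separating ordered from disordered phases.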

Presenters

  • Dmitry Manning-Coe

    University of Illinois at Urbana-Champaign

Authors

  • Dmitry Manning-Coe

    University of Illinois at Urbana-Champaign

  • Jacopo Gliozzi

    University of Illinois at Urbana-Champaign

  • Alexander G Stapleton

    Queen Mary University of London

  • Edward Hirst

    Queen Mary University of London

  • Marc Klinger

    University of Illinois at Urbana-Champaign

  • Giuseppe de Tomasi

    University of Illinois at Urbana-Champaign

  • David S Berman

    Queen Mary University of London