Specialization-generalization transition in exemplar-based in-context learning

ORAL

Abstract

In-context learning (ICL) is a striking behavior of pretrained transformers that allows a model to generalize to unseen tasks after seeing only a few examples. We empirically investigate the conditions on the pretraining distribution necessary for generalized ICL to emerge. A model exhibiting generalized ICL can generalize to new tasks outside the pretraining task distribution, whereas a model exhibiting specialized ICL generalizes only to new tasks within it. Previous work has focused on the number of distinct tasks in the pretraining distribution needed for ICL (in any form) to appear; here we introduce another axis of task diversity, based on the similarity between tasks, to study the emergence of generalized ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from specialized to generalized ICL. We map the roles of task similarity and the number of distinct pretraining tasks in a phase diagram that delineates the conditions under which generalized ICL emerges, and we examine the nature of the solutions learned by the transformer on either side of the transition. Further experiments show that such specialization-generalization transitions persist in more complex, nonlinear settings.
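
To make the setup concrete, here is a minimal illustrative sketch (not the authors' code) of one way to build a pretraining distribution of in-context linear-regression prompts with the two diversity knobs the abstract describes: the number of distinct tasks K and a task-similarity scale. The function names and the particular choice of a shared "anchor" task with Gaussian spread are assumptions made for illustration only.

# Illustrative sketch of exemplar-based ICL pretraining data for linear functions.
# K controls how many distinct tasks exist; `spread` controls how similar they are.
import numpy as np

rng = np.random.default_rng(0)

def make_task_pool(K, dim, spread):
    """Sample K weight vectors; small `spread` -> similar tasks, large -> diverse tasks."""
    center = rng.normal(size=dim)                      # shared anchor task (assumption)
    return center + spread * rng.normal(size=(K, dim))

def sample_prompt(task_pool, dim, n_examples, noise=0.0):
    """Draw one task from the pool and build an (x, y) exemplar sequence."""
    w = task_pool[rng.integers(len(task_pool))]
    X = rng.normal(size=(n_examples, dim))
    y = X @ w + noise * rng.normal(size=n_examples)
    return X, y   # a transformer would be trained to predict each y_i from the preceding pairs

# Example: a low-diversity pool (few, similar tasks) vs. a high-diversity one.
pool_specialized = make_task_pool(K=4, dim=8, spread=0.1)
pool_generalized = make_task_pool(K=1024, dim=8, spread=1.0)
X, y = sample_prompt(pool_generalized, dim=8, n_examples=16)

In a sketch like this, evaluating the trained model on prompts generated from weight vectors drawn outside the pool would distinguish generalized from specialized ICL.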

Presenters

  • Chase Waring Goddard

    Princeton University

Authors

  • Chase Waring Goddard

    Princeton University

  • Lindsay Maleckar Smith

    Princeton University

  • Vudtiwat Ngampruetikorn

    University of Sydney

  • David J Schwab

The Graduate Center, CUNY