Specialization-generalization transition in exemplar-based in-context learning
ORAL
Abstract
In-context learning (ICL) is a striking behavior seen in pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We empirically investigate the conditions on the pretraining distribution that are necessary for generalized ICL to emerge. A model that exhibits generalized ICL is able to generalize to new tasks outside of the pretraining task distribution, while a model exhibiting specialized ICL generalizes only to new tasks within the pretraining task distribution. Previous work has focused on the number of distinct tasks necessary in the pretraining distribution for the model to exhibit ICL in any form. Here, we introduce another axis of task diversity, based on the similarity between tasks, to study the emergence of generalized ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from specialized to generalized ICL. We examine the roles of task similarity and the number of distinct pretraining tasks in eliciting generalized ICL through a phase diagram, which delineates the conditions on the pretraining distribution under which generalized ICL emerges. We also explore the nature of the solutions learned by the transformer on both sides of the transition. Further experiments show that such specialization-generalization transitions persist in more complex, nonlinear settings.
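For intuition (this is not the authors' released code), the sketch below shows one plausible way to construct the kind of pretraining distribution the abstract describes: a finite pool of linear tasks whose size (`num_tasks`) and spread (`task_similarity`, a hypothetical knob standing in for the paper's similarity axis) control the two notions of task diversity, with each training sequence built from in-context (x, y) pairs generated by a single task from the pool.

```python
import numpy as np

def sample_task_pool(num_tasks, dim, task_similarity, rng):
    """Finite pool of linear tasks w_k = center + task_similarity * noise.
    Small task_similarity -> tasks cluster tightly (low diversity);
    large values spread them apart (high diversity). Hypothetical knob."""
    center = rng.standard_normal(dim)
    return center + task_similarity * rng.standard_normal((num_tasks, dim))

def sample_icl_sequence(task_pool, num_examples, rng):
    """One pretraining sequence: (x_i, y_i) pairs from a single linear
    task drawn uniformly from the finite pretraining pool."""
    w = task_pool[rng.integers(len(task_pool))]
    x = rng.standard_normal((num_examples, task_pool.shape[1]))
    y = x @ w  # noiseless linear labels y_i = w . x_i
    return x, y

rng = np.random.default_rng(0)
pool = sample_task_pool(num_tasks=64, dim=8, task_similarity=0.5, rng=rng)
x, y = sample_icl_sequence(pool, num_examples=16, rng=rng)
print(x.shape, y.shape)  # (16, 8) (16,)
```

Under this framing, specialized ICL corresponds to good performance only on test tasks resembling the pretraining pool, while generalized ICL corresponds to good performance on linear tasks drawn well outside it.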
Presenters
-
Chase Waring Goddard
Princeton University
Authors
-
Chase Waring Goddard
Princeton University
-
Lindsay Maleckar Smith
Princeton University
-
Vudtiwat Ngampruetikorn
University of Sydney
-
David J Schwab
The Graduate Center, CUNY