Specialization-generalization transition in exemplar-based in-context learning
ORAL
Abstract
In-context learning (ICL) is a striking behavior seen in pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We empirically investigate the conditions on the pretraining distribution that are necessary for generalized ICL to emerge. A model that exhibits generalized ICL is able to generalize to new tasks outside of the pretraining task distribution, while a model exhibiting specialized ICL generalizes only to new tasks within the pretraining task distribution. Previous work has focused on the number of distinct tasks necessary in the pretraining distribution for the model to exhibit ICL in any form. Here, we introduce another axis of task diversity, based on the similarity between tasks, to study the emergence of generalized ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from specialized to generalized ICL. We examine the roles of task similarity and the number of distinct pretraining tasks in eliciting generalized ICL through a phase diagram, which delineates the conditions on the pretraining distribution under which generalized ICL emerges. We also explore the nature of the solutions learned by the transformer on both sides of the transition. Further experiments show that such specialization-generalization transitions persist in more complex, nonlinear settings.
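For intuition (this is not the authors' released code), the sketch below shows one plausible way to construct the kind of pretraining distribution the abstract describes: a finite pool of linear tasks whose size (`num_tasks`) and spread (`task_similarity`, a hypothetical knob standing in for the paper's similarity axis) control the two notions of task diversity, with each training sequence built from in-context (x, y) pairs generated by a single task from the pool.

```python
import numpy as np

def sample_task_pool(num_tasks, dim, task_similarity, rng):
    """Finite pool of linear tasks w_k = center + task_similarity * noise.
    Small task_similarity -> tasks cluster tightly (low diversity);
    large values spread them apart (high diversity). Hypothetical knob."""
    center = rng.standard_normal(dim)
    return center + task_similarity * rng.standard_normal((num_tasks, dim))

def sample_icl_sequence(task_pool, num_examples, rng):
    """One pretraining sequence: (x_i, y_i) pairs from a single linear
    task drawn uniformly from the finite pretraining pool."""
    w = task_pool[rng.integers(len(task_pool))]
    x = rng.standard_normal((num_examples, task_pool.shape[1]))
    y = x @ w  # noiseless linear labels y_i = w . x_i
    return x, y

rng = np.random.default_rng(0)
pool = sample_task_pool(num_tasks=64, dim=8, task_similarity=0.5, rng=rng)
x, y = sample_icl_sequence(pool, num_examples=16, rng=rng)
print(x.shape, y.shape)  # (16, 8) (16,)
```

Under this framing, specialized ICL corresponds to good performance only on test tasks resembling the pretraining pool, while generalized ICL corresponds to good performance on linear tasks drawn well outside it.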
Presenters
-
Chase Waring Goddard
Princeton University
Authors
-
Chase Waring Goddard
Princeton University
-
Lindsay Maleckar Smith
Princeton University
-
Vudtiwat Ngampruetikorn
University of Sydney
-
David J Schwab
The Graduate Center, CUNY