Machine Learning Lunch Meeting
Do LLMs solve novel tasks? An empirical investigation of out-of-distribution generalization
Event Details
Everyone is invited to the weekly Machine Learning Lunch Meetings, held on Fridays, 12:30-1:30 pm. Faculty members from Computer Sciences, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers, and to learn about the cutting-edge research being conducted at our university. Please see https://sites.google.com/view/wiscmllm/home for more information.
Speaker: Yiqiao Zhong (STAT)
Abstract: Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks from just a few demonstrations in the prompt. These tasks require the pre-trained model to generalize to distributions that differ from the training distribution, which is known as out-of-distribution (OOD) generalization. For example, in symbolized language reasoning, names/labels are replaced by arbitrary symbols, yet the model can infer the correct names/labels without any finetuning.
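To make the example concrete, here is a minimal sketch of a symbolized few-shot prompt in which sentiment labels are replaced by arbitrary symbols; the task, symbols, and sentences are invented for illustration and are not from the talk:

```python
# Hypothetical illustration of "symbolized language reasoning": class
# labels in a few-shot prompt are replaced with arbitrary symbols, and
# the model must infer the label mapping purely in context.

LABEL_TO_SYMBOL = {"positive": "&", "negative": "#"}

demonstrations = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
    ("A delightful surprise.", "positive"),
]

def build_symbolized_prompt(demos, query):
    """Few-shot prompt where every label is swapped for its symbol."""
    lines = [
        f"Input: {text}\nLabel: {LABEL_TO_SYMBOL[label]}"
        for text, label in demos
    ]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

print(build_symbolized_prompt(demonstrations, "What a waste of time."))
# A model that generalizes OOD should continue with "#", the symbol
# standing in for "negative", even though "&" and "#" never served as
# sentiment labels in its training data.
```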
In this talk, I will offer some new angles for understanding emergent phenomena in LLMs, which will hopefully provide empirical foundations for a statistical theory of LLMs. Focusing on induction heads, a type of component pervasive in LLMs, I will show that learning the right compositional structure is key to OOD generalization, and that this learning process exhibits sharp transitions in the training dynamics. Further, I will propose the "common bridge representation hypothesis" as a compositional mechanism in Transformers: a latent subspace of the embedding space acts as a bridge that aligns multiple attention heads across early and late layers.
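Background for the abstract's key object: on a sequence that repeats, an induction head attends from each token back to the token that followed the previous occurrence of the same token, which is what lets it copy patterns forward. Below is a minimal sketch of the standard diagnostic for this behavior (written here for illustration, not code from the talk): score a head's attention matrix by the weight it places at that offset on a periodic sequence.

```python
# Minimal induction-head diagnostic: on a token sequence that repeats
# with a known period T, an induction head concentrates its attention
# from position t onto position t - T + 1 (the token right after the
# previous occurrence of the current token).
import numpy as np

def induction_score(attn, period):
    """Average attention weight at the induction offset.

    attn: (seq_len, seq_len) attention matrix for one head,
          rows = query positions, columns = key positions.
    """
    seq_len = attn.shape[0]
    weights = [attn[t, t - period + 1] for t in range(period, seq_len)]
    return float(np.mean(weights))

# Toy check: a "perfect" induction head on a length-32 sequence of
# period 8 scores 1.0; a uniform-attention head scores 1 / seq_len.
T, L = 8, 32
perfect = np.zeros((L, L))
for t in range(T, L):
    perfect[t, t - T + 1] = 1.0
uniform = np.full((L, L), 1.0 / L)

print(induction_score(perfect, period=T))  # -> 1.0
print(induction_score(uniform, period=T))  # -> 0.03125
```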
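One way to read the "common bridge representation hypothesis" is that an early-layer head writes into, and a later-layer head reads from, the same low-dimensional subspace. The following is a hedged sketch of how such alignment could be quantified, using random stand-in matrices rather than real model weights, and an overlap measure chosen for illustration rather than taken from the talk:

```python
# Sketch: compare the top singular subspace of an early head's output
# ("write") map with that of a later head's query-key ("read") map.
# A shared bridge subspace would show overlap well above chance.
import numpy as np

rng = np.random.default_rng(1)
d_model, k = 64, 8

def top_subspace(W, k):
    """Orthonormal basis for the top-k left singular directions of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

def subspace_overlap(U, V):
    """Mean squared cosine of the principal angles between span(U) and
    span(V): 1.0 for identical subspaces, about k / d_model by chance."""
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.mean(cosines**2))

W_OV_early = rng.normal(size=(d_model, d_model))  # stand-in write map
W_QK_late = rng.normal(size=(d_model, d_model))   # stand-in read map

overlap = subspace_overlap(top_subspace(W_OV_early, k),
                           top_subspace(W_QK_late, k))
print(f"overlap = {overlap:.3f}")  # near k / d_model = 0.125 for random
# weights; evidence for a bridge would be an overlap far above this.
```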