Machine Learning Lunch Meeting
A Geometric Journey into the World of Large Language Models
Event Details
Everyone is invited to the weekly machine learning lunch meetings, where faculty members from Computer Science, Statistics, ECE, and other departments discuss their latest research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university.
Speaker: Yiqiao Zhong
Abstract: Transformers are neural networks that underpin the recent success of large language models. They are often used as black-box models and as building blocks of complex AI systems. Yet it is unclear what information is processed through the layers of a transformer, which raises the issue of interpretability. In this talk, I will present an empirical study of transformers by examining various pretrained transformer models. A surprisingly consistent geometric pattern emerges in the hidden states (or intermediate-layer embeddings) across layers, models, and datasets. Our study (1) provides a structural characterization of the learned weight matrices and the self-attention mechanism, and (2) suggests that smoothness of the hidden states is essential for the success of transformers.
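For attendees who would like to explore these intermediate-layer embeddings themselves before the talk, here is a minimal sketch, not the speaker's code: it assumes the Hugging Face transformers library, uses GPT-2 as a stand-in pretrained model, and uses mean cosine similarity between consecutive layers as one simple, illustrative proxy for cross-layer geometry.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: GPT-2 as an example pretrained model; any similar model works.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

text = "Transformers process tokens through many layers."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors,
# each of shape (batch, seq_len, hidden_dim): the embedding
# output followed by every layer's output.
hidden = outputs.hidden_states
for layer, (h_prev, h_next) in enumerate(zip(hidden[:-1], hidden[1:]), start=1):
    # Illustrative geometric measure: how much token embeddings
    # rotate between consecutive layers.
    sim = torch.nn.functional.cosine_similarity(h_prev, h_next, dim=-1).mean()
    print(f"layer {layer:2d}: mean cosine similarity to previous layer = {sim:.3f}")
```

Running this prints one similarity value per layer; high values across most layers would be consistent with the kind of smooth, gradual evolution of hidden states that the abstract alludes to, though the talk's actual analysis is considerably richer.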