Machine Learning Lunch Meeting
Can we teach addition to a small language model?
Event Details
Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university.
Speaker: Dimitris Papailiopoulos
Abstract: Can language models truly “understand” addition? We explore this by trying to teach them to add arbitrarily long numbers within their context limits. It turns out that conventional “1+2=3” training data isn't the best way to teach them to add; a few formatting tweaks and training on chain-of-thought-style data allow the model to “learn” addition, but only for digit lengths it has seen during training. This raises the question of length generalization: can a model trained on n-digit numbers add (n+1)-digit numbers? Humans don't need to be taught addition separately for every digit length in order to perform it. It turns out that language models aren't great at length generalization, but we catch glimpses of it in “unstable” scenarios. Surprisingly, the infamous U-shaped overfitting curve makes an appearance!
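To give a flavor of the contrast the abstract draws, here is a minimal sketch of two ways an addition training example could be formatted: a plain “a+b=c” string versus a chain-of-thought-style string that spells out the digit-by-digit steps with carries. The exact formats, function names, and digit ordering below are illustrative assumptions, not the speaker's actual data pipeline.

```python
# Sketch only: two hypothetical training-data formats for teaching addition.
# These formats are assumptions for illustration, not the talk's actual setup.
import random


def plain_sample(a: int, b: int) -> str:
    """Conventional format: just the equation, no intermediate steps."""
    return f"{a}+{b}={a + b}"


def cot_sample(a: int, b: int) -> str:
    """Chain-of-thought-style format: digit-by-digit addition with carries,
    processed from the least-significant digit up."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # reversed digit strings
    steps, result, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        total = dx + dy + carry
        steps.append(f"{dx}+{dy}+{carry}={total}, write {total % 10}, carry {total // 10}")
        result.append(str(total % 10))
        carry = total // 10
    if carry:
        result.append(str(carry))
    answer = "".join(reversed(result))
    return f"{a}+{b}: " + "; ".join(steps) + f" -> {answer}"


if __name__ == "__main__":
    random.seed(0)
    a, b = random.randint(100, 999), random.randint(100, 999)
    print(plain_sample(a, b))
    print(cot_sample(a, b))
```

The chain-of-thought format makes every intermediate carry explicit in the text the model is trained on, which is the kind of formatting tweak the abstract suggests helps a small model pick up the addition procedure.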