Machine Learning Lunch Meeting

Can we teach addition to a small language model?

Date

Thursday, November 9, 2023

Time

12 p.m.

Location

Description

Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university.

Speaker: Dimitris Papailiopoulos

Abstract: Can language models truly “understand” addition? We explore this by trying to teach them to add numbers of arbitrary size within their context limits. It turns out that conventional “1+2=3” training data isn't the best way to teach them to add. A few formatting tweaks and training on chain-of-thought style data allow the model to “learn” addition, but only for digit lengths it has seen during training. This leads us to question length generalization: can a model trained on n-digit numbers add (n+1)-digit numbers? Humans don't need to be taught every digit length of addition to be able to perform it. It turns out that language models aren't great at length generalization, but we catch glimpses of it in “unstable” scenarios. Surprisingly, the infamous U-shaped overfitting curve makes an appearance!
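
For readers who want a concrete picture before the talk, here is a minimal Python sketch of what the data formats in the abstract might look like. The abstract does not spell out the formatting tweaks or the chain-of-thought layout, so everything here is an assumption made for illustration: writing the answer with its digits reversed, the per-column carry trace, and the helper names plain_sample, reversed_sample, scratchpad_sample, random_operand, and length_split are all hypothetical and are not the speaker's method.

```python
import random


def plain_sample(a: int, b: int) -> str:
    """Conventional format: answer written most significant digit first."""
    return f"{a}+{b}={a + b}"


def reversed_sample(a: int, b: int) -> str:
    """Assumed formatting tweak: answer written least significant digit first."""
    return f"{a}+{b}={str(a + b)[::-1]}"


def scratchpad_sample(a: int, b: int) -> str:
    """Assumed chain-of-thought format: one line per digit column, with carry."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # walk columns from the right
    lines = [f"{a}+{b}"]
    digits = []
    carry = 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(f"{x}+{y}+{carry}={total} write {total % 10} carry {total // 10}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    lines.append("answer " + "".join(reversed(digits)))
    return "\n".join(lines)


def random_operand(n_digits: int, rng: random.Random) -> int:
    """Draw an integer with exactly n_digits digits (0-9 when n_digits is 1)."""
    low = 10 ** (n_digits - 1) if n_digits > 1 else 0
    return rng.randint(low, 10 ** n_digits - 1)


def length_split(max_train_digits: int, n: int, seed: int = 0):
    """Train pairs use 1..max_train_digits digits; test pairs use one digit more."""
    rng = random.Random(seed)
    train = [
        (random_operand(rng.randint(1, max_train_digits), rng),
         random_operand(rng.randint(1, max_train_digits), rng))
        for _ in range(n)
    ]
    test = [
        (random_operand(max_train_digits + 1, rng),
         random_operand(max_train_digits + 1, rng))
        for _ in range(n)
    ]
    return train, test


if __name__ == "__main__":
    print(plain_sample(58, 67))
    print(reversed_sample(58, 67))
    print(scratchpad_sample(58, 67))
```

The length_split helper mirrors the length generalization question in the abstract: a model would be trained on the first list and scored only on the second, whose operands are one digit longer than anything seen in training.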

Cost

Free
