Talk: Towards Principled Post-Training of Large Language Models (ONLINE)

Banghua Zhu: PhD Candidate, UC Berkeley

Event Details

Date
Wednesday, May 1, 2024
Time
12-1 p.m.
Location
Online
Description

LIVE STREAM: https://uwmadison.zoom.us/j/98774557926?pwd=cTVSd2tDVDZMUTRUOU1wWDF4N3hTdz09

Abstract: In the development of large language models (LLMs), post-training is a critical step that significantly improves model capabilities and aligns them with human preferences. In this talk, I will discuss the design principles behind post-training techniques that have led to the creation of strong, compact open models, including Starling-7B and NexusRaven-13B. Starling-7B is the best 7B chat model according to human evaluation on Chatbot Arena, outperforming models such as Llama2-70B-Chat and GPT-3.5-Turbo. NexusRaven-13B surpasses GPT-4 in function-calling capabilities.

Specifically, I will discuss existing issues with reinforcement learning from human feedback (RLHF), a pivotal technique for aligning large language models with human values. I will present improved RLHF algorithms informed by statistical decision theory, along with our high-quality open datasets for RLHF. By combining the enhanced RLHF algorithms with our own datasets, we created the Starling model suite. Our techniques and the resulting models contribute to the understanding of learning human preferences and aligning language models with human values.

Bio: Banghua is a final-year PhD student at UC Berkeley, advised by Professors Michael I. Jordan and Jiantao Jiao. His research focuses on statistics and information theory, with applications in contract theory, noisy computing, robust statistics, reinforcement learning, large language models, and machine learning systems. He is a recipient of the David Sakrison Memorial Prize for outstanding PhD research at Berkeley EECS.

Cost
Free