Machine Learning Lunch Meeting
Steering Large Language Models by Human Preferences
Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university. More information is available at https://sites.google.com/view/wiscmllm/home.
Speaker: Sharon Li
Abstract: Large language models (LLMs) trained on massive datasets exhibit remarkable abilities, but at the same time, these models can inadvertently generate misinformation and harmful outputs. This concern underscores the urgent challenge of language model alignment: ensuring these models' behaviors agree with human preferences and safety considerations. In recent years, a spectrum of alignment strategies has emerged, with prominent methods showcasing the effectiveness of reinforcement learning from human feedback (RLHF). RLHF has gained widespread adoption among state-of-the-art models, including OpenAI's GPT-4, Anthropic's Claude, Google's Bard, and Meta's Llama 2-Chat. A pivotal component within RLHF is proximal policy optimization (PPO), which employs an external reward model that mirrors human preferences for its optimization process. Despite this promise, RLHF suffers from unstable and resource-intensive training. Furthermore, the need to repeat PPO training whenever the reward model changes hinders rapid customization to evolving datasets and emerging needs. In this talk, I will discuss alternative paradigms for achieving alignment without RL training.
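To make the abstract's reference to PPO concrete, the sketch below shows the PPO clipped surrogate objective evaluated against a toy reward model. The `reward_model` function and the specific log-probability values are purely illustrative stand-ins (a real RLHF pipeline uses a learned preference model and per-token advantages), not part of the speaker's method.

```python
import math

def reward_model(response: str) -> float:
    # Hypothetical stand-in for a learned reward model that scores
    # how well a response matches human preferences (higher is better).
    return 1.0 if "helpful" in response else -1.0

def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """PPO's clipped surrogate objective for a single action.

    ratio = pi_new(a|s) / pi_old(a|s); clipping keeps each update close
    to the previous policy. Because the advantage is derived from the
    reward model, swapping in a new reward model forces PPO training to
    be rerun -- the customization bottleneck the abstract mentions.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # PPO maximizes the minimum of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)

# Toy example: the reward model's score serves as the advantage signal.
advantage = reward_model("a helpful answer")
obj = ppo_clipped_objective(logp_new=-1.0, logp_old=-1.2, advantage=advantage)
```

Here the probability ratio exp(0.2) ≈ 1.22 exceeds the clip range, so the objective is capped at 1.2 × advantage; this clipping is one of the mechanisms PPO uses to stabilize training, at the cost of the compute-heavy loop the talk proposes to avoid.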