Machine Learning Lunch Meeting

On-policy Reinforcement Learning without On-policy Sampling

Event Details

Thursday, October 26, 2023
12 p.m.

Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university.

Speaker: Josiah Hanna

Abstract: Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they explore by acting according to the learning agent’s current policy or according to some other exploration behavior. State-of-the-art on-policy RL algorithms such as proximal policy optimization are noted for their ability to find strong-performing policies with minimal tuning in new RL environments. Unfortunately, on-policy algorithms also tend to be tremendously data inefficient, which prevents their application to many potential domains. In this talk, I will present recent work where we revisit the definition of on-policy learning and ask what is really necessary for an RL algorithm to be considered on-policy. In revisiting this definition, we develop a new form of on-policy algorithm that learns more data-efficiently than traditional on-policy algorithms. In the first part of the talk, I will develop the basic approach for the policy evaluation sub-problem of RL. I will then present more recent work where we have extended this approach to enhance the data efficiency of the widely used proximal policy optimization algorithm while maintaining its ability to find strong-performing policies on common RL testbeds.