Machine Learning Lunch Meeting
Recent Advances in Average-Reward Restless Bandits
Event Details
Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers and to learn about the cutting-edge research being conducted at our university.
Speaker: Qiaomin Xie
Abstract: We study the infinite-horizon restless bandit problem under the average-reward criterion. A fundamental question is how to design computationally efficient policies whose optimality gap diminishes as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and difficult-to-verify assumption. In this talk, we discuss how this assumption can be removed. We propose a general, simulation-based framework that converts any single-armed policy into a policy for the original $N$-armed problem. This framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In discrete time, our result holds under a simpler synchronization assumption, which covers problem instances that violate UGAP; more notably, in continuous time, it requires only the standard unichain condition. Our work gives the first asymptotic-optimality result for restless bandits that does not require UGAP.
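To make the simulation-based idea concrete, here is a minimal Python sketch of one plausible instantiation, not the speaker's exact construction: each real arm is coupled to a "virtual" arm that evolves freely under a fixed single-armed policy, and the real system copies the virtual actions while enforcing a hard activation budget. The names and choices here (single_armed_policy, the random transition kernels, the budget rule) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical problem instance (all sizes and kernels are illustrative) ---
S, N = 4, 100                    # states per arm, number of arms
budget = int(0.3 * N)            # at most `budget` arms may be activated per step

# Random transition kernels: P[a][s] is the distribution over next states
# under action a, for a in {0 (passive), 1 (active)}
P = [rng.dirichlet(np.ones(S), size=S) for _ in range(2)]

def single_armed_policy(s: int) -> float:
    """Stand-in for a single-armed policy: probability of the active action
    in state s. (In the talk's framework this would come from solving a
    single-armed problem; here it is a hypothetical placeholder.)"""
    return 0.5

def step(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Advance every arm one step under its chosen action."""
    return np.array([rng.choice(S, p=P[a][s]) for s, a in zip(states, actions)])

real = rng.integers(S, size=N)   # states of the real N-armed system
virtual = real.copy()            # simulated virtual arms, one per real arm

for t in range(1000):
    # Virtual arms follow the single-armed policy with no budget constraint.
    p_active = np.array([single_armed_policy(s) for s in virtual])
    v_actions = (rng.random(N) < p_active).astype(int)

    # Real arms copy their virtual counterpart's action; the hard budget is
    # then enforced by deactivating a random subset of the excess arms.
    r_actions = v_actions.copy()
    active = np.flatnonzero(r_actions)
    if active.size > budget:
        drop = rng.choice(active, size=active.size - budget, replace=False)
        r_actions[drop] = 0

    virtual = step(virtual, v_actions)
    real = step(real, r_actions)
```

Roughly, the intuition is that as $N$ grows, the fraction of arms whose copied action must be dropped to satisfy the budget shrinks, so the real system tracks the virtual one ever more closely; the talk makes this precise under the assumptions stated in the abstract.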