Best of three worlds? Bias and extrapolation in constant-stepsize stochastic approximation
Machine Learning Lunch Meeting: Yudong Chen, Tuesday March 21, 12:15pm CS 1240
Event Details
You are cordially invited to the weekly CS Machine Learning Lunch Meetings. This is a chance to get to know machine learning professors and talk to your fellow researchers. Our next meeting will be on Tuesday, March 21, 12:15-1:30pm in CS 1240. Professor Yudong Chen will lead us beyond stochastic gradient descent; see the abstract below.
If you would like to be informed of future CS Machine Learning Lunch Meetings, please sign up for our mailing list at https://lists.cs.wisc.edu/mailman/listinfo/mllm -- please use your cs or wisc email. After you enter your email, the system will send you a confirmation message. Only after you respond to that message will you be added to the mailing list.
Abstract:
Stochastic approximation is a class of iterative procedures that includes, among others, stochastic gradient descent and temporal difference (TD) learning for reinforcement learning. In this talk, we consider linear stochastic approximation (LSA) with a constant stepsize and Markovian noise. We show that the iterates have a limit distribution and converge to it geometrically in Wasserstein distance. Furthermore, we provide a precise characterization of the asymptotic bias of this limit: it admits a Taylor-like infinite-series expansion in the stepsize. Moreover, the bias is proportional to the mixing time of the Markovian noise and hence vanishes when the noise is in fact i.i.d.
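For concreteness, here is a minimal sketch (not from the talk) of a constant-stepsize LSA recursion x_{k+1} = x_k + alpha*(A(y_k) x_k + b(y_k)) driven by a two-state Markov chain. The transition matrix, the matrices A(y), the vectors b(y), the stepsize, and the iteration count below are all arbitrary illustrative choices.

import numpy as np

# Toy constant-stepsize LSA with Markovian noise (all problem data are made up
# for illustration; this is not the setup from the talk).
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])          # transition matrix of the noise chain y_k
A = [np.array([[-1.0, 0.2], [0.0, -0.5]]),      # A(y) for y = 0, 1
     np.array([[-0.8, 0.0], [0.3, -1.2]])]
b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

alpha = 0.05                                     # constant stepsize
x = np.zeros(2)
y = 0
for k in range(20000):
    x = x + alpha * (A[y] @ x + b[y])            # LSA update with Markovian data
    y = rng.choice(2, p=P[y])                    # advance the noise chain
# x now fluctuates around a limit distribution whose mean is offset from the
# true solution by an O(alpha) asymptotic bias, as described in the abstract.
print(x)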
The results above suggest the following algorithmic paradigm for LSA, which combines three ingredients: (1) use a constant stepsize for fast convergence of the optimization error, (2) use Polyak-Ruppert averaging to reduce the variance, and (3) use Richardson-Romberg extrapolation to correct the bias. In particular, our infinite series expansion implies that extrapolation with m stepsizes eliminates the m-1 leading terms in the bias expansion, resulting in an exponentially smaller bias.
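To illustrate how the three ingredients fit together, the sketch below (again with made-up problem data and a hypothetical helper lsa_average, not code from the talk) runs the toy recursion at two stepsizes, forms a Polyak-Ruppert tail average for each, and applies two-point Richardson-Romberg extrapolation, which cancels the leading O(alpha) term of the bias.

import numpy as np

def lsa_average(alpha, n_iter=200000, burn_in=100000, seed=0):
    """Run the toy constant-stepsize LSA and return the Polyak-Ruppert
    tail average of the iterates (all problem data are illustrative)."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    A = [np.array([[-1.0, 0.2], [0.0, -0.5]]),
         np.array([[-0.8, 0.0], [0.3, -1.2]])]
    b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    x, y = np.zeros(2), 0
    avg = np.zeros(2)
    for k in range(n_iter):
        x = x + alpha * (A[y] @ x + b[y])        # constant-stepsize LSA update
        y = rng.choice(2, p=P[y])                # Markovian noise
        if k >= burn_in:
            avg += x                             # Polyak-Ruppert (tail) averaging
    return avg / (n_iter - burn_in)

# Richardson-Romberg extrapolation with m = 2 stepsizes (alpha and alpha/2):
# 2 * xbar(alpha/2) - xbar(alpha) cancels the leading O(alpha) bias term.
alpha = 0.05
x_a = lsa_average(alpha)
x_half = lsa_average(alpha / 2)
x_rr = 2 * x_half - x_a
print(x_a, x_half, x_rr)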