Statistics Seminar
Distribution-Free, Risk-Controlling Prediction Sets
Presented by Stephen Bates (UC Berkeley)
Event Details
Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying predictive models in consequential settings also requires analyzing and communicating their uncertainty. To give valid inference for prediction tasks, we show how to generate set-valued predictions from any black-box predictive model that control certain statistical error rates on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset and model by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in four large-scale prediction problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.
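The core calibration idea in the abstract — use a holdout set to tune the size of set-valued predictions so that an error rate is controlled at a user-specified level — can be illustrated with a minimal split-conformal-style sketch. This is not the speakers' exact procedure (their framework handles general risks with finite-sample guarantees); the toy data, class counts, and function names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a black-box classifier: softmax scores over 5 classes
# on a holdout (calibration) set, plus the holdout true labels.
n_calib, n_classes = 1000, 5
softmax = rng.dirichlet(np.ones(n_classes), size=n_calib)
labels = rng.integers(0, n_classes, size=n_calib)

alpha = 0.1  # user-specified error level: miscoverage at most 10%

# Conformal score for each holdout point: 1 minus the softmax
# probability assigned to the true label (higher = worse fit).
conf_scores = 1.0 - softmax[np.arange(n_calib), labels]

# Calibrate a threshold at the ceil((n+1)(1-alpha))/n empirical quantile,
# which yields holdout coverage of at least 1 - alpha by construction.
q_level = np.ceil((n_calib + 1) * (1 - alpha)) / n_calib
threshold = np.quantile(conf_scores, min(q_level, 1.0), method="higher")

def prediction_set(softmax_row):
    """Return all labels whose conformal score clears the calibrated threshold."""
    return np.where(1.0 - softmax_row <= threshold)[0]
```

Larger thresholds produce larger (more conservative) prediction sets; the holdout calibration picks the smallest threshold compatible with the requested error level.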