Talk: Machine Unlearning: Algorithms for Data Deletion in Machine Learning
Saeed Sharifi-Malvajerdi: PhD Student, Statistics and Data Science Department, Wharton School of the University of Pennsylvania
Abstract: After a machine learning model is trained on a data set, there may be requests to delete individual data points. Machine unlearning algorithms aim to remove the influence of deleted data points from trained models at a lower computational cost than fully retraining those models. In this talk, I will introduce a formal notion of unlearning and then propose unlearning algorithms that can handle an arbitrarily long sequence of deletion requests, in both convex and non-convex settings. Most prior work in the non-convex setting gives valid guarantees only for deletion sequences that are chosen independently of the models that are published. If people choose to delete their data as a function of the published models (because they don't like what the models reveal about them, for example), then the deletion sequence is adaptive. I will also discuss a general reduction from unlearning guarantees against adaptive deletion sequences to unlearning guarantees against non-adaptive deletion sequences, using techniques from differential privacy. Combined with ideas from prior work that give guarantees for non-adaptive deletion sequences, this leads to extremely flexible algorithms that can handle arbitrary model classes and training methodologies, with strong provable unlearning guarantees for adaptive deletion sequences.
Bio: Saeed Sharifi-Malvajerdi is a PhD student in the Statistics and Data Science Department at the Wharton School of the University of Pennsylvania, where he is advised by Michael Kearns and Aaron Roth. His research is in data analysis and machine learning with ethical and societal constraints. More precisely, he works on providing algorithmic and theoretical foundations for privacy-preserving data analysis, algorithmic fairness in machine learning, and efficient data deletion from machine learning models (known as machine unlearning).