Online Seminar: Structure to the Rescue: Breaking Data Barriers in Machine Learning
Frederic Sala: Postdoc, Stanford Computer Science Department
Abstract: The current machine learning zeitgeist is that models are only as good as the data they are fed, so limitations in the data, and especially mismatches between the data and the ML algorithm, present fundamental barriers to model performance. However, for ML to continue its growth and be safely and widely deployed across domains with significant societal impact, such limitations must be minimized. In this talk, I will describe two ways to exploit structure in data to overcome apparent obstacles, with theoretical guarantees.
First, I will argue that geometry is a barrier to producing the quality representations that models rely on. The root cause is a mismatch between the geometric structure of the data and the geometry of the model, but the issue can be resolved by adopting a matching non-Euclidean geometry, for example hyperbolic geometry for hierarchical data. Next, motivated by the fact that labeling large datasets is a major bottleneck in supervised learning, I will discuss a weak supervision framework that automates the labeling process and so overcomes the lack of hand-labeled data. This is done by encapsulating different aspects of manual labeling into heuristics whose structure is characterized by learnable accuracies and correlations. I will describe extensions of this framework to handle multitask, time-series, and other forms of structured data. This framework is widely used in industry, helping drive applications used by millions daily.
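As an illustration of why hyperbolic geometry suits hierarchical data, here is a minimal sketch of the distance function in the Poincaré ball model, one standard model of hyperbolic space. This is not taken from the talk: the function name `poincare_distance` and the sample points are assumptions chosen for the example. The sketch shows that distances grow rapidly near the boundary of the unit ball, which leaves room to place the many leaves of a tree far apart from one another.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the open unit ball.

    Illustrative helper (hypothetical name), using the standard formula
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs of points with the same Euclidean separation (0.2):
# one pair near the origin, one pair near the boundary.
near_origin = poincare_distance((0.0, 0.0), (0.2, 0.0))
near_boundary = poincare_distance((0.75, 0.0), (0.95, 0.0))

# The pair near the boundary is much farther apart in hyperbolic terms.
print(near_origin, near_boundary)
```

In an embedding of a tree, the root would typically sit near the origin and deeper nodes closer to the boundary, where this expansion of distance accommodates the exponential growth in the number of nodes.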
Bio: Frederic Sala is a postdoctoral scholar in the Stanford Computer Science Department, advised by Chris Ré. His research interests include machine learning, data-driven systems, and information theory, and in particular the analysis and design of algorithms that operate on diverse and challenging forms of data. He received his Ph.D. and M.S. degrees in Electrical Engineering from UCLA, where he received the Distinguished Ph.D. Dissertation in Signals & Systems Award from the UCLA Electrical Engineering Department, the NSF graduate fellowship, and the Edward K. Rice Outstanding Master’s Student Award.