Theory and Systems for Weak Supervision
Chris Ré (Stanford)
If you want to build a high-quality machine learning product, build a large, high-quality training set. At first glance, this seems as useful as the statement "if you want to be rich, get a lot of money." However, a key idea driving our work is that new theoretical and systems concepts, such as weak supervision and automatic data augmentation policies, can enable engineers to build training sets, and thus AI applications, more quickly and cost effectively.
Along with state-of-the-art results on benchmarks, these concepts have allowed our group and collaborators to build a range of applications, including patient-care monitoring on electronic health records, automatic triage systems for radiologists, and tools that enable cardiologists to spot rare abnormalities in video MRI, along with widely used products from Apple and Google that you may have recently used. This talk describes the theoretical and systems challenges that such applications create, along with some of our initial solutions. As with any new high-level abstraction, these new systems raise fundamental challenges in debugging that we are just beginning to understand. The talk will conclude with a discussion of recently identified hidden stratification errors and some theory to help users avoid them. For more information, see http://snorkel.org, http://hazyresearch.stanford.edu/, or my personal website.
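To make the weak-supervision idea concrete, here is a minimal sketch of labeling functions combined by majority vote, in the spirit of Snorkel (snorkel.org). The function names and heuristics are illustrative, not the actual Snorkel API; Snorkel itself replaces the simple vote with a generative label model that learns each function's accuracy.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may abstain on an example

# Hypothetical labeling functions: cheap, noisy heuristics that vote on a label.
def lf_contains_check(text):
    return 1 if "check" in text.lower() else ABSTAIN

def lf_short_message(text):
    return 0 if len(text) < 20 else ABSTAIN

def lf_contains_refund(text):
    return 1 if "refund" in text.lower() else ABSTAIN

def majority_vote(text, lfs):
    """Combine labeling-function votes by simple majority, ignoring abstains.
    This stands in for Snorkel's learned label model."""
    votes = [lf(text) for lf in lfs if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

lfs = [lf_contains_check, lf_short_message, lf_contains_refund]
print(majority_vote("Please refund my check order", lfs))  # -> 1
```

The weakly labeled examples produced this way become the training set for a discriminative end model, which is the step that lets engineers build applications without hand-labeling data point by point.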
Bio: Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University. He is in the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work is to understand how software and hardware systems will change as a result of machine learning, along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, along with widely used products from technology and enterprise companies including Google Ads, GMail, YouTube, and Apple. He has cofounded four companies based on his research into machine learning systems: SambaNova and Snorkel, along with two companies that are now part of Apple, Lattice (DeepDive) in 2017 and Inductiv (HoloClean) in 2020.
He received a SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best paper at a premier venue in each area, respectively: PODS 2012, SIGMOD 2014, and ICML 2016.