Talk: Efficient and Accurate Systems for Querying Unstructured Data

Daniel Kang: PhD student in the Stanford DAWN lab

Date

Monday, February 14, 2022

Time

4-5 p.m.

Location

Description

Abstract: Over the past 60 years, structured databases have been a runaway success: they are deployed at every major organization and have produced hundreds of billions in value. However, demand for analytics over unstructured data (e.g., videos, audio, text) has been growing, driven by cheap sensors and rising ML capabilities. Unfortunately, ML can be prohibitively expensive to deploy (e.g., 10 orders of magnitude more expensive than standard structured analytics) and can produce incorrect results.

In this talk, I'll describe my work on new ML-based query systems to tackle these challenges. My first line of work accelerates large classes of queries by orders of magnitude while providing strong guarantees on query accuracy. I accomplish this by developing novel query processing algorithms, indexing methods, and execution engines for unstructured data queries. I'll also describe how to find errors in human labels and ML model outputs using novel data management systems. Perhaps surprisingly, our systems discovered a large number of errors in a popular autonomous vehicle dataset and can be used to improve ML models. My research has been deployed at an autonomous vehicle company and has enabled new forms of video analytics for ecologists at the Jasper Ridge Biological Preserve.

Bio: Daniel Kang is a sixth-year PhD student in the Stanford DAWN lab, co-advised by Professors Peter Bailis and Matei Zaharia. His research focuses on systems for querying unstructured data. In particular, he focuses on using cheap approximations to accelerate query processing algorithms and on new programming models for ML data management. Daniel is collaborating with autonomous vehicle companies and ecologists to deploy his research. His work is supported in part by the NSF GRFP and the Google PhD Fellowship.

Cost

Free
