Talk: Making Data Part of Data Structure Design
Brian Hentschel: Ph.D. Candidate, Harvard
Abstract: Data structures are used throughout all parts of computer programs and as a result improvements in their performance enable new applications, lower operational costs, and create better user experiences.
The traditional paradigm for creating data structures is to design them without formal knowledge of their context. Data structures are typically analyzed through worst-case performance bounds that hold regardless of their input data, and even for more complex systems where data structures have been carefully crafted for performance, their dependence on data is still implicit. Engineers think about the data they expect, but they rarely form explicit hypotheses about their data and use this as part of the design process.
In this talk, I present a new class of data structure designs that use data explicitly as part of the design process. In particular, I present a framework wherein data structures are designed by 1) choosing workload properties to estimate from samples of past data and operations, and 2) using these estimated properties to control the shape and algorithms of created data structures. I show that data structures designed using data can achieve both excellent performance on a target workload and retain similar worst-case performance bounds to prior approaches with respect to arbitrary data inputs. As examples, I show applications to filter data structures, wherein this approach achieves 100X better false positive rates compared to prior classical approaches, and to hash tables, wherein this approach produces 4X better performance than state of the art hash tables from Google and Meta.
Bio: Brian Hentschel is a Ph.D. candidate at Harvard University advised by Stratos Idreos. His research focuses on the ways knowledge of context can create better data systems, and his research in this area has been awarded an ACM SIGMOD research highlight award as well a best paper award from EDBT. He earned his BA in mathematics and computer science at Pomona College and has previously spent time at Microsoft Research, IBM Research, Amazon, and LinkedIn.