Statistics Seminar

Learning from Diverse Data by Ramya Vinayak

Event Details

Wednesday, December 14, 2022
4-5 p.m.

Title: Learning from Diverse Data

Abstract: Machine learning (ML) algorithms are becoming ubiquitous in application domains such as public health, genomics, psychology, and the social sciences. In these domains, data is often obtained from diverse populations, e.g., populations with varying demographics, phenotypes, and preferences. Many ML algorithms focus on learning model parameters that work well on average over the population but do not capture this diversity. At the same time, such datasets usually have few observations per individual, which limits our ability to learn about each individual separately. The question of interest in these scenarios is: how can we reliably capture the diversity in the data?

In this talk, we will address this question in the following settings:

(i) In many applications, we observe count data over the population that can be modeled as Binomial (e.g., polling, surveys, epidemiology) or Poisson (e.g., single-cell RNA data). Because a single parameter or a finite set of parameters does not capture the diversity of the population in such datasets, nonparametric mixtures are often considered. In this setting, we will address the following question: "How well can we learn the distribution of parameters over the population without learning the individual parameters?"

(ii) Learning preferences from human judgements using comparison queries plays a crucial role in cognitive and behavioral psychology, crowdsourcing democracy, surveys in social science applications, and recommendation systems. Models in the literature often focus on learning the average preference over the population because of the limited amount of data available per individual. We will discuss some recent results on how we can reliably capture diversity in preferences while pooling together data from individuals.


Ramya Korlakai Vinayak is an assistant professor in the Dept. of ECE and affiliated faculty in the Dept. of Computer Science at UW-Madison. Her research interests span the areas of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data. Prior to joining UW-Madison, Ramya was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. She received her Ph.D. in Electrical Engineering from Caltech. She is a recipient of the Schlumberger Foundation Faculty for the Future fellowship from 2013-15, and was an invited participant at the Rising Stars in EECS workshop in 2019. She obtained her Master's from Caltech and her Bachelor's from IIT Madras.