Machine Learning Lunch Meeting

Data without Borders: Game-theoretic Challenges in Democratizing Data

Event Details

Thursday, March 7, 2024
1 p.m.

Everyone is invited to the weekly machine learning lunch meetings, where our faculty members from Computer Science, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers while learning about the cutting-edge research being conducted at our university.

Speaker: Kirthevasan (Kirthi) Kandasamy (CS)

Abstract: Due to advances in AI, many organizations view data as an invaluable resource, likening it to the "new oil/gold". Unlike many types of resources, data is nonrivalrous: it can be freely replicated and used by many. Hence, data produced by one organization can, in principle, generate limitless value for many other organizations. Such sharing will accelerate economic, social, and scientific breakthroughs and benefit society at large. However, considerations of free-riding and competition may prevent such open sharing of data between organizations. An organization may be wary that others are not contributing a sufficient amount of data, or are contributing fabricated or poisoned datasets. In some recent work, we leverage ideas from game theory and robust statistics to design protocols for data sharing. Our methods incentivize organizations to truthfully contribute large amounts of data, so that socially optimal outcomes can be achieved.

In this talk, I will first present a high-level view of some of our recent approaches to solving these challenges and then focus on a mean estimation problem. Here, a set of strategic agents wish to estimate the mean of some distribution, and can each sample from this distribution at a cost. By sharing data with each other, the agents can improve their estimates while keeping data collection costs low. However, if we naively pool everyone's data and share it with all agents, agents may find it beneficial to free-ride, either by not collecting data or by fabricating it. We design a novel incentive-compatible mechanism, which achieves a social penalty (sum of all agents' estimation errors and data collection costs) that is within a factor of 2 of the global minimum.
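
To make the free-riding problem concrete, here is a minimal sketch of naive pooling under simplifying assumptions that are not taken from the talk: each sample has variance `sigma**2`, every agent pays the same per-sample `cost`, estimation error is the expected squared error of the pooled sample mean (`sigma**2 / total_samples`), and sample counts are treated as continuous. All function names are hypothetical. It illustrates only the problem, not the incentive-compatible mechanism presented in the talk.

```python
import math


def agent_penalty(total_samples, own_samples, sigma=1.0, cost=0.01):
    """One agent's penalty: expected squared error of the pooled sample mean
    plus the cost of the samples this agent collected (assumed model)."""
    return sigma**2 / total_samples + cost * own_samples


def social_penalty(total_samples, num_agents, sigma=1.0, cost=0.01):
    """Sum over all agents of estimation error plus data collection cost."""
    return num_agents * sigma**2 / total_samples + cost * total_samples


def optimal_total(num_agents, sigma=1.0, cost=0.01):
    """Total sample count minimizing social_penalty (set the derivative to zero)."""
    return sigma * math.sqrt(num_agents / cost)


def naive_pooling_equilibrium_total(sigma=1.0, cost=0.01):
    """Under naive pooling, each agent's best response tops the pool up to
    sigma / sqrt(cost), so the pool is no larger than what one agent
    working alone would collect."""
    return sigma / math.sqrt(cost)


num_agents = 4
n_opt = optimal_total(num_agents)             # socially optimal pool size
n_eq = naive_pooling_equilibrium_total()      # pool size when agents free-ride

# If everyone else collects a fair share of n_opt, one agent gains by collecting nothing.
fair_share = n_opt / num_agents
honest = agent_penalty(n_opt, fair_share)
free_rider = agent_penalty(n_opt - fair_share, 0.0)

print(f"optimal pool: {n_opt:.1f}, social penalty {social_penalty(n_opt, num_agents):.3f}")
print(f"equilibrium pool: {n_eq:.1f}, social penalty {social_penalty(n_eq, num_agents):.3f}")
print(f"honest agent penalty {honest:.4f} vs free-rider penalty {free_rider:.4f}")
```

In this toy model the equilibrium social penalty exceeds the optimum by a factor of (m + 1) / (2 * sqrt(m)) for m agents, which grows with m; this is the kind of gap that a well-designed mechanism aims to close.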