Incentivizing Truthful Data Sharing in Collaborative Machine Learning and Data Marketplaces
Professor Kirthevasan Kandasamy (Computer Sciences)
Event Details
Abstract: Modern data marketplaces and data sharing consortia increasingly rely on incentive mechanisms to encourage agents to contribute data. However, schemes that reward agents based on the quantity of submitted data are vulnerable to manipulation, as agents may submit fabricated or low-quality data to inflate their rewards. Prior work has proposed comparing each agent's data against others' to promote honesty: when others contribute genuine data, the best way to minimize discrepancy is to do the same. Yet prior implementations of this idea rely on very strong assumptions about the data distribution or narrow models of agent strategic behavior, limiting their applicability.
In this talk, I will highlight some of our recent work in this space. First, I will discuss our work on mean estimation problems. Then, I will present work that develops reward mechanisms based on a novel two-sample test inspired by the Cramér-von Mises statistic. Our methods strictly incentivize agents to submit more genuine data, while disincentivizing data fabrication and other forms of untruthful reporting. We establish that truthful reporting constitutes a Nash equilibrium. We theoretically instantiate our method in three canonical problems in data sharing and data marketplaces and show that it relaxes key assumptions made in prior work. Empirically, we demonstrate that our mechanism incentivizes truthful data sharing, both in simulations and on real-world language and image data.
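The abstract does not spell out the mechanism itself. As a rough illustration of the classical statistic it builds on, the Python sketch below computes the textbook two-sample Cramér-von Mises statistic and uses its negative as a hypothetical reward that compares one agent's submission with the pooled data of the other agents. This is not the mechanism from Clinton et al., which uses a novel test inspired by this statistic. The names `cvm_two_sample` and `discrepancy_reward` are illustrative assumptions, and a naive reward like this one need not have the incentive properties described above (for example, it does not strictly reward submitting more data).

```python
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Empirical CDF of an already-sorted sample, evaluated at t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def cvm_two_sample(x, y):
    """Classical two-sample Cramér-von Mises statistic.

    T = (n*m / N^2) * sum over pooled points z of (F_n(z) - G_m(z))^2,
    where F_n, G_m are the empirical CDFs of x and y and N = n + m.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    # Integrate the squared ECDF gap against the pooled empirical measure.
    for z in xs + ys:
        diff = ecdf(xs, z) - ecdf(ys, z)
        total += diff * diff
    return n * m / (n + m) ** 2 * total


def discrepancy_reward(agent_data, others_data):
    """Hypothetical reward: the smaller the discrepancy from others, the higher the reward.

    Illustrative only; NOT the reward mechanism from the papers listed below.
    """
    return -cvm_two_sample(agent_data, others_data)


if __name__ == "__main__":
    others = [0, 1, 2, 3, 4, 5]      # pooled data from the other agents
    honest = [0, 2, 4]               # submission consistent with the others
    fabricated = [10, 11, 12]        # submission far from the others
    print("honest reward:", discrepancy_reward(honest, others))
    print("fabricated reward:", discrepancy_reward(fabricated, others))
```

In this toy example the honest submission receives a higher (less negative) reward than the fabricated one, which is the intuition from the abstract: when others report genuine data, reporting genuine data minimizes the discrepancy.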
Relevant papers:
Clinton et al., A Cramér-von Mises Approach to Incentivizing Truthful Data Sharing (NeurIPS 2025): https://arxiv.org/abs/2506.07272
Clinton et al., Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution (ICML 2025): https://arxiv.org/pdf/2407.15881
Chen et al., Mechanism Design for Collaborative Normal Mean Estimation (NeurIPS 2023): https://arxiv.org/abs/2506.07272
(This talk is part of the weekly Machine Learning Lunch Meetings (MLLM), held every Tuesday from 12:15 to 1:15 p.m. Professors from Computer Sciences, Statistics, ECE, the iSchool, and other departments will discuss their latest research in machine learning, covering both theory and applications. This is a great opportunity to network with faculty and fellow researchers, learn about cutting-edge research at our university, and foster new collaborations. For the talk schedule, please visit https://sites.google.com/view/wiscmllm/home. To receive future weekly talk announcements, please subscribe to our UW Google Group at https://groups.google.com/u/1/a/g-groups.wisc.edu/g/mllm.)
We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please email jerryzhu@cs.wisc.edu to make a disability-related accommodation request. Reasonable effort will be made to support your request.