Machine Learning Lunch Meeting

Transparent Synthetic Data Generation

Event Details

Thursday, April 11, 2024
1 p.m.

Everyone is invited to the weekly machine learning lunch meetings, where faculty members from Computer Science, Statistics, ECE, and other departments discuss their latest research in machine learning. This is an opportunity to network with faculty and fellow researchers while learning about the cutting-edge research being conducted at our university.

Speaker: Kris Sankaran (STAT)

Abstract: Simulators lie at the heart of biomedical data science: by using historical data to help researchers imagine hypothetical experimental results, they can inform study design, method benchmarking, and statistical inference. For this reason, a flurry of research activity has centered around methodology for realistic synthetic data generation. Unfortunately, many of these simulators are treated as black boxes — their interfaces do not invite users to tinker with any underlying components or learn the mathematical ideas behind them. To address this, we draw from the literature on interactive systems design and re-imagine the interface to the scDesign3 family of multi-omics simulators. We define new abstractions that make the simulation workflow more modular, composable, and accessible. We also introduce visualizations that encourage exploration and criticism of the learned generative models. Some case studies on microbiome and single-cell data highlight how this approach can simplify power analysis and synthetic null hypothesis testing.