Machine Learning Lunch Meeting
Enhancing Microbiome Analysis with Semisynthetic Data
Event Details
General: MLLM is a cross-disciplinary weekly seminar, open to all, in which UW-Madison professors present their research in machine learning, covering both theory and applications. The goal is to promote the great work at UW-Madison and stimulate collaborations. Please see our website for more information.
Speaker: Kris Sankaran (STAT)
Abstract: Effective analysis of microbiome data requires careful preprocessing, modeling, and interpretation to detect subtle signals and avoid spurious associations. In this talk, we will discuss how semisynthetic simulation can serve as a sandbox for testing candidate approaches. Semisynthetic simulators differ from de novo simulators in that they are trained on an experimental template, which allows them to better reflect the properties of real-world data; they are an important recent development in statistical methodology for genomics analysis. Because they mimic real data while providing realistic ground truth, semisynthetic simulators are a valuable resource for power analysis, methods benchmarking, and reliability analysis. We will describe the core statistical ideas behind semisynthetic simulators and highlight useful software design patterns. Recognizing that all simulators only approximate reality, we will discuss strategies for evaluating how accurately the simulated data reflect properties of the original data. We will also present case studies demonstrating the value of simulation in power analysis, data integration, and mediation studies. Code for all examples can be found in the accompanying vignettes (https://go.wisc.edu/8994yz, https://go.wisc.edu/q32phz).
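To make the abstract's workflow concrete (train on a template, simulate with a known ground truth, check fidelity, then run a power analysis), here is a minimal sketch. It assumes a per-taxon negative binomial model fit by method of moments and a Welch t-test on log1p counts with a normal-approximation p-value; these are illustrative choices, not the speaker's method or the code in the linked vignettes. All function names (`fit_nb_template`, `simulate`, `fidelity`, `estimate_power`) are hypothetical, and the "template" below is a random stand-in for a real experimental count matrix.

```python
import math
import numpy as np


def fit_nb_template(counts):
    """Method-of-moments negative binomial fit per taxon (column) of a template."""
    mu = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    excess = var - mu
    # Overdispersed taxa get a finite size (inverse dispersion); near-Poisson taxa get a large one.
    size = np.where(excess > 1e-8, mu**2 / np.maximum(excess, 1e-8), 1e6)
    return np.maximum(mu, 1e-8), np.maximum(size, 1e-3)


def simulate(mu, size, n_per_group, fold_change, rng):
    """Draw a two-group dataset; fold_change[j] != 1 marks a planted (known) effect on taxon j."""
    groups = np.repeat([0, 1], n_per_group)
    means = np.where(groups[:, None] == 1, mu * fold_change, mu)  # shape (2n, taxa)
    p = np.clip(size / (size + means), 1e-12, 1 - 1e-12)  # NB with mean `means`
    return rng.negative_binomial(size, p), groups


def fidelity(template, simulated):
    """Mean absolute gap in log mean, log variance, and zero fraction across taxa."""
    def summary(x):
        return np.column_stack([np.log1p(x.mean(0)), np.log1p(x.var(0)), (x == 0).mean(0)])
    return np.abs(summary(template) - summary(simulated)).mean(axis=0)


def welch_pvalues(counts, groups):
    """Welch t-statistics on log1p counts; two-sided p-values from a normal approximation."""
    x = np.log1p(counts)
    a, b = x[groups == 0], x[groups == 1]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
    t = (b.mean(0) - a.mean(0)) / np.maximum(se, 1e-12)
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])


def estimate_power(mu, size, n_per_group, fold_change, n_reps=50, alpha=0.05, seed=0):
    """Return (power on truly changed taxa, false-positive rate on null taxa); no multiplicity correction."""
    rng = np.random.default_rng(seed)
    truth = fold_change != 1
    hits = np.zeros(len(mu))
    for _ in range(n_reps):
        counts, groups = simulate(mu, size, n_per_group, fold_change, rng)
        hits += welch_pvalues(counts, groups) < alpha
    return hits[truth].mean() / n_reps, hits[~truth].mean() / n_reps


# Stand-in template (assumption): 40 samples x 20 taxa of overdispersed counts.
rng = np.random.default_rng(1)
template = rng.negative_binomial(2.0, 0.1, size=(40, 20))
mu, size = fit_nb_template(template)

null_sim, _ = simulate(mu, size, 20, np.ones(20), rng)
gaps = fidelity(template, null_sim)
print("fidelity gaps (log mean, log var, zero frac):", gaps)

fc = np.ones(20)
fc[:5] = 3.0  # ground truth: only the first 5 taxa truly differ between groups
power_small, fpr_small = estimate_power(mu, size, 5, fc)
power_large, fpr_large = estimate_power(mu, size, 20, fc)
print("n=5:", power_small, fpr_small, " n=20:", power_large, fpr_large)
```

Because the effects are planted by the simulator, power and false-positive rates can be read off directly, which is the "realistic ground truth" advantage the abstract describes. Richer simulators would also model sample covariates, library sizes, and dependence between taxa, which this per-taxon sketch ignores.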
This talk is based on the following articles:
- Semisynthetic Simulation for Microbiome Data Analysis. Kris Sankaran, Saritha Kodikara, Jingyi Jessica Li, and Kim-Anh Le Cao. Briefings in Bioinformatics (2024). https://doi.org/10.1093/bib/bbaf051
- Multimedia: Multimodal Mediation Analysis of Microbiome Data. Hanying Jiang, Xinran Miao, Margaret W. Thairu, Mara Beebe, Dan W. Grupe, Richard J. Davidson, Jo Handelsman, and Kris Sankaran. Microbiology Spectrum 13, no. 2 (2025). American Society for Microbiology. https://doi.org/10.1128/spectrum.01131-24
- Generative Models: An Interdisciplinary Perspective. Kris Sankaran and Susan P. Holmes. Annual Review of Statistics and Its Application 10, no. 1 (2023). https://doi.org/10.1146/annurev-statistics-033121-110134