Machine Learning Lunch Meeting
Do You Interpret Your t-SNE Embeddings Correctly? A Perspective from Map-Continuity and Leave-One-Out
Event Details
General: MLLM is a cross-disciplinary weekly seminar, open to all, where UW-Madison professors present their research in machine learning, spanning both theory and applications. The goal is to promote the great work at UW-Madison and stimulate collaborations. Please see our website for more information.
Speaker: Yiqiao Zhong (STAT)
Abstract: Neighbor embedding methods such as t-SNE, UMAP, and LargeVis are widely used for visualizing high-dimensional data. A common belief is that these methods serve as nonlinear dimension reduction tools that, much like PCA, learn low-dimensional manifold structure from the data.
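(For readers less familiar with these tools, below is a minimal sketch of the standard visualization workflow the abstract refers to, using scikit-learn's t-SNE. The dataset and hyperparameters are illustrative placeholder choices, not from the talk.)

    # Minimal illustration: embed the 64-dimensional digits dataset into 2-D.
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, labels = load_digits(return_X_y=True)   # 1797 samples, 64 dimensions
    emb = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
    print(emb.shape)                           # (1797, 2): one 2-D point per sample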
In this talk, I will present evidence that this view is inaccurate: the embedding maps of t-SNE, UMAP, and LargeVis can exhibit discontinuity points, leading to unintended topological distortions. A key challenge in analyzing these visualization methods is that the embedding points are obtained by solving highly complicated optimization problems. To address this, I'll introduce the leave-one-out (LOO) surrogate, or LOO-map, which captures key properties of the embedding maps. Our analysis identifies two types of discontinuity patterns: (1) global discontinuities, which promote artificial cluster structures, and (2) local discontinuities, which promote subclusters. To mitigate these issues, I'll propose two diagnostic pointwise scores that help detect out-of-distribution samples in deep learning and assist hyperparameter tuning in single-cell data analysis.
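(To give a concrete, if heavily simplified, sense of the leave-one-out idea, the toy sketch below places a single held-out point into a frozen embedding by gradient descent on a one-point t-SNE-style objective. Everything here, the function name loo_embed, the Gaussian input affinities, the fixed bandwidth sigma, and the optimizer settings, is an illustrative assumption and not the paper's exact LOO-map construction; see arXiv:2410.16608 for the actual definition.)

    import numpy as np

    def loo_embed(x_new, X, Y, sigma=1.0, lr=0.5, n_iter=500):
        """Place one held-out point into a fixed 2-D embedding.

        X: (n, d) high-dimensional data; Y: (n, 2) their fixed embedding
        coordinates; x_new: (d,) the held-out point. Returns a 2-D position
        locally minimizing a simplified one-point t-SNE-style loss.
        """
        # Attraction weights: Gaussian affinities to x_new in input space.
        d2 = np.sum((X - x_new) ** 2, axis=1)
        p = np.exp(-d2 / (2.0 * sigma ** 2))
        p /= p.sum()

        # Start from the embedding of the nearest input-space neighbor.
        y = Y[np.argmin(d2)].copy()

        for _ in range(n_iter):
            diff = y - Y                                   # (n, 2)
            w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))    # Student-t kernel
            q = w / w.sum()
            # Gradient of KL(p || q) with respect to the single free point y.
            grad = 2.0 * ((p - q) * w) @ diff
            y -= lr * grad
        return y

(Intuitively, a discontinuity of the embedding map would show up in such a surrogate as a jump in the returned position under a small perturbation of x_new.)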
This talk is based on joint work with Zhexuan Liu (3rd-year Stats PhD student) and Rong Ma (Harvard Biostatistics): arXiv:2410.16608.