Machine Learning Lunch Meeting

Do You Interpret Your t-SNE Embeddings Correctly? A Perspective from Map-Continuity and Leave-One-Out

Event Details

Date
Tuesday, March 4, 2025
Time
1-2 p.m.
Location
Description

General: MLLM is a cross-disciplinary weekly seminar, open to all, where UW-Madison professors present their research in machine learning, spanning both theory and applications. The goal is to promote the great work at UW-Madison and to stimulate collaborations. Please see our website for more information.

Speaker: Yiqiao Zhong (STAT)

Abstract: Neighbor embedding methods such as t-SNE, UMAP, and LargeVis are widely used for visualizing high-dimensional data. A common belief is that these methods serve as nonlinear dimension reduction tools which, similar to PCA, learn low-dimensional manifold structures from the data.

In this talk, I will present evidence that this view is inaccurate: the embedding maps of t-SNE, UMAP, and LargeVis can exhibit discontinuity points, leading to unintended topological distortions. A key challenge in analyzing these visualization methods is that the embedding points are obtained by solving highly complicated optimization problems. To address this, I’ll introduce the leave-one-out (LOO) surrogate, or LOO-map, which captures the properties of the embedding maps. Our analysis identifies two types of discontinuity patterns: (1) global discontinuities, which promote artificial cluster structures, and (2) local discontinuities, which promote subclusters. To mitigate these issues, I’ll propose two diagnostic pointwise scores that help detect out-of-distribution samples in deep learning and assist hyperparameter tuning in single-cell data analysis.
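The leave-one-out idea above can be illustrated with a toy sketch: hold the embedding of all existing points fixed and place one held-out point by descending a simplified t-SNE-style objective (Gaussian affinities in input space, Student-t kernel in embedding space). This is a hypothetical illustration of the general LOO principle, not the paper's LOO-map construction; the function `loo_place` and all parameters are invented for this example.

```python
import numpy as np

def loo_place(X, Y, x_new, sigma=1.0, lr=0.1, steps=500):
    """Place a held-out point x_new into a fixed 2-D embedding Y of data X.

    Gradient descent on a simplified t-SNE-style loss in which all other
    embedding points stay fixed. (Illustrative sketch only, not the
    paper's exact LOO-map.)
    """
    # Input-space Gaussian affinities of x_new to each existing point.
    d2 = np.sum((X - x_new) ** 2, axis=1)
    p = np.exp(-d2 / (2.0 * sigma ** 2))
    p /= p.sum()

    y = Y[np.argmax(p)].copy()  # warm start at the nearest neighbor's embedding
    for _ in range(steps):
        diff = y - Y                                 # shape (n, 2)
        w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))  # Student-t weights
        q = w / w.sum()
        # t-SNE-style gradient with only y free to move
        grad = 4.0 * np.sum(((p - q) * w)[:, None] * diff, axis=0)
        y -= lr * grad
    return y

# Demo: two well-separated clusters with a hand-made 2-D layout standing
# in for a precomputed t-SNE embedding.
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 0.5, (20, 5))
X2 = rng.normal(0.0, 0.5, (20, 5)); X2[:, 0] += 10.0
X = np.vstack([X1, X2])
Y = np.vstack([rng.normal(0.0, 0.3, (20, 2)) + [-5.0, 0.0],
               rng.normal(0.0, 0.3, (20, 2)) + [5.0, 0.0]])

# A point drawn from cluster 1 should land near cluster 1's embedded cloud.
y_hat = loo_place(X, Y, np.zeros(5))
```

Mapping many held-out inputs this way traces out the LOO placement as a function of the input point; the talk's observation is that such maps can jump discontinuously, which is what produces spurious cluster boundaries in the visualization.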

This talk is based on joint work with Zhexuan Liu (3rd-year Stats PhD student) and Rong Ma (Harvard Biostatistics): arXiv:2410.16608.

Cost
Free
