Unsupervised Validation for Unsupervised Learning

Marina Meila, Professor, University of Washington

Event Details

Date
Monday, September 16, 2019
Time
12:30-1:30 p.m.
Description

Machine learning is many times faster than humans at finding patterns, yet the task of validating these patterns as "meaningful" is still left to the human expert or to further experiments. In this talk I will present three instances in which unsupervised learning tasks can be augmented with data-driven validation.

In the case of clustering, I will demonstrate a new framework for "proving" that a clustering is approximately correct, one that does not require the user to know anything about the data distribution. This framework has some similarities to PAC bounds in supervised learning; unlike PAC bounds, the bounds for clustering can be calculated exactly and can be of direct practical utility.

In the case of non-linear dimension reduction by manifold learning, I will present implementable solutions to the following well-known problems. The low-dimensional embeddings output by manifold learning algorithms distort distances, angles and other geometric properties of the data. Our contribution is a statistically founded methodology to estimate and then cancel out the distortions introduced by an embedding algorithm, thus effectively preserving the distances in the original data. This method is based on the notion of augmenting the algorithm output with a Riemannian metric, i.e., with the information that allows the original geometry to be reconstructed.

The abstract coordinates obtained by dimension reduction are often identified, by visual inspection, with interpretable properties of the data. The third and last part of the talk will describe a method to semi-automate this process. The human expert provides a dictionary of meaningful functions, and our algorithm selects a subset of these that can parametrize a manifold via an arbitrary smooth non-linear transformation.

Joint work with Dominique Perrault-Joncas, James McQueen, Yu-chia Chen, Samson Koelle, and Hanyu Zhang.

Cost
Free