Statistics Seminar
Unexplored corners of multi-modal integration in cell biology by Kevin Lin
Event Details
Abstract: Recently, multi-modal single-cell data has been growing in popularity and provides new opportunities to learn how different modalities coordinate within each cell. Our talk today focuses on two main aspects: 1) if we observe paired modalities for each cell simultaneously (for instance, single-cell transcriptomics and proteomics), how do we disentangle the common "information" from the distinct, and 2) if current technological limitations prevent us from observing both modalities simultaneously, are there general principles that can nonetheless allow us to learn the multi-modal relations? Both aspects are still relatively unexplored in the statistical literature, yet there is substantial biological demand for such tools.
In the first part of the talk, we focus on two methods: Tilted Canonical Correlation Analysis (Tilted-CCA), a linear method to disentangle information shared between both modalities ("common") from information unique to only one modality ("distinct"), and its deep-learning extension, currently under development, called Encoder-splitting Variational Inference (esVI). We discuss how this task relates to the concept of Wyner's common information from information theory. Biologically, we discuss how these methods can enable us to learn transcriptomic-proteomic coordination among microglia to study Alzheimer's disease, and how our findings can suggest new markers of microglia. In the second part of the talk, we focus on the Geometrical Adversarial Autoencoder (GeoAdvAE), which integrates microglial transcriptomics with microglial morphology (i.e., cell shape) to understand which genes may alter the morphology of microglia as they function or deteriorate in patients with Alzheimer's disease. While this talk focuses mainly on the motivating biology and proposed methods, potential theoretical directions and applications beyond cell biology will be highlighted.