
Talk: Deep Equilibrium Models

Shaojie Bai: Ph.D. Candidate, Machine Learning Department, Carnegie Mellon University

Event Details

Thursday, March 31, 2022
4-5 p.m.


Abstract: Modern artificial intelligence (AI), and the field of deep learning in particular, has achieved remarkable success across a variety of domains, using networks that have tens to hundreds of layers and billions of parameters. But does deep learning actually need to be deep? In my talk, I will present our recent and ongoing work on Deep Equilibrium (DEQ) Models, an approach that demonstrates we can achieve most of the benefits of modern deep learning systems using very shallow models. But unlike traditional deep networks, these models need to be defined **implicitly** via finding fixed points of nonlinear dynamical systems. I will show that these methods can achieve results on par with the state-of-the-art in domains spanning large-scale language modeling, image classification, semantic segmentation, and optical flow, while requiring only O(1) memory and simplifying architectures substantially; this raises exciting opportunities to apply performant AI models to more optimization-based, low-resource and low-latency (i.e., real-time) settings. I will conclude by discussing ongoing work and future directions for this class of models in these areas.
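
To make the idea of an implicitly defined model concrete, here is a minimal sketch, not the speaker's implementation. It assumes a hypothetical scalar toy layer f(z, x) = tanh(w*z + u*x + b) and finds the equilibrium z* = f(z*, x) by naive fixed-point iteration. The names `layer` and `deq_forward` and the parameter values are illustrative assumptions. Published DEQ work relies on faster root-finding methods and implicit differentiation, neither of which is shown here.

```python
import math

def layer(z, x, w=0.5, u=1.0, b=0.0):
    """One weight-tied toy layer f(z, x) = tanh(w*z + u*x + b) (hypothetical)."""
    return math.tanh(w * z + u * x + b)

def deq_forward(x, tol=1e-10, max_iter=1000):
    """Apply the same layer repeatedly until the state stops changing.

    Only the current iterate z is kept, so memory use does not grow
    with the number of iterations (the "effective depth").
    """
    z = 0.0
    for i in range(max_iter):
        z_next = layer(z, x)
        if abs(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter

# Find the equilibrium for one input and check that it is a fixed point.
z_star, n_iter = deq_forward(x=0.3)
residual = abs(layer(z_star, 0.3) - z_star)
print(f"z* = {z_star:.6f} after {n_iter} iterations, residual = {residual:.2e}")
```

With |w| < 1 the toy layer is a contraction in z, so the iteration is guaranteed to converge; real models need more careful treatment of convergence.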

Bio: Shaojie Bai is a graduating Ph.D. student in the Machine Learning Department of Carnegie Mellon University (CMU), advised by J. Zico Kolter. Shaojie’s research centers on deep learning architectures, with a specific focus on 1) the scalability and representational capacity of implicit-depth models, which involves rethinking some of the current “deep approaches”; and 2) the unification of different model families in deep sequence modeling. He was a J.P. Morgan AI Ph.D. fellow, his work has received multiple spotlight and oral presentations at AI conferences, and he led a team that won 1st place in a competition on predicting molecular properties. Previously, Shaojie received his B.S. in Computer Science and B.S. in Applied Mathematics from CMU in 2017, where he graduated with University Honors.