Hybrid Architecture Models: Fundamental Limits and Efficient Constructions
Professor Fred Sala (Computer Sciences) @ Machine Learning Lunch Meetings
Event Details
Transformers are the workhorse architecture for large language models and beyond, but a large class of recently proposed model architectures seeks to overturn this standard. No single model type appears to be dominant, leading to a variety of tradeoffs that are difficult for practitioners to navigate. Hybrids that combine components from multiple architecture types promise the best of all worlds (superior performance and efficiency), but so far they are poorly understood and hard to design.
In this talk, I will take some initial steps towards addressing these challenges. First, we introduce simple but fundamental tasks where hybrid models are provably better than their constituent models (Transformers and state-space models like Mamba). Second, we present a framework for automating the design of hybrid architectures, building on ideas from traditional neural architecture search. Our approach permits using pretrained models, obviating the need to train new hybrids from scratch. Finally, I will discuss some open questions in the development and use of hybrids.
(This talk is part of the weekly Machine Learning Lunch Meetings (MLLM), held every Tuesday from 12:15 to 1:15 p.m. Professors from Computer Sciences, Statistics, ECE, the iSchool, and other departments will discuss their latest research in machine learning, covering both theory and applications. This is a great opportunity to network with faculty and fellow researchers, learn about cutting-edge research at our university, and foster new collaborations. For the talk schedule, please visit https://sites.google.com/view/wiscmllm/home. To receive future weekly talk announcements, please subscribe to our UW Google Group at https://groups.google.com/u/1/a/g-groups.wisc.edu/g/mllm.)
We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please email jerryzhu@cs.wisc.edu to make a disability-related accommodation request. Reasonable effort will be made to support your request.