Machine Learning Lunch Meeting
Stairway to Specialization: The Path of Scalable Experts
Event Details
Everyone is invited to the weekly Machine Learning Lunch Meetings, held on Fridays from 12:30 to 1:30 pm. Faculty members from Computer Sciences, Statistics, ECE, and other departments will discuss their latest research in machine learning. This is an opportunity to network with faculty and fellow researchers, and to learn about the cutting-edge research being conducted at our university. Please see https://sites.google.com/view/wiscmllm/home for more information.
Speaker: Grigoris Chrysos
Abstract: Large models, such as GPT-4 and multimodal vision-language models, promise to tackle diverse tasks without task-specific training. These models increasingly use the Mixture of Experts (MoE) paradigm, which facilitates specialization and makes models easier to debug and steer. However, scaling the number of experts to achieve fine-grained specialization presents a significant computational challenge.
This talk will first highlight the real-world implications of this challenge, demonstrating how many established architectures resort to a limited number of experts and often fail to achieve true specialization. I will then introduce the μMoE layer, which employs tensor algebra to perform implicit computations on large weight tensors in a factorized form. This enables using thousands of experts at once without increasing the computational cost relative to a single MLP layer. I will showcase how the μMoE layer enhances specialization in both image and text applications, including GPT-2 models. This approach allows for on-demand model tailoring by selectively deactivating experts or posing counterfactual questions.
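For readers unfamiliar with the idea of computing on a weight tensor in factorized form, the following is a minimal NumPy sketch, not the speaker's implementation. It assumes a CP (rank-R) factorization of an experts × input × output weight tensor and a softmax gate; the abstract specifies neither, and all names (`cp_moe_forward`, `dense_moe_forward`, `G`, `A`, `B`, `C`) are hypothetical. It shows that the factorized forward pass gives the same output as the dense one without ever building the full tensor.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cp_moe_forward(x, G, A, B, C):
    # Assumed gate: softmax over N expert scores.
    a = softmax(G.T @ x)                      # shape (N,)
    # Implicit computation: contract the input and the expert
    # coefficients with the factors, never forming the N x I x O tensor.
    return C @ ((A.T @ a) * (B.T @ x))        # shape (O,)

def dense_moe_forward(x, G, A, B, C):
    # Reference only: materialize the full weight tensor explicitly.
    W = np.einsum("nr,ir,or->nio", A, B, C)   # shape (N, I, O)
    a = softmax(G.T @ x)
    return np.einsum("n,i,nio->o", a, x, W)

rng = np.random.default_rng(0)
N, I, O, R = 64, 16, 8, 4                     # experts, input, output, rank
G = rng.normal(size=(I, N))                   # gating weights
A = rng.normal(size=(N, R))                   # expert-mode factor
B = rng.normal(size=(I, R))                   # input-mode factor
C = rng.normal(size=(O, R))                   # output-mode factor
x = rng.normal(size=I)

y_fact = cp_moe_forward(x, G, A, B, C)
y_dense = dense_moe_forward(x, G, A, B, C)
print(np.allclose(y_fact, y_dense))
```

Under these assumptions the factorized pass costs on the order of R(N + I + O) multiplications after gating, compared with N·I·O for the dense tensor, so the number of experts N can grow without a proportional growth in the weight tensor. In this sketch, deactivating an expert would amount to setting its entry in the coefficient vector `a` to zero before the contraction.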