MadSystems Seminar -- Roger Waleffe (NVIDIA)

Event Details

Date
Tuesday, April 14, 2026
Time
4-5 p.m.
Location
Description

Title: NVIDIA Nemotron 3: Model Architecture Design and Pre-Training at Scale

Abstract:
In this talk, I will describe the design and pre-training of Nemotron 3, NVIDIA’s latest flagship large language models. I will first discuss the Nemotron model architecture (hybrid Mamba-Attention, LatentMoE, Multi-Token Prediction) as well as broader trends in modern LLMs. In particular, I will highlight that architectures are becoming more sparse and heterogeneous. I will then focus on pre-training these architectures at scale and discuss the Megatron-LM software stack and its core parallelism techniques for large-scale training (tensor, pipeline, and expert parallelism). Finally, I will highlight how these architecture trends are stressing the assumptions underlying today’s training infrastructure, opening interesting avenues for future model-system co-design.

Bio:
Roger Waleffe is a senior applied deep learning research scientist at NVIDIA. He works on pre-training of NVIDIA’s Nemotron models, with a specific focus on studying and developing efficient large language model architectures for training and inference. He holds a Ph.D. in Computer Science from the University of Wisconsin-Madison, where he worked with Theo Rekatsinas, Shivaram Venkataraman, and Steve Wright. His Ph.D. research focused on the intersection of systems and algorithmic challenges for resource-efficient training of large-scale ML models.

Cost
Free
Accessibility

We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please call 608-867-6867 or email tomy.1516@gmail.com to make a disability-related accommodation request. Reasonable effort will be made to support your request.
