Uncovering Lit Silicon: Characterization and Insights into LLM Training on AMD GPUs
Event Details
Abstract: Demand for compute is growing exponentially to train bigger and better frontier models. In the face of this massive data center scale-out, efficiency matters more than ever: small gains can translate to hundreds of millions of dollars saved. Characterization is essential for uncovering inefficiencies, but it can be challenging in the rapidly evolving landscape of AI.
In this talk, we characterize LLM pre-training with our tool, Chopper, using single-node Fully Sharded Data Parallelism (FSDP) as a case study. Our characterization uncovers a previously unexplored phenomenon we coin Lit Silicon, in which slower, straggler GPUs cause faster, leader GPUs to burn compute by extending their communication time. This phenomenon goes beyond LLM training and is important to consider for any HPC workload.
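To make the effect concrete, below is a minimal sketch, assuming a PyTorch distributed setup launched with torchrun. It is not the Chopper tool from the talk; the timing helper, the artificial 50 ms delay on rank 0, and all names are illustrative. Each rank times how long it sits inside a gradient-style all-reduce: because the collective cannot complete until every rank arrives, the fast leader ranks report large waits, which is precisely the compute that Lit Silicon burns.

```python
# Minimal sketch of straggler-induced waiting (illustrative only; this is
# NOT the Chopper tool from the talk). Each rank times a gradient-style
# all-reduce; because the collective cannot finish until every rank joins,
# fast "leader" ranks measure the compute they burn waiting on a straggler.
import time
import torch
import torch.distributed as dist

def timed_all_reduce(tensor: torch.Tensor) -> float:
    """Return the seconds this rank spent inside the all-reduce."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    dist.all_reduce(tensor)
    torch.cuda.synchronize()
    return time.perf_counter() - start

def main() -> None:
    # On ROCm builds of PyTorch the "nccl" backend name maps to RCCL.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())
    grad = torch.randn(1 << 20, device="cuda")  # stand-in for a gradient bucket

    # Simulate a straggler: rank 0 spends extra time in "compute"
    # (a hypothetical 50 ms slowdown, e.g. thermal throttling).
    if rank == 0:
        time.sleep(0.05)

    wait = timed_all_reduce(grad)
    print(f"rank {rank}: {wait * 1e3:.1f} ms in all-reduce")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 straggler_sketch.py`, every rank except the artificial straggler should report roughly 50 ms of extra time inside the collective.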
Bio: Marco is a PhD student at the University of Central Florida, working under Dr. Di Wu in the Unary Lab. His research focuses on the efficiency and scaling of GPU systems for LLM training and inference. He is currently an intern with AMD RAD, studying memory contention under concurrent kernel execution, with the goal of exposing QoS mechanisms that enable better overlap of communication, compute, and multi-tenant workloads.
We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please call 763-267-1320 (text only) or email bqtran2@wisc.edu to make a disability-related accommodation request. Reasonable effort will be made to support your request.