MadSystems Seminar
Demystifying DeepSeek V3 and a Taste of R1
Event Details
This week Minghao Yan will give a talk on DeepSeek. Join us if you are interested in the systems aspects of LLMs! Please note that this talk is in CS 1325 (the classroom right next to the main elevator), not the usual meeting room of the MadSystems Seminar.
Abstract: DeepSeek has captured the world's attention with the release of R1, its state-of-the-art open-source reasoning model. In this talk, I will discuss the innovations behind R1 and the unsung hero behind the scenes, the DeepSeek V3 model, from both the model architecture and the system design perspectives. I will present the techniques DeepSeek developed to work within hardware constraints and train a frontier model on less capable hardware.
We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please email chenhaoy@cs.wisc.edu to make a disability-related accommodation request. Reasonable effort will be made to support your request.