
ML+X Forum: Trustworthy LLMs and Ethical AI

Machine Learning Community Event

Event Details

Tuesday, May 14, 2024
12-1 p.m.

Join the ML+X community on Tuesday, May 14th, from 12-1 pm, for our last forum this spring! At this session, we will hear about Rheeya Uppaal’s method for denoising toxic embeddings in large language models (LLMs) and Mariah A. Knowles’s investigation of narratives surrounding ethical AI practices. Register by May 12th to guarantee your lunch ticket (lunch provided) and join the discussion!

  1. DeTox: Reducing Model Toxicity through Weight Editing, Rheeya Uppaal
    Recent leaps in the world of large language models (LLMs) have led to increased capabilities, as well as an increased need to ensure these models are safe and trustworthy. Current approaches involve training a model on preference data, teaching the model to align itself to the style it is shown. However, this process can be expensive in both data and computation. Model editing is a recent class of approaches that aim to alter the weights or hidden representations of a model without any additional training. In this spirit, we present DeTox (Denoise Toxic Embeddings), an editing approach that reduces the level of toxicity in the text generated by the model. DeTox achieves this at a fraction of the cost of traditional fine-tuning. In this talk, we will walk through a bird’s-eye view of the key intuitions behind model editing, motivating our approach. We will then discuss our approach, covering the benefits and challenges associated with DeTox and model editing in general.
  2. A Project on AI Ethics, Mariah A. Knowles
    We are in a period of AI social change, and if recommendations for AI practitioners are going to have an effect, they must appeal to practitioners' real internal motivations and experiences. With that in mind, I study social change, moral aspirations, and narratives in AI Ethics. My research aims to understand the experiences of social change and of becoming “good” as an AI practitioner, as well as the moral aspirations at play in these experiences. I collect data through semi-structured interviews and analyze it through thematic analysis, with the narrative turns of my account of the data informed by interpretivist mixed methods. In this talk I present results from the first phase of the study, including the emerging qualitative insights after reading through the data and the quantitative methods used to account for heterogeneity between the media types in which my analytic memos were recorded.
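The general idea behind the model-editing family of methods described in the first abstract can be illustrated with a toy sketch. The function name, the use of a single "toxic" direction, and the NumPy toy matrices below are our own illustrative assumptions, not DeTox's actual procedure (which the talk and paper describe in full); the sketch only shows how a weight matrix can be edited, without any training, so it no longer responds to inputs along an undesired direction:

```python
import numpy as np

def edit_out_direction(W, v):
    """Edit weight matrix W (acting on inputs as W @ x) so that it ignores
    the component of its input along direction v, with no training involved.
    """
    v = v / np.linalg.norm(v)          # normalize the direction
    P = np.eye(len(v)) - np.outer(v, v)  # projector onto v's orthogonal complement
    return W @ P                        # edited weights: inputs along v map to 0

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # toy weight matrix (4 outputs, 8-dim hidden inputs)
v = rng.normal(size=8)       # stand-in for an estimated "toxic" direction

W_edited = edit_out_direction(W, v)
print(np.allclose(W_edited @ v, 0))  # True: edited weights are blind to v
```

Because the edit is a single matrix multiplication rather than gradient-based training, this style of intervention costs far less than fine-tuning, which is the trade-off the abstract highlights.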

Finding the Orchard View room: The Orchard View room is located on the 3rd floor of the Discovery Building — room 3280. To get to the third floor, take the elevator located next to the Aldo’s Cafe kitchen. If you cannot attend in person, we invite you to stream the event via Zoom (passcode: 111195).

Join the ML+X Google group: The ML+X community has a Google group it uses to send reminders about upcoming events. If you aren't already a member of the Google group, you can use this link to join. Note that you have to be signed into a Google account to join the group. If you have any trouble joining, please email

Have an ML project or project proposal you want to share with the community? Presenting at ML+X is an excellent opportunity to get feedback on your methods and connect with other ML practitioners across Madison. If interested, please fill out this Google form indicating which date(s) work for your schedule. If you have any questions about presenting, please email Chris Endemann (