
Tilting the BobbyTables and Steering the CensorShip

Dr. David Evans, Professor of Computer Science, University of Virginia

Event Details

Date
Tuesday, November 18, 2025
Time
1 p.m.
Location
Description

Abstract: AI systems, including Large Language Models (LLMs), increasingly influence human writing, thoughts, and actions, yet our ability to measure and control the behavior of these systems is inadequate. In this talk, I will describe some of the risks of using language models and ways to measure biases in LLMs. Then, I will advocate for measurement and control strategies that depend on analysis and manipulation of internal representations, and show how a simple inference-time intervention can be used to mitigate gender bias and control model censorship without degrading overall model utility.

Bio: David Evans (https://www.cs.virginia.edu/evans/) is the Olsen Bicentennial Professor of Engineering and a Professor of Computer Science at the University of Virginia, where he leads research on security and privacy, with a recent focus on understanding and mitigating risks associated with machine learning. He is the author of an open computer science textbook, a book on secure computation, and a children's book on combinatorics and computability. He was Program Co-Chair for the 24th ACM Conference on Computer and Communications Security (CCS 2017) and the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy, where he initiated the Systematization of Knowledge papers. He has SB, SM, and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.


 

Cost
Free
