Tilting the BobbyTables and Steering the CensorShip

Dr. David Evans, Professor of Computer Science, University of Virginia

Date

Tuesday, November 18, 2025

Time

1 p.m.

Location

Description

Abstract: AI systems, including Large Language Models (LLMs), increasingly influence human writing, thoughts, and actions, yet our ability to measure and control the behavior of these systems is inadequate. In this talk, I will describe some of the risks of using language models and ways to measure biases in LLMs. Then, I will advocate for measurement and control strategies that depend on analyzing and manipulating internal representations, and show how a simple inference-time intervention can mitigate gender bias and control model censorship without degrading overall model utility.

Bio: David Evans (https://www.cs.virginia.edu/evans/) is the Olsen Bicentennial Professor of Engineering and a Professor of Computer Science at the University of Virginia, where he leads research on security and privacy, with a recent focus on understanding and mitigating risks associated with machine learning. He is the author of an open computer science textbook, a book on secure computation, and a children's book on combinatorics and computability. He was Program Co-Chair for the 24th ACM Conference on Computer and Communications Security (CCS 2017) and the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy, where he initiated the Systematization of Knowledge papers. He holds SB, SM, and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.

Cost

Free

Contact

608-206-6063, mschreier@wisc.edu
