AI Safety and Theoretical Computer Science, Scott Aaronson (University of Texas at Austin)
Event Details
Abstract:
Progress on AI safety and alignment, like the current AI revolution more generally, has been almost entirely empirical. In this talk, however, I'll survey a few areas where I think theoretical computer science can contribute to AI safety, including:
- How can we robustly watermark the outputs of Large Language Models and other generative AI systems, to help identify academic cheating, deepfakes, and AI-enabled fraud? I'll explain my proposal and its basic mathematical properties, as well as what remains to be done.
- Can one insert undetectable cryptographic backdoors into neural nets, for good or ill? In what senses can those backdoors also be unremovable? How robust are they against fine-tuning?
- Should we expect neural nets to be "generically" interpretable? I'll discuss a beautiful formalization of that question due to Paul Christiano, along with some initial progress on it, and an unexpected connection to quantum computing.
We value inclusion and access for all participants and are pleased to provide reasonable accommodations for this event. Please email dieter@cs.wisc.edu to make a disability-related accommodation request. Reasonable effort will be made to support your request.