
Machine Learning Lunch Meeting

Poisoning (Discriminative) Language Models with Character-Level Attacks

Event Details

Date
Tuesday, February 18, 2025
Time
1-2 p.m.
Location
Description

General: MLLM is a cross-disciplinary weekly seminar, open to all, in which UW-Madison professors present their machine learning research, spanning both theory and applications. The goal is to showcase the work being done at UW-Madison and to stimulate collaborations. Please see our website for more information.

Speaker: Grigoris Chrysos (ECE)

Abstract: Adversarial attacks in natural language processing have historically drawn parallels to those in computer vision, with token-level perturbations dominating methodological paradigms because of their compatibility with gradient-based optimization. However, while effective, token-level attacks often introduce semantically meaningful alterations (such as synonym substitutions or paraphrasing) that may distort the original input's intent. In contrast, character-level perturbations, which modify individual characters (e.g., typos), offer a minimally invasive alternative that preserves semantic fidelity. In this work, we study Charmer, a new character-level adversarial attack framework that achieves high attack success rates (ASR) with only a few character modifications per input. Charmer performs well on both discriminative and generative language models and remains effective even against models fortified with adversarial training. In addition, we investigate the interplay between character-level attacks and automated typographical correctors and identify an interesting trade-off between the two.
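
To make the notions of "a few character modifications" and "attack success rate" concrete, here is a minimal sketch of a generic character-level attack. It is NOT the Charmer algorithm from the talk: it is a simple brute-force greedy search, assuming the perturbation budget is measured as Levenshtein (edit) distance and assuming access to a hypothetical `score` callable that returns the model's confidence in the true label. All function names here are illustrative.

```python
import string


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # delete ca
                            curr[j - 1] + 1,              # insert cb
                            prev[j - 1] + (ca != cb)))    # substitute (free if equal)
        prev = curr
    return prev[-1]


def single_char_edits(text: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings exactly one insertion, deletion or substitution away from `text`."""
    out = set()
    for i in range(len(text) + 1):
        for c in alphabet:
            out.add(text[:i] + c + text[i:])              # insert c at position i
    for i in range(len(text)):
        out.add(text[:i] + text[i + 1:])                  # delete character i
        for c in alphabet:
            if c != text[i]:
                out.add(text[:i] + c + text[i + 1:])      # substitute character i
    return out


def greedy_char_attack(text, score, budget=2, alphabet=string.ascii_lowercase):
    """Hypothetical greedy attack: apply up to `budget` single-character edits,
    each time keeping the neighbor that most lowers `score` (the model's
    assumed confidence in the correct label). Result is within `budget` edits."""
    current = text
    for _ in range(budget):
        best = min(single_char_edits(current, alphabet), key=score)
        if score(best) >= score(current):
            break                                         # no single edit helps; stop early
        current = best
    return current


def attack_success_rate(flipped: list[bool]) -> float:
    """ASR as the fraction of attacked inputs whose prediction was flipped."""
    return sum(flipped) / len(flipped)
```

For example, with a toy "classifier" that is confident only when the word "good" appears, `greedy_char_attack("a good movie", lambda s: 1.0 if "good" in s else 0.0)` returns a string such as a one-typo variant of the input, within two edits of the original. Exhaustively scoring every neighbor is the simplest possible design and scales poorly with input length, which is one reason dedicated attack frameworks use smarter position and character selection.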

Cost
Free
