Machine Learning Lunch Meeting
Poisoning (Discriminative) Language Models with Character-Level Attacks
Event Details
General: MLLM is a cross-discipline weekly seminar open to all where UW-Madison professors present their research in machine learning, both theory and applications. The goal is to promote the great work at UW-Madison and stimulate collaborations. Please see our website for more information.
Speaker: Grigoris Chrysos (ECE)
Abstract: Adversarial attacks in natural language processing have historically drawn parallels to those in computer vision, with token-level perturbations dominating methodological paradigms due to their compatibility with gradient-based optimization. However, while effective, token-level attacks often introduce semantically meaningful alterations, such as synonym substitutions or paraphrasing, that may distort the original input's intent. In contrast, character-level perturbations, which modify individual characters (e.g., typos), offer a minimally invasive alternative that preserves semantic fidelity. In this work, we study Charmer, a new character-level adversarial attack framework that achieves high attack success rates (ASR) with few character modifications per input. Charmer performs well on both discriminative and generative language models, demonstrating its versatility even against systems fortified with adversarial training. In addition, we investigate the interplay between character-level attacks and automated typographical correctors, and demonstrate an interesting trade-off.
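To make the abstract's core idea concrete, here is a minimal sketch of a character-level attack in the spirit described: greedily substituting single characters to degrade a classifier's score. The `toy_score` function is a hypothetical stand-in classifier, and the greedy loop is an illustrative simplification, not Charmer's actual algorithm.

```python
import string

def toy_score(text: str) -> float:
    # Hypothetical stand-in for a sentiment classifier: fraction of
    # words that match a small "positive" keyword list.
    positive = {"good", "great", "excellent"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

def charlevel_attack(text: str, budget: int = 1) -> str:
    # Greedy character-level search: try every single-character
    # substitution and keep the edit that lowers the score the most,
    # repeating up to `budget` times.
    best = text
    for _ in range(budget):
        base = toy_score(best)
        candidate, candidate_score = best, base
        for i in range(len(best)):
            for c in string.ascii_lowercase:
                if c == best[i]:
                    continue
                perturbed = best[:i] + c + best[i + 1:]
                s = toy_score(perturbed)
                if s < candidate_score:
                    candidate, candidate_score = perturbed, s
        if candidate_score >= base:
            break  # no single edit improves the attack further
        best = candidate
    return best

# A one-character "typo" suffices to drop the toy classifier's score,
# while the sentence stays readable to a human.
adv = charlevel_attack("this movie was great")
```

The sketch illustrates why such attacks are minimally invasive: the perturbed input differs from the original in a single character, yet the (toy) model's output changes substantially.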