Machine Learning Lunch Meeting
Show Me the Funny: LLM’s Epic Failure and the Road to Winning in the New Yorker Cartoon Caption Contest
Event Details
Everyone is invited to the weekly Machine Learning Lunch Meetings, held Fridays from 12:30 to 1:30 pm. Faculty members from Computer Sciences, Statistics, ECE, and other departments will discuss their latest groundbreaking research in machine learning. This is an opportunity to network with faculty and fellow researchers, and to learn about the cutting-edge research being conducted at our university. Please see https://sites.google.com/view/wiscmllm/home for the schedule and more information.
Speakers: Robert Nowak (ECE) and Jifan Zhang (CS)
Abstract: In this talk, we explore the humor generation capabilities of large language models (LLMs), specifically focusing on The New Yorker's Cartoon Caption Contest. The presentation is structured in two parts:
- Results from our recent NeurIPS spotlight paper, introducing a novel dataset and benchmark for caption generation. Our findings reveal that state-of-the-art LLMs significantly underperform top human-written captions in this task.
- An analysis of the unique challenges posed by The New Yorker's distinct style of humor, and the research questions we must address to compete successfully in the contest.
We present our multimodal preference dataset, comprising over 250 million human ratings on more than 2.2 million captions, collected from The New Yorker's weekly cartoon caption contest over eight years. We establish new benchmarks for evaluating model-generated captions, employing both GPT-4 and human judgments to develop ranking-based evaluation methodologies. Our research highlights the limitations of current fine-tuning techniques, such as RLHF and DPO, when applied to creative tasks. Moreover, we demonstrate that even cutting-edge models like GPT-4 and Claude currently fall short of top human contestants in generating humorous captions.
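As an illustration of what a ranking-based evaluation over pairwise preference data can look like, the sketch below fits a simple Bradley-Terry model to pairwise win counts and ranks captions by the estimated scores. This is a minimal, hypothetical example: the caption labels and win counts are invented, and the evaluation methodology used in the paper may differ.

```python
import math

def bradley_terry(captions, wins, iters=200, lr=0.1):
    """Estimate a latent quality score per caption from pairwise win counts.

    wins[(i, j)] = number of times caption i was preferred over caption j.
    Returns captions sorted from strongest to weakest.
    """
    scores = {c: 0.0 for c in captions}
    for _ in range(iters):
        grad = {c: 0.0 for c in captions}
        for (i, j), n in wins.items():
            # Probability that i beats j under the current scores.
            p = 1.0 / (1.0 + math.exp(scores[j] - scores[i]))
            # Gradient of the Bradley-Terry log-likelihood.
            grad[i] += n * (1.0 - p)
            grad[j] -= n * (1.0 - p)
        for c in captions:
            scores[c] += lr * grad[c]
    return sorted(captions, key=lambda c: scores[c], reverse=True)

# Invented example data: "A", "B", "C" stand in for candidate captions.
captions = ["A", "B", "C"]
wins = {("A", "B"): 8, ("B", "A"): 2,
        ("A", "C"): 9, ("C", "A"): 1,
        ("B", "C"): 6, ("C", "B"): 4}
print(bradley_terry(captions, wins))  # ['A', 'B', 'C']
```

The same machinery applies whether the pairwise judgments come from human raters or from a judge model such as GPT-4: each comparison is a "win" for one caption, and the fitted scores induce a ranking.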
The latter half of the talk delves into the key elements for success in the caption contest, ranging from the fundamental challenge of understanding humor to the diverse strategies employed in winning captions. We analyze the failure modes of state-of-the-art LLMs through the lens of Kahneman's dual-process theory, categorizing challenges into System 1 and System 2 thinking. To illustrate effective caption generation, we showcase examples and recipes from Larry Wood, a four-time winner of the New Yorker Cartoon Caption Contest. We conclude by presenting open challenges in humorous caption generation, inviting further research and discussion in this field.