Talk: Learning General Language Processing Agents
Dani Yogatama: Staff Research Scientist at DeepMind, PhD Carnegie Mellon University
Event Details
Also offered online
Abstract: The ability to continuously learn and generalize to new problems quickly is a hallmark of general intelligence. Existing machine learning models work well when optimized for a particular benchmark, but they require many in-domain training examples (i.e., input-output pairs that are often costly to annotate), overfit to the idiosyncrasies of the benchmark, and do not generalize to out-of-domain examples. In contrast, humans are able to accumulate task-agnostic knowledge from multiple modalities to facilitate faster learning of new skills.
In this talk, I will argue that obtaining such an ability for a language model requires significant advances in how we acquire, represent, and store knowledge in artificial systems. I will present two approaches in this direction: (i) an information theoretic framework that unifies several representation learning methods used in many domains (e.g., natural language processing, computer vision, audio processing) and allows principled constructions of new training objectives to learn better language representations; and (ii) a language model architecture that separates computation (information processing) in a large neural network and memory storage in a key-value database. I will conclude by briefly discussing a series of future research programs toward building a general linguistically intelligent agent.
Bio: Dani Yogatama is a staff research scientist at DeepMind. His research interests are in machine learning and natural language processing. He received his PhD from Carnegie Mellon University in 2015. He grew up in Indonesia and was a Monbukagakusho scholar in Japan prior to studying at CMU.