Colloquium: Towards Embodied Visual Intelligence
Abstract: What would it mean for a machine to see the world? Computer vision has recently made great progress on problems such as recognizing object and scene categories and estimating human poses in images. However, studying such tasks in isolated, disembodied contexts, divorced from the physical source of their images, is insufficient for building intelligent visual agents. My research focuses on remarrying vision to action by asking: how might vision benefit from the ability to act in the world, and vice versa? Could embodied visual agents teach themselves through interaction and experimentation? Are there actions they might perform to improve their own visual perception? Could they exploit vision to perform complex control tasks? In my talk, I will set up the context for these questions and cover several strands of my work addressing them, proposing approaches for self-supervised learning through proprioception, visual prediction for decomposing complex control tasks, and active perception. Finally, I will discuss my long-term vision and the directions I hope to pursue in the next several years.