Computer Science Machine Learning Lunch Meetings

Large Multimodal (Vision-Language) Models for Image Generation and Understanding

Date

Thursday, October 5, 2023

Time

12-1 p.m.

Location

1325 Computer Sciences

Description

Speaker: Yong Jae Lee

Abstract: Large Language Models and Large Vision Models, also known as Foundation Models, have led to unprecedented advances in language understanding, visual understanding, and AI. In particular, many computer vision problems, including image classification, object detection, and image generation, have benefited from the capabilities of such models trained on internet-scale text and visual data. In this talk, I'll present our recent work on Large Multimodal (Vision-Language) Models (LMMs) for controllable image generation (GLIGEN) and language-and-vision chatbot assistance (LLaVA). Since training foundation models from scratch can be prohibitively expensive, a key challenge is how to efficiently and effectively adapt and repurpose them for downstream tasks of interest. I'll provide key insights into how we achieve this, explain the models' inner workings, and discuss their limitations and future directions.

Cost

Free

