Computer Science Seminar

Generating Natural-Language Descriptions of Videos in the Wild

When: Friday, November 6, 2015
Where: PGH 232
Time: 11:00 AM

Speaker: Prof. Raymond Mooney, University of Texas at Austin

Host: Prof. Thamar Solorio

We present novel methods for automatically generating English sentences describing short videos by combining techniques from computer vision and natural-language processing. We first use state-of-the-art visual object and activity recognizers to determine a potential set of entities and actions in the video. We then use statistics mined from large parsed corpora of English to determine the most probable subject-verb-object-scene tuple, which is then used to generate a descriptive English sentence. We also recently applied Long Short-Term Memory (LSTM) deep recurrent neural networks to directly translate videos to English descriptions, obtaining even better results. Experimental evaluation on a corpus of short YouTube videos and movie clips annotated by Descriptive Video Service demonstrates the capabilities of these various techniques by comparing their output to human-generated descriptions.

Bio:

Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana-Champaign. He is an author of over 150 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008 to 2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of the American Association for Artificial Intelligence, the Association for Computing Machinery, and the Association for Computational Linguistics, and the recipient of best paper awards from AAAI-96, KDD-04, ICML-05, and ACL-07.
