Computer Science Seminar - University of Houston


Multimodal Deep Learning

When: Monday, October 30, 2017
Where: PGH 232
Time: 11:00 AM – Noon

Speaker: Dr. Fabio A. Gonzalez, National University of Colombia

Host: Dr. Thamar Solorio

Multimodality refers to the fact that the same real-world concept can be described by different views or data types. Collaborative encyclopedias such as Wikipedia describe a famous person through a mixture of text, images and, in some cases, audio. Social network users comment on events such as concerts or sports games with short phrases and multimedia attachments. A patient's medical record comprises a collection of images, text, sound and other signals. This talk discusses the problem of building machine learning models that can exploit the correlations and complementarities found in multimodal data. Given the popularity and strong results of neural networks with several layers, the talk will focus mainly on multimodal learning with deep neural networks. In addition to giving a general overview of the area, the speaker will discuss a new kind of neural network unit, the Gated Multimodal Unit (GMU). The GMU uses multiplicative gates that assign importance to various features simultaneously, creating a rich multimodal representation that does not require manual tuning; instead, it learns to combine the different modalities directly from the training data.
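
The announcement does not give the GMU equations, so the following is only a minimal sketch of how a multiplicative gate could fuse two modalities. It assumes a two-modality setting (visual and textual feature vectors), tanh activations for each modality, and a sigmoid gate computed from the concatenated inputs. The function and variable names (gmu_forward, W_v, W_t, W_z) are hypothetical, and the weights are random here, whereas in practice they would be learned from training data.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def gmu_forward(x_v, x_t, W_v, W_t, W_z):
    """Fuse two modality vectors with a per-feature multiplicative gate.

    Assumed form (not stated in the announcement):
        h_v = tanh(W_v x_v)
        h_t = tanh(W_t x_t)
        z   = sigmoid(W_z [x_v; x_t])
        h   = z * h_v + (1 - z) * h_t
    """
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]
    # The gate sees both modalities at once (list concatenation).
    z = [sigmoid(a) for a in matvec(W_z, x_v + x_t)]
    # Each hidden feature gets its own mixing weight between modalities.
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x_visual = [0.2, -1.0, 0.7, 0.1]  # hypothetical image features
x_text = [1.5, 0.3, -0.4]         # hypothetical text features
hidden = 5

W_v = random_matrix(hidden, len(x_visual), rng)
W_t = random_matrix(hidden, len(x_text), rng)
W_z = random_matrix(hidden, len(x_visual) + len(x_text), rng)

h, z = gmu_forward(x_visual, x_text, W_v, W_t, W_z)
print("gate:", [round(v, 3) for v in z])
print("fused:", [round(v, 3) for v in h])
```

In this sketch, a gate value near 1 means the corresponding fused feature relies mostly on the visual input, and a value near 0 means it relies mostly on the text. Because the gate is a differentiable function of the inputs, its weights could be trained together with the rest of a network by gradient descent, which is one way to read the abstract's claim that no manual tuning is required.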

Bio:

Fabio A. Gonzalez is a Full Professor in the Department of Computing Systems and Industrial Engineering at the National University of Colombia, where he leads the Machine Learning, Perception and Discovery Lab (MindLab). He earned a Computing Systems Engineer degree and an MSc degree in Mathematics from the National University of Colombia, and MSc and PhD degrees in Computer Science from the University of Memphis. His research revolves around machine learning, information retrieval and computer vision, with a particular focus on the representation, indexing and automatic analysis of multimodal data (data encompassing different types of information: textual, visual, signals, etc.).