Computer Science Focus on Research - University of Houston

Computer Science Focus on Research

When: Monday, November 5, 2018
Where: PGH 563
Time: 11:00 AM

Unsupervised Deep Learning Recurrent Model for Audio Fingerprinting

Speaker: Abraham Báez Suárez

Audio fingerprinting techniques were developed to index and retrieve audio samples by comparing a compact, content-based signature of the audio instead of the entire audio sample, thereby reducing memory and computational expense. Different techniques have been applied to create audio fingerprints; however, with the introduction of deep learning, new data-driven unsupervised approaches are available. A Sequence-to-Sequence Autoencoder Model for Audio Fingerprinting (SAMAF) improved hash generation through a novel loss function composed of three terms: Mean Square Error (MSE), which minimizes the reconstruction error; Hash Loss, which minimizes the distance between similar hashes and encourages clustering; and Bitwise Entropy Loss, which minimizes the variation inside the clusters. The performance of the model was assessed with two types of audio signals, speech and music, using three datasets: the English Language Speech Database for Speaker Recognition (ELSDSR), a set of 9-1-1 emergency calls, and a Music dataset. Furthermore, the model was compared against two baselines: Random, the SAMAF architecture initialized with random weights; and Dejavu, a Shazam-like algorithm. Extensive empirical evidence showed that our approach outperformed the Shazam-like algorithm in the audio identification and Music Information Retrieval tasks with an economical hash size of either 128 or 256 bits per second of audio. Additionally, the developed technology was deployed in two 9-1-1 Emergency Operation Centers (EOCs), located in Palm Beach County (PBC) and Greater Harris County (GH), allowing the system to be evaluated in real time and in an industrial environment.
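As a rough illustration of what "comparing a compact signature instead of the entire audio sample" can look like, the sketch below looks up a query fingerprint in a small index. It is not the SAMAF method. The use of Hamming distance, the toy bit patterns, and the names `hamming_distance`, `identify`, and `clip_a`/`clip_b`/`clip_c` are all assumptions made for illustration; only the 128-bit hash size comes from the abstract.

```python
# Toy fingerprint lookup: each fingerprint is a 128-bit integer.
# Assumption: fingerprints are compared by Hamming distance (the abstract
# does not state which distance is used).

BITS = 128  # one of the two per-second hash sizes mentioned in the abstract


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def identify(query: int, index: dict) -> tuple:
    """Return (name, distance) of the indexed fingerprint closest to the query."""
    best = min(index, key=lambda name: hamming_distance(query, index[name]))
    return best, hamming_distance(query, index[best])


# Hypothetical index: fixed bit patterns stand in for learned hashes.
index = {
    "clip_a": 0,                           # all bits 0
    "clip_b": (1 << BITS) - 1,             # all bits 1
    "clip_c": int("01" * (BITS // 2), 2),  # alternating bits
}

# Simulate a distorted copy of clip_b by flipping its 10 lowest bits.
query = index["clip_b"] ^ 0b1111111111

match, distance = identify(query, index)
print(match, distance)
```

Each comparison touches 128 bits rather than a full second of raw samples, which is where the memory and compute savings described above come from. A real system would index many fingerprints per recording and would need a faster search than this linear scan.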


Abraham Báez Suárez earned a Bachelor's degree in Mechatronics Engineering from the Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM), Monterrey Campus, in May 2009. During his bachelor's studies, he took part in an exchange program with Fachhochschule Esslingen, working on a multidisciplinary project in which he was in charge of designing a scale wind turbine and collecting data for the best cost-efficiency curve related to the material. After his bachelor's degree, he was accepted into the graduate program in Intelligent Systems and was awarded a Master's degree in May 2011. His thesis was on the application of Genetic Algorithms for damage localization in composite structures. After the graduate program, he worked for Whirlpool, a leading multinational home appliance company, designing mechanical parts for the ice and water system and the wiring harnesses of refrigerator units. He started working at ITESM during the summer of 2013 in a research assistant position, and a few months later he joined the doctoral program in Computer Science. During his doctorate, he had the opportunity to visit the University of Houston as a Research Scholar. Since arriving in August 2014, he has been conducting research in the areas of Computer Vision and Speech Processing; he will graduate from his doctoral program in December 2018. His research interests include Artificial Intelligence, Machine Learning, and Deep Learning with applications to security and biometrics.

Training Deep Semantic Models: an Adversarial Transfer Learning Approach

Speaker: Dainis Boumber

The NLP research community has been facing a major challenge as of late; however, it was not often discussed in public until very recently, perhaps due to its somewhat embarrassing nature. The problem is that, while being very much intertwined with Machine Learning, NLP research has not seen as much progress in terms of Deep Learning as ML, Computer Vision, AI, Robotics, or other fields. In fact, until very recently, what was referred to as "deep models" really meant Neural Networks of various kinds; however, a CNN with 2 layers is not deep just by virtue of being a CNN. Throughout the past year, we worked on a number of such problems and developed a straightforward approach that utilizes transfer learning, generative adversarial models, and adversarial discriminators, as well as a number of other tricks, to make deep learning viable even when data is high-dimensional, unstructured, and lacking in volume. Simultaneously with our work, a number of publications on deep pre-trained language models appeared independently of one another; these also make deep learning in NLP a much less daunting task, albeit through different means. In this presentation, we introduce a method that relies on a language model to construct a classifier for an authorship verification problem where the number of data samples is equal to the number of classes, the samples belong to different domains, and the samples are not structured in any way we can rely on. We demonstrate a robust solution to these problem types through the use of language models coupled with transfer learning, domain adaptation by mapping samples into a common decision space while pushing distinct classes further apart, and generation of synthetic training samples via a generative adversary, and we provide a bag full of regularization tricks that worked for us.
Finally, we briefly discuss experimental results used to validate the premises of our approach and compare them with what we know to be the current state of the art.


I am a data scientist working for a UK-based startup called SnapRapid. My primary interests are Machine Learning and Natural Language Processing with a focus on Deep Learning and Domain Adaptation. I have 5 years of software engineering experience, 1 year of experience managing a team of developers and researchers, 4 more years in data science, and 1 year of experience as a CTO. I am graduating with a PhD in CS (Machine Learning) in Fall 2018; my thesis is "Multi-Domain Adaptation and Generalization using Deep Adversarial Models". I am an active researcher, with papers accepted to and under review in top-5 conferences in NLP.

Folksonomication: Predicting Tags for Movies from Plot Synopses Using Emotion Flow Encoded Neural Network

Speaker: Sudipta Kar

The folksonomy of movies covers a wide range of heterogeneous information about movies, such as genre, plot structure, visual experiences, soundtracks, metadata, and the emotional experience of watching a movie. Being able to automatically generate or predict tags for movies can help recommendation engines improve retrieval of similar movies and help viewers know what to expect from a movie in advance. In this work, we explore the problem of creating tags for movies from plot synopses. We propose a novel neural network model that merges information from synopses and emotion flows throughout the plots to predict a set of tags for movies. We compared our system with multiple baselines and found that the addition of emotion flows boosts the performance of the network by learning ≈18% more tags than a traditional machine learning system.


Sudipta Kar is a fourth-year PhD student in Computer Science at the University of Houston. He is advised by Dr. Thamar Solorio and works in the RiTUAL lab. His main research interests lie in deep learning for natural language processing problems such as stylistic analysis of texts and emotion flow modeling in narratives. He received his Bachelor’s degree in Computer Science and Engineering from Shahjalal University of Science and Technology (SUST), Bangladesh.