Calendar - University of Houston

[Defense] Named Entity Recognition on Social Media

Monday, January 31, 2022

3:30 pm - 4:30 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Shuguang Chen
will defend his proposal
Named Entity Recognition on Social Media


As social media platforms (e.g., Twitter, Facebook, and Snapchat) grow in popularity, more and more people create, share, and exchange information and ideas in these virtual spaces. This raises the demand for tools and resources that automate the processing of social media text. However, user-generated text on social media is often very short, which poses new challenges for natural language understanding because such text tends to have limited context and ambiguous content. Moreover, the prediction performance of live systems degrades over time as the statistical properties of social media data change with ever-evolving language. Additionally, social media is a low-resource domain, where annotated data is limited and can quickly become outdated, leading to the problem of temporal drift, and labeling new data is time-consuming and requires domain expertise. In this proposal, we present novel methods to reduce performance degradation and improve the robustness of named entity recognition (NER) systems on social media. We propose to study the use of images for NER in terms of image representations and multimodal fusion, and to explore in which situations incorporating images benefits NER systems. Moreover, to mitigate temporal drift, we propose an intuitive approach to measure the potential trendiness of social media data and use this metric to select the most informative instances for training and updating NER systems. In addition, we study data augmentation to increase the size of training data in low-resource domains by leveraging data from high-resource domains. Specifically, we propose a novel neural architecture that transforms the data representation from high-resource to low-resource domains by learning textual patterns. Finally, motivated by the linguistic challenges of social media, we propose to study the robustness of NER systems under adversarial attacks.
The methods presented in this proposal aim to identify avenues for improving NER on social media text and, as a result, to benefit downstream natural language processing tasks such as information extraction, question answering, and machine reading comprehension.

Monday, January 31, 2022
3:30 PM - 4:30 PM CT

Dr. Thamar Solorio, dissertation advisor

Faculty, students and the general public are invited.