In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend her dissertation
Content and Stylistic Models for Authorship, Stance and Hyperpartisan Detection
This dissertation presents content and stylistic solutions for three opinion-oriented text classification problems. It explores user-generated text to study how individuals write (authorship identification), how they express their opinions (stance detection), and how news is articulated from a left- or right-leaning political perspective (hyperpartisan news detection). In the first problem, this research studies deception detection in online reviews. It compares the distributions of structural features of the text using KL-Divergence to identify the most discriminative elements of an individual's writing style. It then proposes a transductive algorithm that learns from unlabeled test data to expand the training set. Next, it addresses authorship verification for document pairs that differ in topic, genre, or both, presenting a neural network model with parallel recurrent layers and a fusion mechanism that compares the language of the two documents. The model is evaluated on datasets from multiple domains, including the multi-topic, multi-genre PAN datasets, Amazon reviews, and a dataset of machine learning articles. Lastly, a hierarchical version of the network with two layers of attention is designed to detect writing style changes within a text document. This model takes the structural features of each sentence as input to track transitions in writing style. Experimental evaluation confirms our earlier finding that structural elements are effective in representing writing style.
In the second problem, this research identifies the stance of argumentative opinions, a novel application of opinion mining. The proposed data consists of arguments presented in a nonpartisan format. Although accurate information from both sides of contemporary issues is acknowledged as an 'antidote to confirmation bias' that helps society improve critical thinking and open-mindedness, such information is relatively rare and hard to find online. Given the well-researched, unbiased arguments on controversial issues shared by Procon.org, detecting the stance of arguments is a crucial step toward automatically organizing such resources. To address this, the research employs a universal pretrained language model with a weight-dropped LSTM network that leverages the context of an argument to determine its stance. The analysis demonstrates the strength of pretraining and the model's ability to find the stance of long arguments across entire documents using pooling operations.
Finally, this dissertation proposes an approach to examine whether latent personality features in individuals' writing are useful for the three opinion-oriented classification tasks. The approach deploys a state-of-the-art deep bidirectional transformer to extract Myers-Briggs personality types from users' social media posts. It then incorporates personality information derived from this transformer-based model to evaluate the effectiveness of such information in authorship verification, stance detection of arguments, and hyperpartisan news detection.
Date: Friday, April 10, 2020
Time: 1:30 - 3:00 PM
Place: Online Presentation - MS Teams Meeting
Advisor: Dr. Arjun Mukherjee
Faculty, students, and the general public are invited.