In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
Density-contour Based Framework for Spatio-temporal Clustering and Event Tracking in Twitter
Due to advances in remote sensors and sensor networks, different types of spatio-temporal datasets have become increasingly available. Revealing interesting spatio-temporal patterns in such datasets is important because of its broad applications, such as understanding climate change, detecting epidemics, and analyzing human opinions and preferences. The main focus of this research is the development of spatial and spatio-temporal clustering frameworks. In this dissertation, we introduce a density-contour based framework for spatio-temporal clustering, including several novel serial, density-contour based spatio-temporal clustering algorithms: ST-DCONTOUR, ST-DPOLY, and ST-COPOT. They all rely on a three-phase clustering approach: first, the point cloud stream is taken as input and divided into batches based on fixed-size time windows; next, a density estimation approach and contouring algorithms are employed to obtain spatial clusters as polygon models; finally, spatio-temporal clusters are formed by identifying continuing relationships between spatial clusters in consecutive batches. The framework was successfully applied to NYC taxi trip data; the experimental results show that all the algorithms can effectively discover interesting spatio-temporal patterns in taxi pickup location streams. Recently, Twitter, as one of the fastest-growing microblogging services, has attracted a lot of research; one hot topic is event detection from tweets. As geotagged tweets can be viewed as location streams with time tags and the content of the tweets themselves, we propose a novel two-stage system to detect and track events from Twitter streams by integrating an LDA-based approach with the density-contour based spatio-temporal clustering approach introduced earlier.
In the proposed framework, events are identified as topics in tweets using an LDA-based topic discovery step; next, each tweet is assigned an event label; finally, after all locations have been extracted for each event, the previously mentioned spatio-temporal approach is employed to obtain event clusters and to track the temporal evolution of the identified spatial events, in particular their continuity. Through several case studies, we demonstrate the effectiveness of the proposed system. In summary, this work aims to capture not only the semantic aspect of events, but also their geographic distribution and their continuity over time; this information can help individuals, corporations, and government organizations stay informed of “what is happening now” and acquire actionable knowledge.
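The three-phase approach described in the abstract can be illustrated with a minimal sketch. This is not ST-DCONTOUR, ST-DPOLY, or ST-COPOT: the function names, the unit-square grid, the unnormalized Gaussian kernel, the use of connected dense grid cells in place of contour polygons, and the Jaccard overlap test for continuity are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import defaultdict

def split_into_batches(points, window):
    """Phase 1: group (x, y, t) points into fixed-size time windows.
    Empty windows are skipped in this sketch."""
    batches = defaultdict(list)
    for x, y, t in points:
        batches[int(t // window)].append((x, y))
    return [batches[k] for k in sorted(batches)]

def density_grid(batch, size, bandwidth):
    """Phase 2a: unnormalized Gaussian kernel density at the cell centers
    of a size x size grid over the unit square (an assumed domain)."""
    grid = {}
    for i in range(size):
        for j in range(size):
            cx, cy = (i + 0.5) / size, (j + 0.5) / size
            grid[(i, j)] = sum(
                math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2 * bandwidth ** 2))
                for x, y in batch
            )
    return grid

def spatial_clusters(grid, threshold):
    """Phase 2b: connected regions of cells above a density threshold.
    These cell sets stand in for the polygon models obtained by contouring."""
    dense = {cell for cell, d in grid.items() if d > threshold}
    clusters = []
    while dense:
        stack = [dense.pop()]
        region = set(stack)
        while stack:
            i, j = stack.pop()
            for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if n in dense:
                    dense.remove(n)
                    region.add(n)
                    stack.append(n)
        clusters.append(frozenset(region))
    return clusters

def link_clusters(prev, curr, min_overlap):
    """Phase 3: link clusters in consecutive batches whose Jaccard overlap
    reaches min_overlap (an assumed continuity criterion)."""
    links = []
    for a_idx, a in enumerate(prev):
        for b_idx, b in enumerate(curr):
            if len(a & b) / len(a | b) >= min_overlap:
                links.append((a_idx, b_idx))
    return links

# Made-up toy stream of (x, y, t) points: a hotspot that persists across two windows.
points = [(0.25, 0.25, 0.0), (0.26, 0.24, 1.0), (0.24, 0.26, 2.0),
          (0.25, 0.26, 10.0), (0.24, 0.25, 11.0), (0.85, 0.85, 12.0)]
batches = split_into_batches(points, window=10.0)
clusters = [
    spatial_clusters(density_grid(b, size=10, bandwidth=0.05), threshold=1.5)
    for b in batches
]
links = link_clusters(clusters[0], clusters[1], min_overlap=0.5)
```

In this toy run the hotspot near (0.25, 0.25) is dense in both windows and is linked across them, while the isolated point near (0.85, 0.85) stays below the threshold and forms no cluster. A real implementation would extract contour polygons from the density surface and compare polygons rather than grid cells.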
Date: Monday, November 19, 2018
Time: 11:00 AM
Place: PGH 218D
Advisors: Dr. Christoph F. Eick
Faculty, students, and the general public are invited.