In Partial Fulfillment of the Requirements for the Degree of Master of Science
will defend her thesis
Solving Connectivity, Diameter and Betweenness Centrality of Big Graphs Using Relational Queries
Within big data analytics, graph problems are as important as machine learning. Many algorithms exist to analyze large graphs, but they are limited by main memory. On the other hand, a lot of data stored in DBMSs needs to be analyzed as graphs. Moreover, DBMSs can work in parallel and are not limited by RAM. In this work, we propose several algorithms that compute metrics and properties of a graph and help us understand its structure, specifically connectedness, triangle counts, diameter and betweenness centrality. We propose optimized SQL queries that work on a graph stored in relational form as triples and compute these metrics in a more flexible and efficient manner. We study how to optimize such SQL queries, which combine demanding joins and aggregations, so that they remove the main memory limitation and can run in parallel. Finally, we provide an experimental evaluation to understand accuracy and performance. We compare our algorithms with popular platforms, including Python and Spark. Our experiments show that our SQL algorithms are accurate and efficient.
Date: Wednesday, April 10, 2019
Time: 10:00 AM
Place: PGH 218D
Advisor: Dr. Carlos Ordonez
Faculty, students, and the general public are invited.