In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
Abstraction of Computation and Data Motion in High-performance Computing Systems
As scientific frameworks become more sophisticated, so do their data structures. A data structure typically includes pointers to, and arrays of, other structures in order to preserve the application's state. Complexity increases when one must traverse a chain of pointers to extract the effective address targeted in the source code. We propose pointerchain, a directive that replaces a chain of pointers with its corresponding effective address inside a parallel region of code, thereby reducing the need for excessive data transfer. In some cases, excessive use of pointers in scientific applications prevents them from being parallelized. With the help of pointerchain on top of the OpenACC programming model, we were able to parallelize a molecular dynamics proxy application for a heterogeneous system. The abstraction provided by pointerchain prevents the source code of the proxy application from undergoing significant changes. Based on our analysis, pointerchain leads to a 39% reduction in the amount of generated code and a 38% reduction in the total number of executed instructions.
Moreover, as hardware grows more complicated in order to improve performance and to battle the "memory wall" phenomenon, applications (and in particular their source code) become more complex as well. The Top500 reports that eighty-six systems on its list are heterogeneous systems configured with some form of accelerator or coprocessor. From a software standpoint, managing data locality on such systems is as important as exploiting parallelism in order to achieve the best performance. With the advent of novel memory technologies, such as non-volatile memory (NVM) and 3D-stacked memory, there is an urgent need for innovation in programming models to create an easy-to-use interface that abstracts the complexity of memory hierarchies in order to exploit data locality.
The programming model proposed in this dissertation, Gecko, abstracts the underlying memory hierarchy of current and future platforms in a hierarchical manner. Gecko's directives distribute data and computation among devices of different types in a system, resulting in an adaptive application. Evaluations of Gecko on a single node with four NVIDIA Volta V100 GPUs show a 3.3x speedup when using multiple GPUs.
Date: Thursday, August 29, 2019
Time: 11:00 AM - 12:00 PM
Place: PGH 550
Advisor: Dr. Margaret S. Cheung
Faculty, students, and the general public are invited.