In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation proposal
Abstraction of Computation and Data Motion in HPC Systems
As scientific frameworks become more sophisticated, so do their data structures. A data structure typically contains pointers to, and arrays of, other structures in order to preserve the application’s state. Complexity increases because the code must traverse a chain of pointers to reach the effective address it actually targets. We propose to reduce the need for excessive data transfer by introducing `pointerchain`, a directive that replaces a chain of pointers with its corresponding effective address inside the parallel region of a code. In some cases, excessive use of pointers prevents scientific applications from being parallelized. With `pointerchain` on top of the OpenACC programming model, we were able to parallelize a Molecular Dynamics proxy application for a heterogeneous system. The abstraction provided by `pointerchain` keeps the source code of the proxy application from undergoing significant changes. Based on our analysis, `pointerchain` leads to reductions of 39% in the amount of generated code and 38% in the total number of executed instructions.
Moreover, as hardware grows more complicated in order to improve performance and to battle the “memory wall” phenomenon, applications (and in particular their source code) become more complex as well. Top500 reports that eighty-six systems in the list are heterogeneous systems configured with some form of accelerator or coprocessor. From a software standpoint, managing data locality on such systems is as important as exploiting parallelism in order to achieve the best performance. With the advent of novel memory technologies, such as non-volatile memory (NVM) and 3D-stacked memory, there is an urgent need for innovation in programming models to provide an easy-to-use interface that abstracts the complexity of memory hierarchies and makes it possible to exploit data locality.
We propose Gecko, a novel programming model that abstracts the underlying memory hierarchy of current and future platforms in a hierarchical manner. Gecko’s directives distribute data and computation among devices of different types in a system, resulting in an adaptive application. Our evaluations show a 3.3 times speedup when using multiple GPUs on a single node equipped with four NVIDIA Volta V100 GPUs.
Date: Wednesday, October 31, 2018
Time: 11:00 AM
Place: PGH 362
Advisors: Dr. Margaret S. Cheung
Faculty, students, and the general public are invited.