In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Mohammad Tanvir Rahman
will defend his dissertation
Performance Models for Parallel Applications Under Failures
Due to the growing size of compute clusters, large-scale parallel applications increasingly have to deal with hardware malfunctions and other failure scenarios during execution. The overall goal of this research is to achieve good performance for parallel applications despite failures. This dissertation introduces two mathematical models to improve the resilience of parallel applications on two different frameworks. The first model minimizes job completion time for inter-dependent parallel processes running in a volunteer environment by finding the optimal checkpoint interval. Validation is performed with a sample real-world application running on a pool of distributed volunteer nodes. The results show that the predicted checkpoint interval gives performance close to that of the optimal checkpoint interval determined empirically after extensive experimentation.
The second part of the dissertation evaluates the performance of Hadoop MapReduce applications with different execution parameters and under different failure scenarios. The dissertation introduces performance models for Hadoop MapReduce applications that account for node and process failures. Having a performance model makes it possible to determine optimal settings for some of the parameters, such as split size. The model is validated by running two MapReduce applications with different parameter settings. The results show that different applications require different settings for the same MapReduce parameters and that the proposed model predicts performance well.
Date: Wednesday, November 29, 2017
Time: 3:00 PM
Place: PGH 501D
Advisors: Dr. Edgar Gabriel, Dr. Jaspal Subhlok
Faculty, students, and the general public are invited.