Thesis Defense - University of Houston

Thesis Defense

In Partial Fulfillment of the Requirements for the Degree of Master of Science

Manvi Saxena

will defend her thesis

Evaluation of mpi4py for Natural Language Processing Scenarios


Abstract

Many Natural Language Processing (NLP) applications operating on large data sets are written in programming languages that do not have bindings in the Message Passing Interface (MPI) specification. Yet, with increasing problem sizes, these applications also require some form of parallel and distributed processing. The goal of this thesis is to evaluate the use of MPI with Python, a non-traditional HPC programming language, for NLP application scenarios. The thesis is divided into two parts. The first part evaluates the performance and functionality of mpi4py, a Python module providing MPI bindings, by comparing multiple point-to-point benchmarks against native C-based MPI benchmarks over InfiniBand and Gigabit Ethernet network interconnects. The results show that in many instances the communication performance of the Python benchmarks was on par with their C-based counterparts. In the second part of the thesis, a few application scenarios used in NLP, such as word count, n-gram count, and TF-IDF, were developed, and the mpi4py module was used to distribute data across nodes for these scenarios and to evaluate performance. The results demonstrate that applying the mpi4py module in NLP scenarios can greatly improve execution time.


Date: Friday, April 27, 2018
Time: 2:00 PM
Place: PGH 501D
Advisor: Prof. Edgar Gabriel

Faculty, students, and the general public are invited.