Computer Science Grad Student Wins Award at Major International Conference

Shirani Receives “Highly Commended Paper Award” at 2018 ESEM Conference

Amirreza (Reza) Shirani, a University of Houston Ph.D. student in computer science, received a “Highly Commended Paper Award” at the 2018 International Symposium on Empirical Software Engineering and Measurement (ESEM), held October 11–12 in Oulu, Finland.


His research uses a branch of artificial intelligence called natural language processing to sort and classify questions about software development. He conducted the work in collaboration with Amin Alipour, assistant professor of computer science in the College of Natural Sciences and Mathematics. Co-authors include Bowen Xu and David Lo of Singapore Management University.

Community-Based Forums for Software Developers

“Software developers need to work with numerous programming algorithms,” said Shirani, who began his Ph.D. in fall 2016 and whose thesis research is directed by Thamar Solorio, associate professor of computer science. “Knowledge relating to these problems is often dispersed among many books and user manuals.”

This dispersal of information, along with the fast pace of software development, leads many developers to rely on community forums for help. One of the largest of these is Stack Overflow, a website where developers ask and answer questions related to their work.

Users post questions at a rate of hundreds per hour, and many of these posts come from developers with similar questions. At that volume, sorting through questions to find an answer can be difficult.

Classifying Questions Based on Similarity

Using natural language processing, Shirani has developed algorithms that classify questions by how similar they are to one another. In this system, each pair of questions is classified as duplicates, directly linked, indirectly linked, or independent of each other.

“The big challenge is that these are questions with many technical words in them,” Shirani said. “That makes it hard even for people to see these questions are similar.”

Shirani has found that effective classification depends on the meaning of the words within a question rather than on their context.

“People ask questions in different ways,” Shirani said. “Some will give a brief description, while others will give a piece of code.”

To predict how similar two questions are, Shirani uses machine learning, a branch of artificial intelligence that gives computers the ability to learn and improve from data without being explicitly programmed.

The algorithms developed by Shirani and his collaborators offer a faster, more efficient way to find related questions, addressing an everyday problem faced by software developers around the world.

“This is a fast-moving field, with a lot of potential applications,” Shirani said.

- Rachel Fairbank, College of Natural Sciences and Mathematics