Knowledge Discovery: Making Sense of Big Data

Implementing Algorithms for Data Analytics

In an age where we are buried in information, how do we make sense of it?

Photo: Graduate student in computer science Wellington Cabrera (Ph.D. ’17) worked on implementing algorithms for data mining and analytics.

One of technology’s many advantages is the ease with which we can collect information on everything from internet browsing patterns to activity levels recorded by wearable trackers to weather patterns, all of which allow for more accurate insights. This ease of data collection, however, has its own drawback, as the sheer amount can be overwhelming.

Data Mining: Teasing Out Hidden Patterns

While working on his graduate degree in computer science, Wellington Cabrera (Ph.D. ’17) sought to address this problem by creating and implementing algorithms for data analytics and mining in database systems. This research was performed under the guidance of Carlos Ordonez, associate professor of computer science in the College of Natural Sciences and Mathematics.

In this deluge of information, with all of its tangled implications, data mining works to tease out the hidden patterns.

“Another name for data mining is ‘knowledge discovery,’” Cabrera noted.

Parallel Computing Increases Speed and Storage Capabilities

Cabrera’s research centered on developing algorithms for parallel database systems. Database systems are often distributed across multiple computers, a strategy termed parallel computing. Although this increases a database’s speed and storage capacity, it also requires an adjustment in how tasks are performed.

“Developing an algorithm for a single computer requires a lot of sequential steps,” Cabrera said. “For parallel systems, the challenge is getting these multiple computers to work together to solve problems.”

Scalable Algorithms

Cabrera focused on scalable algorithms, designed to deliver comparable performance regardless of a database’s size.

“You want an algorithm that can work just as well with two computers as it does with 1,000,” Cabrera said. “When you have many computers working together, you tend to see a degradation. If the algorithm does not coordinate the parallel processing correctly, then the computers cannot work together in the right manner, becoming a mess.”
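The intuition behind this kind of data-parallel design can be illustrated with a minimal sketch. It is not Cabrera’s actual research code, and the function names (`partial_stats`, `parallel_mean`) are hypothetical; it simply uses Python’s standard multiprocessing module to show the general pattern: each worker summarizes only its own partition of the data, and coordination is limited to merging a handful of small partial results.

```python
from multiprocessing import Pool


def partial_stats(partition):
    """Summarize one partition locally; no communication with other workers."""
    return sum(partition), len(partition)


def parallel_mean(data, workers=4):
    """Split the data into partitions, compute partial results in parallel,
    then merge the small partials in a single coordination step."""
    chunk = max(1, len(data) // workers)
    partitions = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with Pool(processes=workers) as pool:
        partials = pool.map(partial_stats, partitions)
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    values = list(range(1_000_000))
    print(parallel_mean(values, workers=4))  # 499999.5
```

The point of the sketch is that the per-partition work dominates while the merge step stays tiny, so adding more workers does not add proportionally more coordination, which is the kind of property that keeps a parallel algorithm from degrading as machines are added.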

Overcoming Challenges

During his time as a graduate student, Cabrera faced many of the typical challenges of juggling coursework, research and his responsibilities as a teaching assistant, all while trying to plan for the next step in his career.

“To get a Ph.D., you have to overcome many obstacles,” Cabrera said.

This hard work ultimately paid off, as Cabrera landed several internships, published numerous papers in well-respected journals, and, after graduation, was offered a job in the tech industry.

“I am very stubborn,” Cabrera said. “I don’t like to give up.”

- Rachel Fairbank, College of Natural Sciences and Mathematics

November 30, 2017