Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Owais Ahmed

Will defend his thesis


Dataset Modification To Improve Machine Learning Algorithm Performance And Speed

Abstract

We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve classification performance and speed. The Class Reconstruction algorithm combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We demonstrate that this algorithm improves the accuracy of Naive Bayes on some, but not all, real-world datasets. The Class Size Reduction approach also uses a clustering algorithm, then collects the convex hulls of all clusters to create a new, smaller dataset. This dataset allows a Support Vector Machine to be trained much faster. We demonstrate the improvement in classification speed using this algorithm on several real-world datasets. The improvement in this case is more significant and consistent, with only a few cases where the accuracy dropped. Both approaches are especially applicable to datasets characterized by a high number of clusters.
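
The Class Size Reduction idea described above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes 2-D points, a plain k-means with a deterministic start, clustering done separately within each class, and Andrew's monotone chain for the hulls. The names reduce_dataset, kmeans, and convex_hull are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain (2-D only); collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def kmeans(points, k, iters=20):
    """Lloyd's k-means; initial centers are evenly spaced sorted points (an assumption)."""
    pts = sorted(points)
    step = max(1, len(pts) // k)
    centers = [pts[min(i * step, len(pts) - 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute each center as its cluster mean; keep the old center if empty.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]

def reduce_dataset(X, y, k=2):
    """Keep only the convex hull vertices of each cluster found within each class."""
    by_class = defaultdict(list)
    for p, label in zip(X, y):
        by_class[label].append(tuple(p))
    X_small, y_small = [], []
    for label, pts in by_class.items():
        for cluster in kmeans(pts, k):
            for vertex in convex_hull(cluster):
                X_small.append(vertex)
                y_small.append(label)
    return X_small, y_small
```

The reduced pair (X_small, y_small) would then be passed to whatever Support Vector Machine trainer is in use, in place of the full training set. Interior points of each cluster are discarded, which is where the reduction in training-set size comes from in this sketch.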

 

Date: Wednesday, April 23, 2014
Time: 9:00 AM
Place: PGH 362

Faculty, students, and the general public are invited.
Advisor: Prof. Ricardo Vilalta