In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend her dissertation
Feature Selection in Classification Tasks
Abstract
Feature subset selection is an essential pre-processing task in machine learning and pattern recognition. In supervised learning, the goal of feature selection is to find the smallest subset of features that predicts the target class with high generalization performance (e.g., high accuracy). Feature selection can also help avoid over-fitting, reduce computational cost, and shorten training time. In this study, we introduce a new filter-based feature selection algorithm named EBFS, inspired by ReliefF, a well-known distance-based method. Like ReliefF, EBFS uses the nearest neighbors of each instance to compute a local weight for each feature. Unlike ReliefF, and unlike other filter-based methods that focus on mutual information, conditional mutual information, or interaction information, EBFS looks at the entropy of feature values in local neighborhoods, capturing information not used by previous distance-based methods. We also introduce a new heterogeneous ensemble method based on ReliefF that varies the size of the neighborhood and uses statistical hypothesis tests to determine the number of relevant features. Both algorithms are tested on multiple datasets; results show the effectiveness of our approach compared with other well-known methods.
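For readers unfamiliar with the distance-based family that the abstract builds on, the following is a minimal sketch of classic Relief-style feature weighting. It is not EBFS and not the dissertation's implementation: the function name `relief_weights`, the Manhattan distance, the min-max scaling, and the `n_neighbors` parameter are all assumptions chosen for illustration. It shows only the general idea of scoring each feature from the nearest same-class and different-class neighbors of every instance.

```python
import numpy as np

def relief_weights(X, y, n_neighbors=1):
    """Illustrative Relief-style weights (hypothetical helper, not EBFS).

    A feature gains weight when it differs on nearest neighbors of another
    class (misses) and loses weight when it differs on nearest neighbors of
    the same class (hits). Assumes every class has more than n_neighbors
    instances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Min-max scale each feature so per-feature differences lie in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - X.min(axis=0)) / span

    weights = np.zeros(n_features)
    for i in range(n_samples):
        diffs = np.abs(Xs - Xs[i])   # per-feature differences to instance i
        dist = diffs.sum(axis=1)     # Manhattan distance to instance i
        dist[i] = np.inf             # never pick the instance itself

        same_class = y == y[i]
        # Mask out the wrong group with infinite distance, then take the
        # n_neighbors closest instances from each group.
        hits = np.argsort(np.where(same_class, dist, np.inf))[:n_neighbors]
        misses = np.argsort(np.where(~same_class, dist, np.inf))[:n_neighbors]

        weights += diffs[misses].mean(axis=0) - diffs[hits].mean(axis=0)

    return weights / n_samples
```

Under these assumptions, a feature that separates the classes should receive a clearly higher weight than a feature that is unrelated to the class label; ranking features by the returned weights gives a simple filter-style selection.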
Date: Friday, November 15, 2019
Time: 3:00 - 4:00 PM
Place: MREB 222
Advisor: Dr. Ricardo Vilalta
Faculty, students, and the general public are invited.