Dissertation Proposal
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Kinjal Dhar Gupta
will defend his dissertation proposal
Robust Domain Adaptation Using Active Learning
Abstract
Traditional machine learning algorithms assume that the training and test datasets are generated from the same underlying distribution, an assumption that does not hold for most real-world datasets. As a result, a model trained on the training dataset fails to achieve good classification accuracy on the test dataset. In addition, labeling a new test dataset is often very expensive. One way to mitigate this problem is to use domain adaptation techniques, which build a new model on the unlabeled test dataset (the target dataset) by transferring information from a related but labeled training dataset (the source dataset), even when their underlying distributions differ. However, domain adaptation makes no allowance for obtaining class labels of the test dataset during the training phase. Active learning techniques address this limitation. Active learning assumes that an expert can label some instances of the test dataset at minimal cost. Its goal is to find the most informative instances of the test dataset for the expert to label, so as to achieve better classification accuracy on the unlabeled test dataset.
The basic assumption in domain adaptation is that there exists an unknown model that can classify both the source and target datasets perfectly, either in the same instance space or in some projected space. This assumption may not hold for many datasets. Most active learning methods, on the other hand, suffer from sampling bias, and the model generated on the target dataset deviates from the optimal model as the iterations progress. The goal of my research is to propose a novel technique that mitigates both of these problems by using active learning methods in domain adaptation.
The method that I propose for my research has two parts. The first part is a novel domain adaptation technique that aligns the source and target datasets so that the model built on the source dataset can be used directly on the target dataset when there is a shift in the priors of the distributions between the datasets. The second part introduces a new active learning method with domain adaptation that transfers related information from the source domain to the target domain, building a model on the target dataset with improved accuracy while spending the labeling cost efficiently.
Date: Friday, March 4, 2016
Time: 1:00 PM
Place: HBS 352
Advisor: Dr. Ricardo Vilalta
Faculty, students, and the general public are invited.