MATH 4323 - Data Science and Statistical Learning

***This is a course guideline. Students should contact the instructor for updated information on the current course syllabus, textbooks, and course content.***

Prerequisites: MATH 3339 or MATH 3349

Course Description: Theory and applications of statistical learning techniques such as maximal margin classifiers, support vector machines, K-means clustering, and hierarchical clustering. Other topics may include algorithm performance evaluation, cluster validation, data scaling, and resampling methods. The R statistical programming language will be used throughout the course.

Textbook: While lecture notes will serve as the main source of material for the course, the following book is an excellent reference:

“An Introduction to Statistical Learning (with Applications in R)” by James, Witten, Hastie, and Tibshirani. ISBN: 978-1461471370

Learning Objectives: By the end of the course, a successful student should:

Have a solid conceptual grasp of the statistical learning methods covered in the course.
Be able to identify the appropriate techniques for a particular data set.
Have a working knowledge of R in order to apply those techniques and subsequently assess the quality of the fitted models.
Demonstrate the ability to clearly communicate the results of applying selected statistical learning methods to data.

Software: Make sure to install R and RStudio before the course starts. Install R first, since RStudio requires an existing R installation. Use the link https://www.rstudio.com/products/rstudio/download/ and choose the download appropriate for your platform. Let me know via email if you encounter any difficulties.

Tentative Course Outline:

Review: The task of statistical learning. Supervised and unsupervised learning. The most widely used statistical learning techniques.
Support Vector Classifier. Maximal margin classifier: separating hyperplane, support vectors. Non-separable case: support vector classifier.
Support Vector Machines. Non-linear decision boundaries. Kernels. One-versus-one and one-versus-all classification for K > 2 classes. Evaluating the quality of classification.
Clustering Methods: K-Means. Within-cluster variation. Computing centroids. Multiple starts. Selecting K.
Clustering Methods: Hierarchical. Agglomerative clustering. Linkage. Interpreting the dendrogram. Choice of dissimilarity measure. Data scaling.
Evaluating a Clustering Solution. Is this a good clustering? Variance explained. Between- and within-cluster variation. Silhouette coefficient.

Grading: Please consult your instructor's syllabus for all grading guidelines.

Justin Dart Jr. Student Accessibility Center Accommodations:

Academic Adjustments/Auxiliary Aids: The University of Houston System complies with Section 504 of the Rehabilitation Act of 1973 and the Americans with Disabilities Act of 1990 pertaining to the provision of reasonable academic adjustments/auxiliary aids for students who have a disability. In accordance with Section 504 and ADA guidelines, the University of Houston strives to provide reasonable academic adjustments/auxiliary aids to students who request and require them. If you believe that you have a disability requiring an academic adjustment/auxiliary aid, please visit the Justin Dart Jr. Student Accessibility Center website at https://www.uh.edu/accessibility/ for more information.

UH CAPS

Counseling and Psychological Services (CAPS) can help students who are having difficulties managing stress, adjusting to college, or feeling sad and hopeless. You can reach CAPS by calling 713-743-5454, during and after business hours, for a routine appointment or if you or someone you know is in crisis. No appointment is necessary for the "Let's Talk" program, a drop-in consultation service offered at convenient locations and hours around campus.
