COSC 4337: Data Science II, Spring 2024

General Information

Instructor: Ricardo Vilalta (r.vilalta.us@ieee.org)
Office: Durga D. and Sushila Agrawal Engineering Research Bldg., Room 203C
Office Hours: By appointment
Class Time: The class is fully online (asynchronous).
Telephone: (713) 743-3614
Readings: "An Introduction to Statistical Learning with Applications in R" by G. James, D. Witten, T. Hastie, R. Tibshirani. Springer, 2013.

TA Information

Yasmin Farzana
Office Hours: Thursdays 3:00 - 4:00 PM via MS Teams
Email: fyasmin2@uh.edu

Michail Koumpanakis
Office Hours: Tuesdays 10:00 AM - Noon via MS Teams
Email: mkoumpanakis@uh.edu

Course Description

This course covers advanced modeling and analysis techniques and is intended for students who have completed Data Science I. A distinctive feature of the course is the opportunity to apply data science skills in a semester-long project based on real-world data. After this course, students will be familiar with the most popular data science and machine learning techniques and ready to apply for various data science jobs in industry.

Upon completion of this course, students will be able to conduct a data analysis project using an analytical programming language (R/Python); to visualize and preprocess raw data in preparation for deeper forms of analysis; to train a variety of machine learning models, including Decision Trees, Neural Networks, and Support Vector Machines; to test and fine-tune analytic models to produce high accuracy rates; to evaluate models using cross-validation, learning and validation curves, grid search, and different performance metrics; to use Ensemble Learning to improve model performance; and to avoid overfitting by managing the Bias-Variance trade-off.

For more information, visit the course on Canvas.

Grading

Graded Work                      Weight
Midterm Exams                    40%
Project Report: 1st Milestone    20%
Project Report: 2nd Milestone    20%
Project Report: 3rd Milestone    20%

Calendar

Dates to Remember    Event
January 16           1st class - check material on Blackboard
February 21          1st Milestone - Data Pre-processing
February 28          1st Midterm Exam
April 12             2nd Milestone - Data Modeling
April 17             2nd Midterm Exam
May 3                3rd Milestone - Data Presentation and Visualization

Note: This course has no final exam.

Lecture Schedule

Dates                  Topic
Week of January 16     Overview of Statistical Learning
Week of January 22     Linear Regression
Week of January 29     Classification
Week of February 5     Resampling Methods
Week of February 12    No Class
Week of February 19    Linear Model Selection and Regularization
February 28            1st Midterm Exam
Week of March 4        Nonlinear Regression
Week of March 11       No Class; Spring Break
Week of March 18       Tree-Based Methods
Week of March 25       Ensemble Methods
Week of April 1        Support Vector Machines
Week of April 8        Unsupervised Learning
April 17               2nd Midterm Exam
