COSC 4337: Data Science II, Spring 2023

General Information

Instructor:

Ricardo Vilalta (r.vilalta.us@ieee.org)

Office:

Durga D. and Sushila Agrawal Engineering Research Bldg. Room 203C.

Office Hours:

By appointment.

Class Time:

We will meet Tuesdays and Thursdays at 2:30 PM CT via MS Teams.

Telephone:

(713) 743-3614

Readings:

"An Introduction to Statistical Learning with Applications in R" by G. James, D. Witten, T. Hastie, R. Tibshirani. Springer, 2013.

TA Information

Farzana Jasmin
Office Hours: Wednesdays 10:00 - 11:00 AM (MS Teams)
Email: fyasmin2@cougarnet.uh.edu

Course Description

This course covers advanced modeling and analysis techniques and is intended for students who have completed Data Science I. A distinctive feature of the course is the opportunity to apply data-science skills to a semester-long project based on real-world data. After this course, students will be familiar with the most popular data science and machine learning techniques and prepared to apply for a variety of data-science jobs in industry.

Upon completion of this course, students will be able to: conduct a data analysis project using an analytical programming language (R/Python); visualize and preprocess raw data in preparation for deeper forms of analysis; train a variety of machine learning models, including Decision Trees, Neural Networks, and Support Vector Machines; test and fine-tune analytic models to produce high accuracy rates; evaluate models using cross-validation, learning and validation curves, grid search, and different performance metrics; use Ensemble Learning to improve model performance; and avoid overfitting by managing the bias-variance trade-off.

For more information, visit the course on Blackboard.

Grading

Graded Work                      Weight
Midterm Exams                    40%
Project Report: 1st Milestone    20%
Project Report: 2nd Milestone    20%
Project Report: 3rd Milestone    20%

Calendar

Dates to Remember    Event
January 17           1st class - check material on Blackboard
February 24          1st Milestone - Data Pre-processing
April 10             2nd Milestone - Data Modeling
May 5                3rd Milestone - Data Presentation and Visualization
March 2              1st Midterm Exam
April 20             2nd Midterm Exam

Note: There is no final exam in this course.

Schedule Lectures

Dates                     Topic
January 17, 19            Overview of Statistical Learning
January 24, 26            Linear Regression
January 31, February 2    Classification
February 7, 9             Resampling Methods
February 14, 16           No Class
February 21, 23           Linear Model Selection and Regularization
March 2                   1st Midterm Exam
March 7, 9                Nonlinear Regression
March 14, 16              No Class; Spring Break
March 21, 23              Tree-Based Methods
March 28, 30              Ensemble Methods
April 4, 6                Support Vector Machines
April 11, 13              Unsupervised Learning
April 20                  2nd Midterm Exam
