MATH 4322 - Introduction to Data Science and Machine Learning - University of Houston

MATH 4322 - Introduction to Data Science and Machine Learning

***This is a course guideline. Students should contact their instructor for updated information on the current course syllabus, textbooks, and course content.***
Prerequisites: MATH 3339

Course Description: This course covers the theory and applications of statistical learning techniques such as linear and logistic regression, classification and regression trees, random forests, and neural networks. Other topics may include fit quality assessment, model validation, and resampling methods. The R statistical programming language will be used throughout the course.

Textbooks: While lecture notes will serve as the main source of material for the course, the following books are good references:
  • "An Introduction to Statistical Learning (with applications in R)" by James, Witten et al. ISBN: 978-1461471370
  • "Neural Networks with R" by G. Ciaburro. ISBN: 978-1788397872

 

Learning Objectives: By the end of the course a successful student should:

• Have a solid conceptual grasp of the described statistical learning methods.
• Be able to correctly identify the appropriate techniques to deal with particular data sets.
• Have a working knowledge of R programming in order to apply those techniques and subsequently assess the quality of the fitted models.
• Demonstrate the ability to clearly communicate the results of applying selected statistical learning methods to the data.

Software: Make sure to download and install R and RStudio (which requires an existing R installation) before the course starts. Use the link https://www.rstudio.com/products/rstudio/download/ to download the version appropriate for your platform. Let me know via email if you encounter difficulties.

Course Outline: 
  • Introduction: What is Statistical Learning? Supervised and unsupervised learning. Regression and classification.
  • Linear and Logistic Regression. Continuous response: simple and multiple linear regression. Binary response: logistic regression. Assessing quality of fit.
  • Model Validation. Validation set approach. Cross-validation.
  • Tree-based Models. Decision and regression trees: splitting algorithm, tree pruning. Random forests: bootstrap, bagging, random splitting.
  • Neural Networks. Single-layer perceptron: neuron model, learning weights. Multi-layer perceptron: backpropagation, multi-class discrimination.
Grading: Please consult your instructor's syllabus regarding any and all grading guidelines.

Justin Dart Jr. Center Accommodations:

Academic Adjustments/Auxiliary Aids: The University of Houston System complies with Section 504 of the Rehabilitation Act of 1973 and the Americans with Disabilities Act of 1990, pertaining to the provision of reasonable academic adjustments/auxiliary aids for students who have a disability. In accordance with Section 504 and ADA guidelines, the University of Houston strives to provide reasonable academic adjustments/auxiliary aids to students who request and require them. If you believe that you have a disability requiring an academic adjustment/auxiliary aid, please visit the Justin Dart Jr. Student Accessibility Center website at https://www.uh.edu/accessibility/ for more information.

UH CAPS

Counseling and Psychological Services (CAPS) can help students who are having difficulties managing stress, adjusting to college, or feeling sad and hopeless. You can reach CAPS by calling 713-743-5454 during and after business hours for routine appointments or if you or someone you know is in crisis. No appointment is necessary for the "Let's Talk" program, a drop-in consultation service at convenient locations and hours around campus.