IEMS 395-1: Machine Learning for Data Science



Prerequisites

A course in statistics at the level of IEMS 303; a course in matrix analysis; and proficiency in programming, as extensive coding is a key part of the curriculum. This course is considered a duplicate of IEMS 304: students cannot receive credit for both, and IE majors cannot apply it toward their degree requirements.

Description

This course offers a modern treatment of statistical learning, focusing on modeling data for prediction and insight. It blends theoretical foundations with hands-on programming to prepare students for both advanced study and real-world data analysis.

LEARNING OBJECTIVES

  • Select the appropriate statistical tool for a given problem, interpret the results accurately, and avoid common pitfalls
  • Gain hands-on experience using a programming language to perform data analysis
  • Build a solid understanding of the underlying mathematics to empower further study and effective application in professional settings

TOPICS

  • Overview of Statistical Learning: introduction to regression models, dimensionality and structured models, model selection and the bias-variance tradeoff, classification, and an introduction to programming
  • Linear Regression: Simple and multiple linear regression, hypothesis testing, confidence intervals, model extensions, and coding applications
  • Classification: Exploration of classification problems including logistic regression, multivariate logistic regression, discriminant analysis (Gaussian, quadratic, and naive Bayes), plus coding exercises
  • Resampling Methods: Techniques such as cross-validation, k-fold cross-validation, and (optionally) bootstrap methods, along with related coding
  • Linear Model Selection and Regularization: Methods like best-subset selection, stepwise/backward selection, shrinkage (ridge regression and the lasso), tuning parameter selection, and optional dimension reduction methods with coding
  • Moving Beyond Linearity (optional): Topics include polynomials, step functions, splines (piecewise-polynomials, smoothing splines), generalized additive models, local regression, and coding for nonlinear functions
  • Tree-Based Methods (optional): Covering classification trees, bagging, random forests, boosting, and associated coding
  • Unsupervised Learning (optional): Focus on principal components, higher-order principal components, k-means clustering, hierarchical clustering, and corresponding coding
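
To give a flavor of the coding exercises these topics involve, the sketch below fits a simple linear regression (the first regression topic) by its closed-form least-squares formulas, using only the Python standard library. This is an illustrative example only, not an official course assignment.

```python
# Illustrative sketch (not official course material): fitting a simple
# linear regression y ≈ b0 + b1*x by ordinary least squares.
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    # b1 = sum((xi - mx)(yi - my)) / sum((xi - mx)^2)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx  # the fitted line passes through (mx, my)
    return b0, b1

# Toy data lying exactly on y = 2x, so the fit recovers slope 2, intercept 0.
b0, b1 = fit_simple_ols([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1)  # → 0.0 2.0
```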

MATERIALS

  • Course materials are provided on Canvas (including slides, handouts, homework assignments, datasets, and announcements)
  • Required Text:
      "An Introduction to Statistical Learning with Applications in Python" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (with a companion version in R)
  • Required Tutorials:
      Interactive Python tutorials provided by Kaggle (covering Python programming, introductory and intermediate machine learning, pandas, data visualization, and feature engineering)
  • Additional References:
      "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman, and "The Matrix Cookbook" by Kaare Petersen and Michael Pedersen