IEMS 395-1: Machine Learning for Data Science



Prerequisites

A course in statistics at the level of IEMS 303; a course in matrix analysis; and proficiency in programming, as extensive coding is a key part of the curriculum. This course is considered a duplicate of IEMS 304: students cannot receive credit for both, and IE majors cannot apply it toward their degree requirements.

Description

This course offers a modern treatment of statistical learning, focusing on modeling data for prediction and insight. It blends theoretical foundations with hands-on programming to prepare students for both advanced study and real-world data analysis.

LEARNING OBJECTIVES

  • Select the appropriate statistical tool for a given problem, interpret the results accurately, and avoid common pitfalls
  • Gain hands-on experience using a programming language to perform data analysis
  • Build a solid understanding of the underlying mathematics to empower further study and effective application in professional settings

TOPICS

  • Overview of Statistical Learning: introduction to regression models, dimensionality and structured models, model selection and the bias-variance tradeoff, classification, and an introduction to programming
  • Linear Regression: Simple and multiple linear regression, hypothesis testing, confidence intervals, model extensions, and coding applications
  • Classification: Exploration of classification problems including logistic regression, multivariate logistic regression, discriminant analysis (Gaussian, quadratic, and naive Bayes), plus coding exercises
  • Resampling Methods: Techniques such as cross-validation, k-fold cross-validation, and (optionally) bootstrap methods, along with related coding
  • Linear Model Selection and Regularization: Methods like best-subset selection, stepwise/backward selection, shrinkage (ridge regression and the lasso), tuning parameter selection, and optional dimension reduction methods with coding
  • Moving Beyond Linearity (optional): Topics include polynomials, step functions, splines (piecewise-polynomials, smoothing splines), generalized additive models, local regression, and coding for nonlinear functions
  • Tree-Based Methods (optional): Covering classification trees, bagging, random forests, boosting, and associated coding
  • Unsupervised Learning (optional): Focus on principal components, higher-order principal components, k-means clustering, hierarchical clustering, and corresponding coding
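
To give a flavor of the coding exercises these topics involve, the sketch below fits a simple linear regression (the first regression topic) by its closed-form least-squares formulas, using only the Python standard library. This is an illustrative example only, not an official course assignment.

```python
# Illustrative sketch (not official course material): fitting a simple
# linear regression y ≈ b0 + b1*x by ordinary least squares.
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    # b1 = sum((xi - mx)(yi - my)) / sum((xi - mx)^2)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx  # the fitted line passes through (mx, my)
    return b0, b1

# Toy data lying exactly on y = 2x, so the fit recovers slope 2, intercept 0.
b0, b1 = fit_simple_ols([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1)  # → 0.0 2.0
```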

MATERIALS

  • Course materials are provided on Canvas (including slides, handouts, homework assignments, datasets, and announcements)
  • Required Text:
      "An Introduction to Statistical Learning with Applications in Python" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (with a companion version in R)
  • Required Tutorials:
      Interactive Python tutorials provided by Kaggle (covering Python programming, introductory and intermediate machine learning, pandas, data visualization, and feature engineering)
  • Additional References:
      "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman, and "The Matrix Cookbook" by Kaare Petersen and Michael Pedersen