IEMS 395-1: Machine Learning for Data Science
Prerequisites
A course in statistics at the level of IEMS 303; a course in matrix analysis; proficiency in programming, as extensive coding will be a key part of the curriculum. This course is considered a duplicate of IEMS 304: students cannot receive credit for both, and IE majors cannot use it toward their degree requirements.
Description
This course offers a modern treatment of statistical learning, focusing on modeling data for prediction and insight. It blends theoretical foundations with practical programming applications to prepare students for both advanced study and real-world data analysis.
LEARNING OBJECTIVES
- Select the appropriate statistical tool for a given problem, interpret the results accurately, and avoid common pitfalls
- Gain hands-on experience using a programming language to perform data analysis
- Build a solid understanding of the underlying mathematics to empower further study and effective application in professional settings
TOPICS
- Overview of Statistical Learning: introduction to regression models, dimensionality and structured models, model selection and the bias-variance tradeoff, classification, and an introduction to programming
- Linear Regression: Simple and multiple linear regression, hypothesis testing, confidence intervals, model extensions, and coding applications
- Classification: Exploration of classification problems including logistic regression, multivariate logistic regression, discriminant analysis (Gaussian, quadratic, and naive Bayes), plus coding exercises
- Resampling Methods: Techniques such as cross-validation, k-fold cross-validation, and (optionally) bootstrap methods, along with related coding
- Linear Model Selection and Regularization: Methods like best-subset selection, stepwise/backward selection, shrinkage (ridge regression and the lasso), tuning parameter selection, and optional dimension reduction methods with coding
- Moving Beyond Linearity (optional): Topics include polynomials, step functions, splines (piecewise-polynomials, smoothing splines), generalized additive models, local regression, and coding for nonlinear functions
- Tree-Based Methods (optional): Covering classification trees, bagging, random forests, boosting, and associated coding
- Unsupervised Learning (optional): Focus on principal components, higher order principal components, k-means clustering, hierarchical clustering, and corresponding coding
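As a small taste of the coding component, the regression, classification, and cross-validation topics above can be sketched in Python. This is an illustrative example only: scikit-learn and the synthetic data here are assumptions for demonstration, not the course's required materials.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + Gaussian noise
X = rng.normal(size=(100, 1))
y_reg = 3 * X[:, 0] + rng.normal(scale=0.5, size=100)

# 5-fold cross-validated R^2 for a simple linear regression
reg_scores = cross_val_score(LinearRegression(), X, y_reg, cv=5, scoring="r2")
print("Linear regression, mean CV R^2:", reg_scores.mean())

# Synthetic classification data: the label depends on the sign of x
y_clf = (X[:, 0] > 0).astype(int)

# 5-fold cross-validated accuracy for logistic regression
clf_scores = cross_val_score(LogisticRegression(), X, y_clf, cv=5, scoring="accuracy")
print("Logistic regression, mean CV accuracy:", clf_scores.mean())
```

The same workflow (fit a model, then estimate its out-of-sample performance with k-fold cross-validation rather than training error) recurs throughout the course topics, from shrinkage methods to tree-based methods.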
MATERIALS
- Course materials are provided on Canvas (including slides, handouts, homework assignments, datasets, and announcements)
- Required Text:
  "An Introduction to Statistical Learning with Applications in Python" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (with a companion version in R)
- Required Software:
  Interactive Python tutorials provided by Kaggle (covering Python programming, introductory and intermediate machine learning, pandas, data visualization, and feature engineering)
- Additional References:
  "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman, and "The Matrix Cookbook" by Kaare Petersen and Michael Pedersen