MSIT 423: Data Science for Business Intelligence
Description
In a rapidly changing business environment, with global competition and maturing markets, competitive advantage is critical. Businesses can exploit the wealth of data collected through their operational processes as well as from external sources. This course introduces data mining techniques and their use in business applications to enable business intelligence. The course combines hands-on data analysis with business cases.
REQUIRED TEXT: James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning (JWHT), Springer, 2013.
REFERENCE TEXTS:
- Leskovec, Rajaraman, and Ullman, Mining of Massive Datasets
- (Supplementary reading) Hastie, Tibshirani and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.
COURSE GOALS: Students will understand and manage the entire process of using data to make better business decisions: extraction, cleaning, understanding, modeling, and presenting. Students will also understand the limitations of data.
DETAILED COURSE TOPICS:
Week 1: Introduction to Predictive Analytics
- Course introduction
- Simple linear regression
- Multiple linear regression, interpretation, and basic inference
- Readings: JWHT, sections 3.1, 3.2, 3.6.1-3.6.3
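The Week 1 material can be previewed numerically. The course works in R, but the same ordinary-least-squares fit is easy to sketch in Python with NumPy; the data and variable names below (advertising spend vs. sales) are simulated for illustration, not from the course.

```python
import numpy as np

# Hypothetical data: advertising spend (x, in $000s) vs. sales (y, in units).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=100)  # true intercept 3, slope 2

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])          # design matrix with intercept
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
```

The estimated slope b1 is interpreted exactly as in JWHT chapter 3: the expected change in the response per one-unit change in the predictor, holding everything else fixed.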
Week 2: Model Accounting and Multicollinearity
- Extra and partial sums of squares, R-squared
- Newfood and quality control cases
- Multicollinearity
- Residual, QQ and influence plots
- Readings: JWHT, section 3.3.3.6
Week 3: Diagnostics and Transformations
- Transformations, the multiplicative model, polynomials
- Business failure and purifier cases
- Readings: JWHT 3.3.3.1-3.3.3.5
Week 4: Categorical Predictor Variables, Interactions and Logistic Regression
- Dummy variables
- Interactions
- Logistic regression
- Readings: JWHT 3.6.4, 3.6.6, 4.1-4.3 (skip discriminant analysis)
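Week 4's two main ideas, dummy variables and logistic regression, can be combined in one small sketch. This is a Python/NumPy illustration (the course uses R); the data are simulated and the "segment" dummy is a hypothetical example predictor. Maximum likelihood is approximated here by plain gradient ascent on the log-likelihood.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, size=n)
segment = rng.integers(0, 2, size=n)        # dummy variable: 0/1 customer segment
logits = -1.0 + 2.0 * x + 1.5 * segment     # true log-odds model
y = rng.uniform(size=n) < sigmoid(logits)   # binary response (e.g. purchase / no purchase)

# Fit by gradient ascent on the log-likelihood (R's glm does this via IRLS).
X = np.column_stack([np.ones(n), x, segment])
beta = np.zeros(3)
for _ in range(10000):
    beta += 0.1 * X.T @ (y - sigmoid(X @ beta)) / n

# Coefficients are effects on the log-odds; exp(beta) gives odds ratios.
```

The segment coefficient is read the same way as a dummy-variable coefficient in linear regression, except on the log-odds scale.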
Week 5: Model Evaluation, Selection and Regularization
- Confusion tables, ROC curves, AUC
- Penalized measures of fit
- Test sets and k-fold cross validation
- Variable subset selection
- Ridge regression and the lasso
- Readings: JWHT 5.1, 5.3.1, 5.3.3; 6.1, 6.2, 6.5, 6.6
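Two of the Week 5 ideas, ridge regression and k-fold cross-validation, fit naturally together: cross-validation is how the penalty is chosen. A minimal Python/NumPy sketch on simulated data (the course uses R, where glmnet and cv.glmnet do this), with hypothetical helper names:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.0]            # only 3 of 20 predictors matter
y = X @ beta_true + rng.normal(0, 1.0, size=n)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5):
    # k-fold cross-validation estimate of test MSE for a given penalty.
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for f in folds:
        train = np.setdiff1d(np.arange(len(y)), f)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[f] - X[f] @ b) ** 2))
    return np.mean(errs)

# Pick the penalty with the lowest cross-validated error.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=lambda lam: cv_mse(X, y, lam))
```

The same loop works for the lasso or any other tuned model; only the fitting function changes.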
Week 6: Midterm and Smoothing
- In-class midterm, 80 minutes, covers chapters 3 and 4 (not 5 and 6)
- Bin smoothers, k-nearest neighbors
- Step functions, piecewise linear models and cubic splines
- Readings: JWHT sections 7.1-7.6
Week 7: GAMs and Trees
- Generalized additive models
- CART
- Readings: JWHT sections 7.7, 8.1, 8.3.1, 8.3.2
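The core move in CART is easy to see in isolation: at each node, search every candidate split for the one that most reduces squared error. A Python sketch of that root split on simulated step-function data (the course fits trees in R, e.g. with rpart; the function name here is made up):

```python
import numpy as np

def best_split(x, y):
    # Exhaustive search for the single cut point minimizing total squared error,
    # i.e. the root split a regression tree (CART) would make on one predictor.
    best_cut, best_sse = None, np.inf
    for cut in np.unique(x)[:-1]:           # exclude the max so both sides are non-empty
        left, right = y[x <= cut], y[x > cut]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = np.where(x < 4.0, 1.0, 5.0) + rng.normal(0, 0.3, 200)  # true step at x = 4
cut = best_split(x, y)
```

A full tree simply applies this search recursively to each resulting half, then prunes.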
Week 8: Bagging, Random Forests, Principal Components
- Bagging and random forests
- Stumps, shrubs, boosted trees as time permits
- Principal component analysis
- Readings: JWHT sections 8.2, 8.3.3, 10.1-10.2, 10.4
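Principal component analysis reduces to centering the data and taking a singular value decomposition. A Python/NumPy sketch on simulated data (in the course this is R's prcomp; the "customer metrics" framing is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 200 customers measured on 5 correlated metrics,
# generated as one latent factor plus noise, so it is roughly rank one.
z = rng.normal(size=(200, 1))
X = z @ rng.normal(size=(1, 5)) + 0.2 * rng.normal(size=(200, 5))

# PCA: center the columns, then SVD; rows of Vt are the component loadings.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_explained = s**2 / np.sum(s**2)     # proportion of variance per component

scores = Xc @ Vt.T                       # principal-component scores
```

Because the data are driven by a single latent factor, the first component should capture most of the variance, which is exactly the scree-plot judgment discussed in JWHT chapter 10.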
Week 9: Clustering and Recommendation Systems
- K-means and hierarchical clustering
- Distance metrics
- Overview of recommendation systems: popularity, user-based, item-based, SVD as time permits
- Readings: JWHT sections 10.3, 10.5; Ekstrand chapter on Canvas
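K-means (Lloyd's algorithm) alternates two steps: assign each point to its nearest center, then recompute each center as the mean of its points. A self-contained Python sketch on two simulated, well-separated customer groups (the course uses R's kmeans; the function here is a hand-rolled illustration):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd's algorithm: assign points to the nearest center (Euclidean
    # distance), then recompute each center as the mean of its assigned points.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(4)
# Two well-separated hypothetical customer groups in two dimensions.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centers = kmeans(X, k=2)
```

The choice of distance metric (Week 9's second topic) matters here: swapping Euclidean for another metric changes which points end up grouped together.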
Week 10: Project Presentations
Week 11: Final Exam Due
HOMEWORK ASSIGNMENTS: There will be weekly recommended homework problems (with answers). You will have an in-class midterm and a take-home final that must be completed individually.
GRADES:
- Homework: 20%
- Midterm: 25%
- Project: 20%
- Final: 35%
COURSE OBJECTIVES: As a result of this course, students will be able to:
1. Identify data-collection biases;
2. Design effective graphics presentations of data;
3. Estimate and interpret classical and data mining models using the R software package;
4. Draw conclusions about causal relationships and recommend actions that should be taken based on an analysis.