Curriculum
Data Science Studio Courses

DATA_ENG courses are open only to students who have been admitted to the Machine Learning and Data Science minor and are on the Data Science or Hybrid tracks. To apply for admission to the minor, please see the application and selection information.

DATA_ENG 200 Foundations of Data Science

Offered: Winter (TTh 9:30-10:50 a.m.) and Spring (TTh 12:30-1:50 p.m.)

Foundations of Data Science will cover the fundamentals of data science and the context in which the field operates. The course will introduce the steps of the data science lifecycle and the associated data tools and techniques, implemented in languages such as Python. This course is reserved for students pursuing the McCormick Machine Learning and Data Science Minor. We encourage students to take this course early in their studies for the minor. It is the first part of a two-part sequence with DATA_ENG 300.

Prerequisite: COMP_SCI 150

Learning Objectives

(General overview)

Students will understand the core concepts and scope of data science.
Students will understand the stages of the data science lifecycle and the common tools and techniques used.
Students will be able to formulate and scope innovative, relevant, or scientific questions that can be addressed with data.
Students will be able to utilize computational thinking for problem-solving in data science.
Students will be able to present data findings in writing and with visual aids, through homework assignments and a project presentation.

(Related to specific topics)

Students will be able to conduct exploratory data analysis to uncover insights.
Students will know and be able to apply the principles of data cleaning and manipulation.
Students will know and be able to apply the principles of algorithmic data collection and of joining multiple data sources.
Students will be able to identify and avoid common pitfalls in data analytics, such as algorithmic bias.
Students will be able to construct reproducible data science pipelines so that analyses can be replicated.

(If time permits)

Students will understand and apply best practices for handling and protecting sensitive data.
Students will be able to implement version control to manage and track changes in data projects.

Topics

Introduction to data science
Data exploration and visualization
Data manipulation, transformation, and standardization
Algorithmic data retrieval methods
Statistical modeling and machine learning
Introduction to cloud computing
Ethics and algorithmic bias

(If time permits)

Data security and privacy
Version control

DATA_ENG 300 Data Engineering Studio

Offered: Winter (TTh 12:30-1:50 p.m.) and Spring (TTh 9:30-10:50 a.m.)

Data Engineering Studio teaches students how to build a sustainable data science lifecycle. Students will analyze data in multiple contexts (e.g., SQL queries and machine learning models), share their findings with peers, and practice iteratively refining their analyses based on feedback. They will become acquainted with the common pitfalls of applying data analytics to real-world datasets. Several modern data engineering tools, such as Docker containers, Spark, Airflow, and MLflow, will be covered. This course is reserved for students pursuing the McCormick Machine Learning and Data Science Minor. We encourage students to take this course at the end of their studies in the minor. It is the second part of a two-part sequence with DATA_ENG 200.

Prerequisites: DATA_ENG 200; a Statistics Foundations course; COMP_SCI 150; COMP_SCI 214 or 217; IEMS 304 or COMP_SCI 349.

Learning Objectives

(General overview)

Students will understand the core concepts and scope of data engineering.
Students will understand the stages of the data engineering lifecycle and the common tools and techniques used.
Students will understand and be able to conduct exploratory data analysis to uncover insights from data.
Students will be able to design and manage relational and non-relational databases effectively.
Students will understand and be able to apply the principles of distributed (cloud) computing.
Students will be able to use Spark to carry out extract-transform-load (ETL) and extract-load-transform (ELT) workflows.
Students will be able to automate data and machine learning pipelines to enhance efficiency and reproducibility.

(If time permits)

Students will be able to design and implement A/B tests to evaluate hypotheses.
Students will understand and be able to apply transfer learning techniques to improve model performance with limited data.

Topics

Introduction to data engineering
Containerization
Exploratory data analysis
Distributed (cloud) computing
ETL and ELT via Spark
Automation of data pipelines
NoSQL databases

(If time permits)

A/B testing
Transfer learning
