COMP_SCI 496: Theory of Gradient-Based Optimization in ML
Prerequisites
CS MS or CS PhD students, or consent of instructor
Description
In this course, you’ll learn the theoretical foundations of the optimization methods used to train deep machine learning models.
- Why does gradient descent work? Specifically, what can we guarantee about the point it converges to?
- In addition to gradient descent, we explore a large variety of optimization methods. What are the advantages and disadvantages of these methods?
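To make the first question concrete, here is a minimal gradient-descent sketch on a simple convex quadratic (an illustrative toy example, not course material); the iterates approach the unique minimizer, which is exactly the kind of behavior whose guarantees we will study.

```python
import numpy as np

# Toy example: minimize f(x) = 1/2 * x^T A x - b^T x, whose unique
# minimizer solves A x = b (A is symmetric positive definite).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
eta = 0.1  # step size; must be small relative to the curvature of f
for _ in range(200):
    grad = A @ x - b       # gradient of f at the current iterate
    x = x - eta * grad     # gradient descent step

print(x, np.linalg.solve(A, b))  # the iterate is close to the true minimizer
```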
We will dive into key aspects of deep learning and their relation to optimization.
- How do various design choices of neural networks (such as initialization, batch normalization, residual networks, dropout, etc.) affect the training process?
- Why do trained models generalize well? What optimization methods lead to better generalization?
- We’ll discuss the loss landscape of neural networks: in particular, the types of minima that optimization methods can find, and how to find minima that generalize better.
- We will talk in depth about distributed and federated learning.
- We will discuss how to enhance optimization methods to preserve data privacy.
- We will explore challenges and optimization methods specific to constrained optimization.
Prerequisites:
Some prerequisites are marked as “optional”; the course will be easier if you're already familiar with them.
- Overall, you should be comfortable with math.
- Basic probability theory:
- Main concepts and some common distributions (specifically, the normal distribution)
- Basic linear algebra:
- You should know and understand matrix multiplication and inner products
- (Optional) Eigenvalues/eigenvectors
- Derivatives and integrals:
- Definitions and the ability to compute derivatives: the chain rule and some common derivatives
- (Optional) Understanding Taylor series
- Basic multivariate calculus:
- Gradient (∇f), Hessian (∇²f)
- (Optional) Chain rule in multivariate calculus
- Python:
- You should be comfortable with programming and Python
- (Optional) PyTorch or TensorFlow or MXNet
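As a rough self-check of this level (an illustrative example, not an official diagnostic): for f(x) = ½ xᵀAx + bᵀx with symmetric A, the gradient is ∇f(x) = Ax + b and the Hessian is ∇²f(x) = A; the short script below verifies the gradient numerically.

```python
import numpy as np

# Self-check: f(x) = 1/2 * x^T A x + b^T x with symmetric A,
# so grad f(x) = A x + b and the Hessian is the constant matrix A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

x0 = np.array([0.3, -0.7])
analytic = A @ x0 + b

# Central finite-difference approximation of the gradient for comparison.
eps = 1e-6
numeric = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print(analytic, numeric)  # the two should agree to roughly 1e-6 accuracy
```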
Assessment
The course includes the following forms of evaluation.
- Written assignments, where you have to write mathematical proofs.
- Programming assignments, typically involving training a neural network with a specified optimization method (a short sketch of what this looks like appears below).
- A final project with a presentation.
Overall, the course will not be heavy on assignments.
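For a sense of what a programming assignment might look like (an illustrative sketch only; actual assignments may differ), here is a small PyTorch training loop where the optimizer, SGD with momentum in this example, is the part you would typically implement or swap out.

```python
import torch
import torch.nn as nn

# Illustrative sketch: train a small network on synthetic data with a
# specified optimizer (here, SGD with momentum).
torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()   # backpropagation computes gradients of the loss
    opt.step()        # the optimizer's update rule is what assignments focus on

print(loss.item())
```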
A more detailed list of topics (which I will post when the class starts):
Practical goals:
- Building simple ML models
- Implementing optimization methods
- Understanding automatic differentiation
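As a small illustration of automatic differentiation (a sketch assuming PyTorch; TensorFlow and MXNet have analogous APIs), the snippet below computes a gradient and a Hessian-vector product via double backward, without ever forming the Hessian.

```python
import torch

# Automatic differentiation: gradient and Hessian-vector product of
# f(x) = sum(x^4), computed without materializing the Hessian.
x = torch.randn(5, requires_grad=True)
f = (x ** 4).sum()

(grad,) = torch.autograd.grad(f, x, create_graph=True)  # grad f(x) = 4 x^3

v = torch.randn(5)
(hvp,) = torch.autograd.grad(grad, x, grad_outputs=v)   # (Hessian of f at x) @ v

# The Hessian here is diag(12 x^2), so the HVP should equal 12 * x^2 * v.
print(torch.allclose(hvp, 12 * x.detach() ** 2 * v))
```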
Theoretical goals:
- Understanding theoretical foundations of deep learning
- Understanding common assumptions (convexity/smoothness)
- Convergence
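As one example of the kind of guarantee this involves: if f is convex and L-smooth, gradient descent with step size 1/L satisfies f(xₖ) − f(x*) ≤ L‖x₀ − x*‖² / (2k), i.e., the suboptimality decreases at a 1/k rate.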
Assignments:
- PyTorch (or other framework) implementations
- Written assignments
- Project: large-scale ML training
Topics:
- Deep learning basics (with implementation)
- Fully connected layers
- Activation functions
- Batch normalization
- Layer normalization
- ResNets
- Dropout
- Regularization
- PyTorch
- Basics
- Internals (automatic differentiation, Hessian-vector products)
- Convergence metrics
- Loss function
- Gradient norm
- Regret
- Expectation/high probability
- Optimization methods
- GD
- Convex/Nonconvex
- SGD
- Newton’s method
- Mirror descent
- Hessian-vector product (HVP) based methods
- SGD with momentum
- AdaGrad
- RMSProp
- Adam and SignSGD
- Variance reduction
- Lower bounds
- GD
- Exploding/vanishing gradients
- Assumptions
- Convexity
- Smoothness and component smoothness
- Lipschitz
- Bounded domain
- Types of minima
- Global minima
- Local minima
- Are often good
- Saddle points/local maxima
- Can be avoided
- Distributed and Federated Learning
- Outline/challenges
- Compressed SGD
- Local steps
- Basic Privacy
- Personalization
- P2P communication, blockchain?
- Contrastive learning
- Generalization
- Loss landscape
- “Understanding deep learning requires rethinking generalization”
- Stability
- Sharp/flat local minima
- Constrained optimization
- Frank-Wolfe
- Interior point methods
- Privacy
This course fulfills the Technical Elective area.
REFERENCE TEXTBOOKS: N/A
REQUIRED TEXTBOOK: N/A
COURSE COORDINATORS: Prof. Dmitrii Avdiukhin
COURSE INSTRUCTOR: Prof. Dmitrii Avdiukhin