COMP_SCI 496: Theory of Gradient-Based Optimization in ML
Prerequisites
CS MS or CS PhD students, or consent of instructor
Description
In this course, you’ll learn the theoretical foundations of the optimization methods used to train deep machine learning models.
- Why does gradient descent work? Specifically, what can we guarantee about the point it converges to?
- In addition to gradient descent, we explore a large variety of optimization methods. What are the advantages and disadvantages of these methods?
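To make the first question concrete, here is a minimal gradient-descent sketch on a simple convex quadratic (an illustrative toy example, not course material); the iterates approach the unique minimizer, which is exactly the kind of behavior whose guarantees we will study.

```python
import numpy as np

# Toy example: minimize f(x) = 1/2 * x^T A x - b^T x, whose unique
# minimizer solves A x = b (A is symmetric positive definite).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
eta = 0.1  # step size; must be small relative to the curvature of f
for _ in range(200):
    grad = A @ x - b       # gradient of f at the current iterate
    x = x - eta * grad     # gradient descent step

print(x, np.linalg.solve(A, b))  # the iterate is close to the true minimizer
```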
We will dive into key aspects of deep learning and their relation to optimization.
- How do various design choices of neural networks (such as initialization, batch normalization, residual networks, dropout, etc.) affect the training process?
- Why do trained models generalize well? What optimization methods lead to better generalization?
- We’ll discuss the loss landscape of neural networks: in particular, the types of minima that optimization methods can find, and how to find minima that generalize better.
- We will talk in depth about distributed and federated learning.
- We will discuss how to enhance optimization methods to preserve data privacy.
- We will explore challenges and optimization methods specific to constrained optimization.
Prerequisites:
Some prerequisites are marked as “optional”; the course will be easier if you're already familiar with them.
- Overall, you should be comfortable with math.
- Basic probability theory:
- Main concepts and some common distributions (specifically, the normal distribution)
- Basic linear algebra:
- You should know and understand matrix multiplication and inner products
- (Optional) Eigenvalues/eigenvectors
- Derivatives and integrals:
- Definitions and the ability to compute derivatives: the chain rule and some common derivatives
- (Optional) Understanding Taylor series
- Basic multivariate calculus:
- Gradient (∇f), Hessian (∇²f)
- (Optional) Chain rule in multivariate calculus
- Python:
- You should be comfortable with programming and Python
- (Optional) PyTorch or TensorFlow or MXNet
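As a rough self-check of this level (an illustrative example, not an official diagnostic): for f(x) = ½ xᵀAx + bᵀx with symmetric A, the gradient is ∇f(x) = Ax + b and the Hessian is ∇²f(x) = A; the short script below verifies the gradient numerically.

```python
import numpy as np

# Self-check: f(x) = 1/2 * x^T A x + b^T x with symmetric A,
# so grad f(x) = A x + b and the Hessian is the constant matrix A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

x0 = np.array([0.3, -0.7])
analytic = A @ x0 + b

# Central finite-difference approximation of the gradient for comparison.
eps = 1e-6
numeric = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print(analytic, numeric)  # the two should agree to roughly 1e-6 accuracy
```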
Assessment
The course includes the following forms of evaluation.
- Written assignments, where you have to write mathematical proofs.
- Programming assignments, typically involving training a neural network with a specified optimization method (a short sketch of what this looks like appears below).
- A final project with a presentation.
Overall, the course will not be heavy on assignments.
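For a sense of what a programming assignment might look like (an illustrative sketch only; actual assignments may differ), here is a small PyTorch training loop where the optimizer, SGD with momentum in this example, is the part you would typically implement or swap out.

```python
import torch
import torch.nn as nn

# Illustrative sketch: train a small network on synthetic data with a
# specified optimizer (here, SGD with momentum).
torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()   # backpropagation computes gradients of the loss
    opt.step()        # the optimizer's update rule is what assignments focus on

print(loss.item())
```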
A more detailed list of topics (which I will post when the class starts):
Practical goals:
- Building simple ML models
- Implementing optimization methods
- Understanding automatic differentiation
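As a small illustration of automatic differentiation (a sketch assuming PyTorch; TensorFlow and MXNet have analogous APIs), the snippet below computes a gradient and a Hessian-vector product via double backward, without ever forming the Hessian.

```python
import torch

# Automatic differentiation: gradient and Hessian-vector product of
# f(x) = sum(x^4), computed without materializing the Hessian.
x = torch.randn(5, requires_grad=True)
f = (x ** 4).sum()

(grad,) = torch.autograd.grad(f, x, create_graph=True)  # grad f(x) = 4 x^3

v = torch.randn(5)
(hvp,) = torch.autograd.grad(grad, x, grad_outputs=v)   # (Hessian of f at x) @ v

# The Hessian here is diag(12 x^2), so the HVP should equal 12 * x^2 * v.
print(torch.allclose(hvp, 12 * x.detach() ** 2 * v))
```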
Theoretical goals:
- Understanding theoretical foundations of deep learning
- Understanding common assumptions (convexity/smoothness)
- Convergence
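As one example of the kind of guarantee this involves: if f is convex and L-smooth, gradient descent with step size 1/L satisfies f(xₖ) − f(x*) ≤ L‖x₀ − x*‖² / (2k), i.e., the suboptimality decreases at a 1/k rate.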
Assignments:
- PyTorch (or other framework) implementations
- Written assignments
- Project: large-scale ML training
Topics:
- Deep learning basics (with implementation)
- Fully connected layers
- Activation functions
- Batch normalization
- Layer normalization
- ResNets
- Dropout
- Regularization
- PyTorch
- Basics
- Internals (automatic differentiation, Hessian-vector products)
- Convergence metrics
- Loss function
- Gradient norm
- Regret
- Expectation/high probability
- Optimization methods
- GD
- Convex/Nonconvex
- SGD
- Newton’s method
- Mirror descent
- Hessian-vector product (HVP) based methods
- SGD with momentum
- AdaGrad
- RMSProp
- Adam and SignSGD
- Variance reduction
- Lower bounds
- GD
- Exploding/vanishing gradients
- Assumptions
- Convexity
- Smoothness and component smoothness
- Lipschitz
- Bounded domain
- Types of minima
- Global minima
- Local minima
- Are often good
- Saddle points/local maxima
- Can be avoided
- Distributed and Federated Learning
- Outline/challenges
- Compressed SGD
- Local steps
- Basic Privacy
- Personalization
- P2P communication, blockchain?
- Contrastive learning
- Generalization
- Loss landscape
- “Understanding deep learning requires rethinking generalization”
- Stability
- Sharp/flat local minima
- Constrained optimization
- Frank-Wolfe
- Interior point methods
- Privacy
This course fulfills the Technical Elective area.
REFERENCE TEXTBOOKS: N/A
REQUIRED TEXTBOOK: N/A
COURSE COORDINATORS: Prof. Dmitrii Avdiukhin
COURSE INSTRUCTOR: Prof. Dmitrii Avdiukhin