IEMS 408: Decision Making in Dynamic Learning Environments
Description
Sequential decision-making under uncertainty is a foundational topic in multiple fields, including economics, operations research, and computer science, built around the framework of Markov decision processes (MDPs). While MDPs, optimal control, and online algorithms have each enjoyed great success along their own paths (with applications in robotics, game-playing, and large-scale operations), these lines of work have recently begun to converge. A core reason is that as data becomes more readily available and technology improves, algorithms can be customized more finely to different settings. As a result, researchers in economics, operations research, engineering, and theoretical computer science are moving toward more flexible models of “data-driven decision processes”: hybrid models that combine stochastic and adversarial information structures, and algorithms that adapt to the structure of information, constraints, and objectives. This course aims to develop a theoretical understanding of some popular tools used to solve these problems. Topics include: theoretical foundations of reinforcement learning and dynamic programming, multi-armed bandit problems and their solutions, random graph models for social and economic networks, alongside case studies applying these algorithms to problems in operations.
Course Material
All required materials will be distributed as notes and code via Canvas. The course will primarily use Python for all demonstrations. Some supplementary references:
- Bertsekas, Dimitri P. "Dynamic Programming and Optimal Control," 3rd ed., Vols. I and II. Belmont, MA: Athena Scientific, 2011.
- Lattimore, Tor, and Csaba Szepesvári. "Bandit Algorithms." Cambridge University Press, 2020.
- Agarwal, Alekh, et al. "Reinforcement Learning: Theory and Algorithms." Technical report, University of Washington, Seattle, 2019.
Topics
The following is a tentative list of topics:
- Multi-armed bandits and the Gittins index
- Multiplicative weights and no-regret algorithms, online mirror descent
- Reinforcement learning and dynamic programming, convergence properties
- Random networks: different models and societal implications
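To give a flavor of the first topic, here is a minimal, self-contained Python sketch of the classic UCB1 index policy for a multi-armed bandit with Bernoulli arms. This is an illustrative example only, not course material; the arm means, horizon, and function name are all hypothetical choices for the demonstration.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 policy on Bernoulli arms; return (total reward, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize its estimate
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as an arm is pulled more often
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

# With a sufficiently long horizon, the policy concentrates its pulls
# on the arm with the highest mean reward.
total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Over 5000 rounds, the best arm (mean 0.8) receives the large majority of pulls, while the exploration bonus guarantees that every arm is still sampled occasionally; this balance is what yields the logarithmic regret bounds studied in the course.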
Prerequisites
This course requires probability theory and real analysis (or equivalent) for mathematical maturity. Useful prior courses include IEMS 402 (Statistical Learning), MATH 320-1 (Real Analysis), and IEMS 460-1 (Stochastic Processes). If you have any doubts about your background, please consult the instructor.