In many sequential decision-making problems, ranging from electronic commerce (recommendation systems, internet advertising, content optimization) and revenue and inventory management to playing Atari games, the decision maker faces a fundamental tradeoff between choosing the option that has performed best in the past and exploring new options to gather more information, i.e., the tradeoff between exploitation and exploration. This course will discuss recent advances in combining machine learning and optimization techniques to achieve a near-optimal tradeoff between exploration and exploitation when making sequential decisions under uncertainty. Topics covered include:
- multi-armed bandits (stochastic and adversarial)

- linear bandits

- convex bandits

- bandits in metric spaces

- bandits with global constraints

- reinforcement learning (Markov Decision Processes)
Algorithm design ideas, theoretical performance analysis, motivating applications, and open questions will be discussed for each of these topics.
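
To make the exploitation-vs.-exploration tradeoff concrete, below is a minimal sketch of the classic UCB1 algorithm for stochastic multi-armed bandits, the first topic listed above. This is an illustrative example, not material from the course itself; the function names (`ucb1`, `reward_fns`) and the two-armed Bernoulli setup are assumptions chosen for the sketch.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """Minimal UCB1 sketch for stochastic multi-armed bandits."""
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # times each arm has been pulled
    means = [0.0] * n_arms   # empirical mean reward of each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # pull each arm once to initialize estimates
        else:
            # Exploitation term (empirical mean) plus an exploration
            # bonus that shrinks as an arm is sampled more often.
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fns[arm]()          # observe a stochastic reward in [0, 1]
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total_reward += r
    return total_reward

# Example: two Bernoulli arms with success probabilities 0.4 and 0.6.
arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.4, 0.6)]
print(ucb1(arms, horizon=10_000))
```

The exploration bonus is large for rarely pulled arms and decays as evidence accumulates, so the algorithm gradually concentrates its pulls on the empirically best arm while still occasionally checking the alternatives.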
