Course Schedule

Weekday Regular Schedule

Group Type Hours Location
01 Lecture Monday 13-16 Shrieber 007

Course Plan

The course roughly consists of three parts:

  1. online convex optimization
  2. adversarial and stochastic multi-arm bandits
  3. advanced topics.

A tentative list of topics (will not cover all, and might cover some other topics):

  • Intro: online learning
  • Online convex optimization:
    • Follow the Leader, Follow the regularized leader,
    • Gradient Descent, Mirror Dscent
    • Convex duality
  • Online classificatrion
  • Partial Information (Multi-Arm Bandits)
  • Stochastic MAB: UCB, Explore/Expoit, Best Arm identification
  • Lower bounds: adversarial and stochastic (intro to Info Theory)
  • Regret and Equilibrium (zero-sum games, correlated equilibrium)
  • Special types of MAB:
    • Contextual
    • linear
    • Markovian (Gittins Index)
    • switching cost
  • MAB and pricing
  • Portfolio section

Detailed Schedule

Week Date Lecture topics references Lecture Scribe
1 Oct 23 Introduction, online learning model, regret minimization basics Section 1 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
Section 4.3 from:
Learning, Regret minimization, and Equilibria, A. Blum and Y. Mansour
scribe 1
2 Oct 30 Online Convex Optimization: intro, Follow the Leader, Follow the Regularized Leader Sections 2.1, 2.2., 2.3 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
scribe 2
3 Nov 6 Online Gradient Descent and strong convex regularizers Sections 2.4, 2.5 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
scribe 3
4 Nov 13 Fenchel duality and online mirror descent Sections 2.6, 2.7 and 2.8 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
Also scribe notes here and here
scribe 4
5 Nov 20 Online classification (Perceptron and Winnow), online-to-batch Sections 3.3 and 5 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
scribe5
6 Nov 27 Multi-Arm Bandits Section 4 from:
Online Learning and Online Convex Optimization. Shai Shalev-Shwartz
Also,
Online convex optimization in the bandit setting: gradient descent without a gradient.
Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan
scribe6
7 Dec 4 Stochastic MAB (Explore-Then-Exploit, Successive Action Elimination, Upper Confidence Bound (UCB))
Best Arm Identification (Halving Algorithm)
Chapter 2:
Introduction to Multi-Armed Bandits Aleksandrs Slivkins
Also,
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar, Shie Mannor and Yishay Mansour, Journal of Machine Learning Research (2006).
scribe7
8 Dec 11 Lower bounds Chapter 3:
Introduction to Multi-Armed Bandits Aleksandrs Slivkins
Also see here
scribe8
9 Dec 18 Applications of regret minimization for zero sum games and algorithm design. Notes on regret and zero sum games
Algorithm design and Regret
scribe9
10 Dec 25 Swap regret and its applications (correlated equilibrium, dominated actions and calibration) From External to Internal Regret Avrim Blum and Yishay Mansour
Also sections 4.4 and 4.5 from:
Learning, Regret minimization, and Equilibria Avrim Blum and Yishay Mansour
Also class notes Section 5.8.3 and Section 8.3 and 8.4
scribe10
11 Jan 1 Swap regret and wide range regret From External to Internal Regret Avrim Blum and Yishay Mansour (section 3 and 6)
Minimizing Wide Range Regret with Time Selection Functions Subhash Khot and Ashok Kumar Ponnuswami
scribe11
12 Jan 8 Contextual bandits and Lipschitz bandits Parts of chapters 5,7,9:
Introduction to Multi-Armed Bandits Aleksandrs Slivkins
also class notes on EXP4 by Brendan McMahan here and contextual bandits by Alekh Agarwal here
scribe12
13 Jan 15 Gittins Index Lecture notes here, based on:
J. Gittins, K. Glazerbrook and R. Weber, Multi-Arm Bandit Allocation Indices
John N. Tsitsiklis, A short proof of the Gittins index theorem
scribe13
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License