Using random walks to analyze Prediction with Expert Advice

Yuval Peres
Microsoft Research

We study the classical problem of prediction with expert advice in the adversarial setting with a geometric stopping time. Cover (1965) gave the optimal algorithm that minimizes worst-case regret for the case of 2 experts. In this talk, I will describe the optimal algorithm, adversary and regret for the case of 3 experts. We will see that optimal algorithm for 2 and 3 experts is a probability matching algorithm (analogous to Thompson sampling) against a particular randomized adversary. Remarkably, it turns out that this algorithm is not only optimal against this adversary, but also minimax optimal against all possible adversaries. The analysis of the optimal adversary relies on delicate random walk estimates. At the end of the talk, I will discuss the case of "Bandit feedback", when we just learn the gain of the action we chose, and analyze the effects of imposing a switching cost. This analysis uses a Gaussian Branching random walk.

(Talk based on joint works with Nick Gravin and Balu Sivan and with Ofer Dekel, Jian Ding and Tomer Koren.)

Presentation (PDF File)

Back to Workshop IV: Synergies between Machine Learning and Physical Models