On the statistical complexity of reinforcement learning

Mengdi Wang
Princeton University

Recent years have witnessed increasing empirical successes in reinforcement learning (RL). However, many theoretical questions about RL were not well understood. For example, how many observations are needed and sufficient for learning a good policy? What is the regret of online learning with function approximation in a Markov decision process (MDP)? From logged history generated by unknown behavior policies, how do we optimally estimate the value of a new policy? In this talk, I will review some recent results addressing these questions, such as the minimax-optimal sample complexities for solving MDP from a generative model, minimax-optimal off-policy evaluation by regression, and regret of online RL with nonparametric model estimation.

Presentation (PDF File)

Back to Long Programs