Curbing Our Enthusiasm: Constraining Decision Policies Learned from the Past to Ensure Good Futures

Emma Brunskill
Stanford University

There is growing interest in batch off-policy reinforcement learning (RL), spurred in part by the vast datasets of prior decisions and their outcomes. Yet off-policy RL can be challenging, with well-known divergence results. In this talk I'll summarize some of our work on off-policy evaluation and off-policy optimization, including a structural minimization style result for guaranteeing future performance, and practical algorithms that we have used to quickly learn personalized policies from historical data for a high-fidelity diabetes simulator.

