Reinforcement learning (RL) agents learn to perform a task through trial-and-error interaction with an initially unknown environment. Despite recent progress in deep RL, several unsolved challenges limit its applicability to real-world tasks, including efficient exploration in high-dimensional spaces, learning efficiency, safety, and the high cost of human supervision. To address these challenges, this talk focuses on how to balance self-supervised and human-supervised RL to efficiently train an agent to solve a variety of robotic continuous-control tasks. We address the following questions:
1. How can we amortize the cost of learning to explore?
2. How can we learn a semantically meaningful representation for faster exploration and learning?
3. Can we distill exploration into a reusable reward function?