Fast Reinforcement Learning After Pretraining

by Chuck Anderson, Pattern Exploration

We presented the following paper at IJCNN 2015, where it won the Best Paper Award.

Abstract: Deep learning algorithms have recently appeared that pretrain hidden layers of neural networks in unsupervised ways, leading to state-of-the-art performance on large classification problems. These methods can also pretrain networks used for reinforcement learning. However, this ignores the additional information that exists in a reinforcement learning paradigm via the ongoing sequence of state, action, new state tuples. This paper demonstrates that learning a predictive model of state dynamics can result in a pretrained hidden layer structure that reduces the time needed to solve reinforcement learning problems.
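To make the idea in the abstract concrete, here is a minimal sketch of pretraining a hidden layer on next-state prediction and then copying it into a Q network. This is not the paper's code. The toy dynamics, the layer sizes, the learning rate, the helper names (`init_net`, `forward`, `train_step`), and the choice of a Q network that takes state and action as input are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    """One-hidden-layer network with small random weights (hypothetical helper)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(net, X):
    """Return the tanh hidden activations and the linear outputs."""
    H = np.tanh(X @ net["W1"] + net["b1"])
    return H, H @ net["W2"] + net["b2"]

def train_step(net, X, T, lr=0.05):
    """One full-batch gradient step on squared error; returns the loss before the step."""
    H, Y = forward(net, X)
    err = Y - T
    n = X.shape[0]
    # Backpropagate through the tanh hidden layer before touching W2.
    dH = (err @ net["W2"].T) * (1.0 - H ** 2)
    net["W2"] -= lr * H.T @ err / n
    net["b2"] -= lr * err.mean(axis=0)
    net["W1"] -= lr * X.T @ dH / n
    net["b1"] -= lr * dH.mean(axis=0)
    return float((err ** 2).mean())

# Assumed toy dynamics standing in for real (state, action, new state) samples.
states = rng.uniform(-1.0, 1.0, (500, 2))
actions = rng.choice([-1.0, 1.0], (500, 1))
next_states = states + 0.1 * np.hstack([states[:, 1:2], actions])

# Pretraining: learn to predict the new state from (state, action).
X = np.hstack([states, actions])
model = init_net(n_in=3, n_hidden=10, n_out=2)
losses = [train_step(model, X, next_states) for _ in range(2000)]

# Initialize the Q network's hidden layer from the pretrained model.
q_net = init_net(n_in=3, n_hidden=10, n_out=1)
q_net["W1"] = model["W1"].copy()
q_net["b1"] = model["b1"].copy()
```

Only the hidden layer is transferred here. The output layer of `q_net` starts from fresh random weights and would still have to be trained by the reinforcement learning algorithm.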

Below is a sequence of animations showing the progress of reinforcement learning. We simulated the dynamics of the cart-pole problem with full 360-degree pole rotation and collisions of the cart with the ends of the track. Details of this simulation will be provided here soon.
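Until those details are posted, the following is only a rough sketch of what one simulation step of such a system could look like. It uses the standard frictionless cart-pole equations with the angle measured from upright and left unrestricted, plus a simple clamp at the track ends. All constants, the Euler integration, and the collision rule (the cart stops dead, and the jolt a real collision would give the pole is ignored) are assumptions, not the simulation used for the animations.

```python
import math

# Illustrative constants (assumed, not taken from the paper).
GRAVITY = 9.8            # m/s^2
CART_MASS = 1.0          # kg
POLE_MASS = 0.1          # kg
POLE_HALF_LENGTH = 0.5   # m
TRACK_HALF_WIDTH = 2.4   # m
DT = 0.02                # s

def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step; theta = 0 is upright."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Standard frictionless cart-pole accelerations, valid at any pole angle.
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass

    # Euler integration using the velocities from the start of the step.
    x += DT * x_dot
    x_dot += DT * x_acc
    theta += DT * theta_dot
    theta_dot += DT * theta_acc

    # Keep the angle in [-pi, pi) so the pole can rotate through full circles.
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi

    # Assumed collision rule: the cart stops at the end of the track.
    if abs(x) > TRACK_HALF_WIDTH:
        x = math.copysign(TRACK_HALF_WIDTH, x)
        x_dot = 0.0

    return (x, x_dot, theta, theta_dot)
```

A more faithful model would also change the pole's angular velocity at the moment of impact, since the animations suggest the agent learns to exploit those bumps.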

The first animation shows that after a few steps, our RL agent seems to prefer pushes to the right.

After 10 minutes of training, the agent is starting to move away from the right edge.

After 50 minutes of training, the pole is swinging up, but not being balanced.

After 100 minutes of training, our RL agent is occasionally able to use bumps at the ends of the track to help swing the pole up.

And finally, after 200 minutes, the pole is being balanced!

With $\epsilon = 0$ (no exploration), we see a quick swing-up and balance.

Again, in slow motion.

Here is another run with $\epsilon = 0$, in slow motion.
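For readers unfamiliar with the role of $\epsilon$ in the runs above, here is a minimal sketch of $\epsilon$-greedy action selection, assuming a discrete set of actions with one Q-value estimate each. The function name `epsilon_greedy` and the example values are hypothetical, and the action selection actually used in the paper may differ in its details.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-value estimates for three actions.
q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)
```

With `epsilon=0.0` the random branch is never taken, so the agent always takes the action it currently rates highest, which corresponds to the no-exploration setting of the last animations.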