My Attempt at Outperforming DeepMind’s Atari Results – UPDATE 4

Hello again!

As I stated in my previous posts, the precision of the function approximation remains a problem. However, I have found that using SARSA instead of Q-learning (on-policy instead of off-policy) mitigates the problem considerably: the update no longer bootstraps on the largest predicted next-state value, which is prone to overestimation, but on the predicted value of the action that was actually selected.
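To make the difference concrete, here is a minimal Python sketch of the two bootstrap targets. This is not the HTMRL code; the function names, the `gamma` parameter, and the example values are just assumptions for illustration.

```python
import numpy as np

def q_learning_target(reward, next_q_values, gamma=0.99):
    """Off-policy target: bootstrap on the largest next-state estimate."""
    return reward + gamma * np.max(next_q_values)

def sarsa_target(reward, next_q_values, next_action, gamma=0.99):
    """On-policy target: bootstrap on the action that was actually selected."""
    return reward + gamma * next_q_values[next_action]

# Hypothetical value estimates for the three actions in the next state.
next_q = np.array([0.5, 2.0, -1.0])

# Q-learning always uses the 2.0 entry, even if that estimate is too high.
print(q_learning_target(1.0, next_q, gamma=0.9))

# SARSA uses whichever action the policy picked next (here action 0).
print(sarsa_target(1.0, next_q, next_action=0, gamma=0.9))
```

When the selected action happens to be the greedy one, the two targets are identical. They only differ on steps where the policy chose something else.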

Due to this improvement, I decided to move back to feed-forward neural networks with RMS training. The RBF networks train much faster, but they also seem to forget things faster, specifically things that are not part of the replay chain. I assume this is due to the online supervised learning that occurs, which can easily destroy previously learned information in favor of better matching new information.

I now have HTMRL performing both pole balancing and the mountain car problem with very little training time. It usually figures both of them out in under a minute.

I started scaling up to the Arcade Learning Environment, and did some short test runs. I also did a lot of optimization, since I want it to run in real time (DeepMind’s algorithm did not run in real time as far as I know, and needed more than a single standard desktop PC). I have not yet trained it enough to get decent results (5 minutes is not much), but I will probably try an overnight run soon.

For the continuous-action version of the algorithm, I experimented with free-energy based reinforcement learning. It learns a value function as usual, but a continuous policy can easily be derived from that value function. I started implementing replay updates for this as well.
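For readers unfamiliar with the idea, here is a rough Python sketch of one way it can work. It is not the HTMRL implementation. It assumes a restricted Boltzmann machine with binary hidden units whose negative free energy is treated as Q(s, a), and it assumes the policy is obtained by gradient ascent on the action. All names (`free_energy`, `q_value`, `improve_action`), the `[-1, 1]` action range, and the step sizes are made up for illustration.

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """Free energy of an RBM with binary hidden units, for visible vector v."""
    pre = b_hid + W @ v  # hidden pre-activations
    return -b_vis @ v - np.sum(np.logaddexp(0.0, pre))  # logaddexp(0, x) = softplus(x)

def q_value(state, action, W, b_vis, b_hid):
    """Assumed value estimate: Q(s, a) = -F([s, a])."""
    return -free_energy(np.concatenate([state, action]), W, b_vis, b_hid)

def improve_action(state, action, W, b_vis, b_hid, step=0.1, iters=50):
    """Gradient ascent on Q with respect to the action, clipped to [-1, 1]."""
    n_s = state.size
    a = action.copy()
    for _ in range(iters):
        v = np.concatenate([state, a])
        hidden = 1.0 / (1.0 + np.exp(-(b_hid + W @ v)))  # sigmoid activations
        grad_v = b_vis + W.T @ hidden                     # d(-F)/dv
        a = np.clip(a + step * grad_v[n_s:], -1.0, 1.0)   # move only the action part
    return a

# Hypothetical sizes: 4 state inputs, 2 action inputs, 8 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(8, 6))
b_vis = rng.normal(scale=0.1, size=6)
b_hid = rng.normal(scale=0.1, size=8)
state = rng.uniform(-1.0, 1.0, size=4)
start = np.zeros(2)

better = improve_action(state, start, W, b_vis, b_hid)
print(q_value(state, start, W, b_vis, b_hid), q_value(state, better, W, b_vis, b_hid))
```

The weights here are random, so the output only shows that the ascent step raises the estimated value. In a real agent the weights would be trained with the usual temporal-difference error on Q.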

Here is a short video of the system learning the mountain car problem:

For those seeing this for the first time, the source code is available here, under the directory “htmrl”. It uses the CMake build system.

It’s getting there!

See you next time!
