Hello again!
As I stated in my previous posts, the precision of the function approximation remains a problem. However, I have found that using SARSA instead of Q-learning (on-policy instead of off-policy) mitigates the problem a lot, since it no longer bootstraps on the largest estimated action value (which is prone to overestimation) but on the estimated value of the action that was actually selected.
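To make the difference concrete, here is a rough sketch of the two bootstrap targets (illustrative names only, not the actual HTMRL code):

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: qNext holds the estimated action values for the next state.
// Q-learning bootstraps on the greedy (maximum) value of the next state,
// while SARSA bootstraps on the value of the action that is actually taken next.
float qLearningTarget(float reward, float gamma, const std::vector<float>& qNext) {
    return reward + gamma * *std::max_element(qNext.begin(), qNext.end());
}

float sarsaTarget(float reward, float gamma, const std::vector<float>& qNext, int nextAction) {
    return reward + gamma * qNext[nextAction];
}

// Either target then goes into the same TD update:
// q[state][action] += alpha * (target - q[state][action]);
```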
Due to this improvement, I decided to move back to feed-forward neural networks with RMS training. The RBF networks train much faster, but they also seem to forget faster (things that are not part of the replay chain). I assume this is due to the online supervised learning involved, which can easily overwrite previously learned information in favor of better matching the new information.
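Roughly, an RMS-style update scales each weight's step by a running average of its squared gradient; here is an illustrative sketch of that idea (my simplification, not the actual AILib code):

```cpp
#include <cmath>

// Illustrative RMSProp-style update for a single weight (an assumption about
// what the RMS training above refers to; not taken from the actual AILib code).
struct RMSState {
    float meanSquare = 0.0f; // running average of this weight's squared gradient
};

void rmsUpdate(float& weight, float gradient, RMSState& state,
               float learnRate = 0.001f, float decay = 0.9f, float epsilon = 1e-6f) {
    // Decay the running average of the squared gradient toward the new gradient...
    state.meanSquare = decay * state.meanSquare + (1.0f - decay) * gradient * gradient;
    // ...then scale the step so every weight takes a roughly normalized step.
    weight -= learnRate * gradient / (std::sqrt(state.meanSquare) + epsilon);
}
```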
I now have HTMRL performing both pole balancing and the mountain car problem with very little training time. It usually figures both of them out in under a minute.
I started scaling up to the Arcade Learning Environment and did some short test runs. I also did a lot of optimization, since I want it to run in real time (DeepMind's algorithm did not run in real time as far as I know, and it used more than just one standard desktop PC). I have not yet trained it enough to get decent results (5 minutes is not much), but I will probably try an overnight run soon.
For the continuous-action version of the algorithm, I experimented with free-energy based reinforcement learning. It learns a value function as usual, but a continuous policy can be derived from that value function quite easily. I have started implementing replay updates for this as well.
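As a rough illustration of what I mean by deriving a policy from the value function (a simplified sketch with made-up names, not the actual implementation): treat the exponentiated value as unnormalized probabilities over candidate actions and sample from that distribution.

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Illustrative sketch only: given some learned value function value(state, action),
// derive a stochastic policy by treating exp(value / temperature) as unnormalized
// probabilities over a set of candidate actions (a Boltzmann/softmax policy).
// In free-energy based RL, the value would be the negative free energy of the model.
float sampleAction(const std::vector<float>& state,
                   const std::function<float(const std::vector<float>&, float)>& value,
                   const std::vector<float>& candidateActions,
                   float temperature, std::mt19937& rng) {
    std::vector<float> weights;
    for (float a : candidateActions)
        weights.push_back(std::exp(value(state, a) / temperature));

    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return candidateActions[dist(rng)];
}
```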
Here is a short video of the system learning the mountain car problem:
For those just seeing this for the first time, the source code is available in the “htmrl” directory here: https://github.com/222464/AILib. It uses the CMake build system.
It’s getting there!
See you next time!