My Attempt at Outperforming Deepmind’s Atari Results – UPDATE 12

Hello!

It has been some time since I last posted here. Since my last post, many changes have occurred to my CHTM architecture.

First off, the reinforcement learning algorithm is now based on CACLA (continuous actor-critic learning automaton), but the actor and critic are part of the same structure. The temporal prediction system only learns to predict the last action when the temporal difference error is positive. Read more about CACLA here: http://webdocs.cs.ualberta.ca/~vanhasse/rl_algs/Cacla.html

Second, the Q values are now stored within the HTM cells. Since each cell *should* only be active in a unique state (taking partial observability into account), this means that each cell can effectively store the Q value for that state. This is both more biologically plausible (I guess…), and far more efficient. It can now be thought of as a “smart, self-adapting” look-up table.

Third, cells can now predict far more accurately with the addition of dendrite segments. In standard machine learning terms, this is a small perceptron with one hidden layer and an output layer where all weights are 1 (OR operation). This allows the cell to differentiate more complex patterns, limited only by the number of segments (hidden units).

Fourth, the spatial pooler has been reworked several times, and is now essentially a sparse autoencoder with explicit lateral inhibition. It still uses boosting to help out “dead” columns.

So, what about results? Well, I am not quite satisfied with the results yet, but they have been improving a lot. I am performing pole balancing with a twist: It uses vision data (64 x 32 pixels). I made a little plotter to plot my  reward values, and this is one of the more recent runs:

rewardIncreasing

The system now looks like this when visualized with the volumetric renderer:

nonuniform

You may notice that the SDRs are no longer uniform across the layers. The new spatial pooler leaves columns off when all the inputs are 0. I am mostly doing this for debugging purposes at the moment, since it makes it easier to see what is going on.

So that’s it for now, I will hopefully have more interesting results soon, perhaps with a video. I want to start giving updates regularly again as well!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s