My Attempt at Outperforming DeepMind’s Atari Results – UPDATE 13

Hello again!

I finally got my HTFERL agent churning away at the Atari game “Breakout”. It isn’t at DeepMind levels of proficiency yet, but it is much better than random 😉

I trained it for about 15 minutes, and it quite obviously follows the ball around. It usually gets the first and second bounce after these 15 minutes of training. Before I run it overnight, though, I still want to make some performance improvements to the system so that I don’t have to wait so long.

Picture of it playing Breakout after 15 minutes:

[Image: htfebreakout1]

I would have made a video, but since the ALE uses software rendering with SDL, I cannot use Fraps on it. I will look for some other recording method soon, though.

At this point this is more of a proof of concept, but I am sure that eventually I will get somewhere within DeepMind’s territory of game proficiency. The system has come a long way, and I am quite happy with where it is going right now.

For those who haven’t been following this blog, here is a quick summary of how the system works and what it does differently from DeepMind’s system.

HTFERL (hierarchical temporal free energy reinforcement learner) is a derivative of HTM (hierarchical temporal memory). It tries to mimic the algorithmic properties of the human neocortex. It can be thought of as a stack of predictive k-sparse autoencoders with recurrent lateral connections and feedback (top-down) connections. Information doesn’t only flow upwards; it can also flow in the reverse direction if that helps with prediction.
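To give a rough feel for what one of those layers does, here is a tiny sketch of a predictive k-sparse autoencoder step. It leaves out the recurrent lateral and top-down connections entirely, and the names and update rules are just illustrative, not the actual HTFERL code:

```python
import numpy as np

# Minimal sketch of one predictive k-sparse autoencoder layer.
# Illustrative only; not the actual HTFERL implementation.

class KSparseLayer:
    def __init__(self, input_size, hidden_size, k, lr=0.01, rng=None):
        rng = rng or np.random.default_rng(0)
        self.k = k
        self.lr = lr
        self.W_enc = rng.normal(0.0, 0.1, (hidden_size, input_size))   # feed-forward weights
        self.W_pred = rng.normal(0.0, 0.1, (input_size, hidden_size))  # prediction (decoder) weights

    def encode(self, x):
        # Compute activations and keep only the k largest: the sparse code (SDR).
        a = self.W_enc @ x
        sdr = np.zeros_like(a)
        top_k = np.argsort(a)[-self.k:]
        sdr[top_k] = a[top_k]
        return sdr

    def step(self, x, x_next):
        # Encode the current input, predict the next input, and learn from the errors.
        sdr = self.encode(x)
        prediction = self.W_pred @ sdr
        pred_err = x_next - prediction
        self.W_pred += self.lr * np.outer(pred_err, sdr)      # improve the prediction
        recon_err = x - self.W_enc.T @ sdr
        self.W_enc += self.lr * np.outer(sdr, recon_err)      # sparse autoencoder-style update
        return sdr, prediction
```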

Unlike DeepMind’s system, HTFERL doesn’t need experience replay or any form of stochastic sampling. It is 100% online and real-time, and runs nicely at 60 fps with 400,000 neurons on my R290. DeepMind’s system was limited to a small number of discrete actions (due to the poor scaling of “one-hot” action vectors), while HTFERL can handle hundreds of thousands of continuous actions. HTFERL also incorporates eligibility traces, and has the ability to remember things indefinitely (DeepMind’s system was limited to a fixed history window of a few seconds). Furthermore, HTFERL has a feature known as temporal pooling, which allows it to group similar events into conceptual “bins” over time.
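For those unfamiliar with eligibility traces, the general idea in its standard TD(λ) form on a linear value function looks roughly like this (this is the textbook version, not necessarily the exact formulation HTFERL uses):

```python
import numpy as np

# Standard accumulating eligibility trace for a linear value function V(s) = w . phi(s).
# Shown only to illustrate the general idea; HTFERL's trace mechanism may differ.

def td_lambda_step(w, trace, phi, phi_next, reward, alpha=0.1, gamma=0.99, lam=0.9):
    td_error = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)
    trace = gamma * lam * trace + phi          # decay old credit, add current features
    w = w + alpha * td_error * trace           # credit recently visited states for this error
    return w, trace
```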

Since the system runs so fast, I was able to use the full-resolution video feed of the Atari games while staying capped at 60 fps. DeepMind had to crop and downsample.

So while it isn’t at DeepMind levels of game proficiency yet (I think it will get there), this approach has some advantages.

I will push the ALE repository to GitHub soon. If you are just interested in HTFERL, the repository for it is available here: https://github.com/222464/HTFERL

If you want to experiment with HTFE (without the RL part), I recommend using the Python bindings here: https://github.com/222464/pyhtfe

Until next time!

Text2SDR

Hello!

While working on reinforcement learning with HTFERL, I tried using the algorithm for some other things as well. One of those was natural language processing, with which I have no experience. So here is what happened…

I decided to start out with a single layer of HTFERL for predicting words ahead of time. The algorithm my brother (also interested in NLP!) and I came up with works like this:

  • If the word has never been seen before (not part of some dictionary D), create a new entry in D for this word, and assign it the current predicted word vector as the feature.
  • If the word has been seen before (it already exists in D), update the prediction to match the feature vector of this word.

So the layer of HTFERL goes through the sentence word by word (or using some other tokenization), and automatically starts assigning word vectors (features) to words it doesn’t know while keeping predictions up-to-date on words it does know.
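To make that loop concrete, here is a rough sketch. The `TinyPredictor` below is just a toy stand-in for the HTFERL layer (its interface is made up, and it only averages recent inputs); the structure of the loop is the point, not the predictor:

```python
import numpy as np

# Sketch of the word-vector assignment loop described above.
# TinyPredictor is a hypothetical stand-in for the HTFERL layer.

class TinyPredictor:
    def __init__(self, size, rng=None):
        self.state = (rng or np.random.default_rng(0)).normal(0.0, 0.1, size)

    def predict_next_vector(self):
        return self.state

    def train_toward(self, target, lr=0.1):
        self.state += lr * (target - self.state)   # nudge the prediction toward the target feature

    def feed(self, vec):
        self.state = 0.5 * self.state + 0.5 * vec  # present the word's feature as the next input

def process_text(tokens, predictor, dictionary):
    for word in tokens:
        predicted = predictor.predict_next_vector()
        if word not in dictionary:
            # New word: the current prediction becomes its feature vector.
            dictionary[word] = predicted.copy()
        else:
            # Known word: train the prediction toward the stored feature.
            predictor.train_toward(dictionary[word])
        predictor.feed(dictionary[word])
    return dictionary

words = "the cat sat on the mat".split()
vectors = process_text(words, TinyPredictor(size=16), {})
```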

This may seem very similar to word2vec; that’s because it is. The features generated by this process describe words by their grammatical properties, without actually knowing what the words mean. Just like with word2vec, words with similar meanings end up with similar vectors, and just like word2vec it is possible to perform arithmetic on the word vectors.
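The arithmetic works the same way it does with word2vec: combine vectors and look up the nearest known word. The snippet below uses made-up placeholder vectors just to show the mechanics:

```python
import numpy as np

# Word-vector arithmetic in the word2vec style: combine vectors and find the
# closest dictionary entry by cosine similarity. The words and vectors below
# are placeholders, not learned features.

def closest(dictionary, query, exclude=()):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max((w for w in dictionary if w not in exclude),
               key=lambda w: cos(dictionary[w], query))

rng = np.random.default_rng(0)
toy = {w: rng.normal(size=8) for w in ("walk", "walked", "run", "ran", "cat")}

# Analogy-style query: "walked" - "walk" + "run" should land near "ran"
# (with real learned vectors; these random toy vectors won't actually do that).
result = closest(toy, toy["walked"] - toy["walk"] + toy["run"],
                 exclude=("walked", "walk", "run"))
```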

So what makes this special compared to word2vec? Well, the word vectors are really only a side-effect of the technique. The interesting part is when we start using the system to understand sentences.

As the HTFERL layer parses the text, it builds an internal sparse distributed representation (SDR) of the text as a whole. It learns whatever is necessary to predict the next word in the sentence, so the SDR should contain fairly complete information about the meaning of the sentence.

From here we can use the SDRs HTFERL generates as input to a classifier or some other system. Alternatively, the text predictions themselves can be used for something useful on their own.
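As a sketch of the first option: if you collect the SDR vector the layer produces for each piece of text, you can feed those vectors into any ordinary classifier. Here is a minimal logistic-regression example (plain numpy, nothing HTFERL-specific); how you obtain the SDRs is up to the layer:

```python
import numpy as np

# Sketch: treat the SDR produced for each piece of text as a fixed-length feature
# vector and train a simple logistic-regression classifier on top of it.

def train_classifier(sdrs, labels, epochs=100, lr=0.1):
    sdrs = np.asarray(sdrs, dtype=float)      # one SDR vector per example
    labels = np.asarray(labels, dtype=float)  # binary labels (0 or 1)
    w = np.zeros(sdrs.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(sdrs @ w + b)))   # predicted probability per example
        grad = p - labels                            # logistic loss gradient
        w -= lr * sdrs.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```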

I have developed and tested a system that predicts what you are about to type based on the current typing pattern and the history of what you typed. I am currently developing a Visual Studio plugin that uses this system as a form of smart code completion.

Another interesting test I did was using the system for sentence generation. If you feed the predicted word back into the system as input, it will start generating a sentence. If you perturb the predictions a bit, it will start using different words with similar meanings, forming “random” but still grammatically valid sentences.
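A rough sketch of that feedback loop, using the same hypothetical predictor interface as in the earlier sketch:

```python
import numpy as np

# Sketch of sentence generation: take the predicted word vector, add a little noise,
# snap it to the nearest known word, and feed that word back in as the next input.
# `predictor` is the same hypothetical stand-in as above, not the real HTFERL layer.

def generate(predictor, dictionary, length=10, noise=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    sentence = []
    for _ in range(length):
        predicted = predictor.predict_next_vector()
        perturbed = predicted + rng.normal(0.0, noise, predicted.shape)  # nudges toward similar words
        word = min(dictionary, key=lambda w: np.linalg.norm(dictionary[w] - perturbed))
        sentence.append(word)
        predictor.feed(dictionary[word])   # feed the chosen word back as input
    return " ".join(sentence)
```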

The code for Word2SDR (although really it is “text2SDR” 😉 ) is available here: https://github.com/222464/AILib/blob/master/Source/text/Word2SDR.h

So there you have it: using an HTM derivative for NLP!

Until next time!