I finally got my HTFERL agent churning at the Atari game “Breakout”. It isn’t at DeepMind levels of proficiency yet, but it is much better than random 😉
I trained it for about 15 minutes, and it quite obviously follows the ball around. After those 15 minutes of training it usually gets the first and second bounce. Before I run it overnight, though, I still want to make some performance improvements to the system so that I don’t have to wait so long.
Picture of it playing Breakout after 15 minutes:
I would have made a video, but since the ALE uses software rendering with SDL I cannot use Fraps on it. I will look for some other recording method for a video soon though.
At this point this is more of a proof of concept, but I am sure that eventually I will get somewhere within DeepMind territory of game proficiency. The system has come a long way, and I am quite happy with where it is going right now.
For those who haven’t been following this blog, here is a quick summary of how the system works and what it does differently from DeepMind’s system.
HTFERL (hierarchical temporal free energy reinforcement learner) is a derivative of HTM (hierarchical temporal memory). It tries to mimic the algorithmic properties of the human neocortex. It can be thought of as a stack of predictive k-sparse autoencoders with recurrent lateral connections and top-down feedback connections. Information doesn’t only flow upwards; it can also flow in the reverse direction if that helps with prediction.
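To make the “k-sparse” part a bit more concrete, here is a minimal sketch of the sparsification step on its own: keep the k strongest activations in a layer and zero out the rest. This is just the generic textbook idea written in plain Python, not code taken from HTFERL, and the function name `k_sparse` and the example values are made up for illustration.

```python
def k_sparse(activations, k):
    """Keep the k largest activations and set all others to zero."""
    if k <= 0:
        return [0.0] * len(activations)
    # Indices sorted by descending activation; sorted() is stable,
    # so ties are broken in favour of the lower index.
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Hypothetical hidden-layer activations, for illustration only.
hidden = [0.1, 0.9, 0.3, 0.7, 0.2]
sparse = k_sparse(hidden, 2)
print(sparse)  # [0.0, 0.9, 0.0, 0.7, 0.0]
```

In a real layer the activations would come from the encoder weights, and only the surviving units would take part in reconstruction and prediction.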
Unlike DeepMind’s system, HTFERL doesn’t need experience replay, or any form of stochastic sampling. It is 100% online and real-time, and runs nicely at 60 fps with 400,000 neurons on my R290. DeepMind’s system was limited to very few discrete actions (because “one-hot” vectors scale poorly as the number of actions grows), whereas HTFERL can handle hundreds of thousands of continuous actions. HTFERL also incorporates eligibility traces, and has the ability to remember things indefinitely (DeepMind’s system was limited to a fixed history window of a few seconds). Furthermore, HTFERL has a feature known as temporal pooling, which allows it to group similar events into conceptual “bins” over time.
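For readers who haven’t met eligibility traces before, here is a rough sketch of how they let a purely online learner assign credit to earlier inputs without replaying stored experience. It is a generic TD(λ) update with accumulating traces on a linear value estimate, written in plain Python. It is not HTFERL’s actual update rule, and the names (`td_lambda_step`, `alpha`, `gamma`, `lam`) and the toy numbers are assumptions chosen for illustration.

```python
def td_lambda_step(weights, traces, features, reward, value, next_value,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """Apply one online TD(lambda) update in place and return the TD error."""
    td_error = reward + gamma * next_value - value
    for i, x in enumerate(features):
        # Decay the old trace, then add credit for the currently active feature.
        traces[i] = gamma * lam * traces[i] + x
        # Every feature with a non-zero trace shares in the update.
        weights[i] += alpha * td_error * traces[i]
    return td_error


# Toy two-step episode: feature 0 is active first, then feature 1,
# and the reward only arrives on the second step.
weights = [0.0, 0.0]
traces = [0.0, 0.0]
td_lambda_step(weights, traces, [1.0, 0.0], reward=0.0, value=0.0, next_value=0.0)
td_lambda_step(weights, traces, [0.0, 1.0], reward=1.0, value=0.0, next_value=0.0)
print(weights)  # feature 0 receives a smaller share of the credit than feature 1
```

The decaying trace is what stands in for a replay buffer here: the weight for the earlier feature still moves when the reward arrives, just by less than the weight for the most recent one.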
Since the system runs so fast, I was able to use the full-resolution video feed of the Atari games while staying capped at 60 fps. DeepMind had to crop and downsample.
So while it isn’t at DeepMind levels of game proficiency yet (but I think it will get there), this approach has some advantages.
I will push the ALE repository to GitHub soon. If you are just interested in HTFERL, the repository for it is available here: https://github.com/222464/HTFERL
If you want to experiment with HTFE (without the RL part), I recommend using the Python bindings here: https://github.com/222464/pyhtfe
Until next time!