Hello again!
I recently found an interesting paper (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029264). It describes a method for applying MPF (the Memory Prediction Framework, of which HTM is a specific instance) to reinforcement learning, and I really like the idea behind it so far.
From what I gather, the idea is to do an upward pass through the MPF hierarchy to form increasingly invariant and abstracted representations of the current and past inputs. One then descends again, combining information from the upper levels with the more fine-grained information at the lower levels. While doing this, one can perturb the predictions so that the system is more likely to predict inputs (consisting of state and action) that lead to higher rewards.
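To make the perturbation step concrete to myself, here is a tiny sketch of it in isolation (plain Python/NumPy rather than my GPU code, and the candidate list, reward estimates and temperature parameter are all made-up stand-ins for what the hierarchy would actually produce): during the downward pass, rather than always predicting the most likely next (state, action), the prediction is sampled with a bias toward candidates expected to yield higher reward.

```python
import numpy as np

def perturb_predictions(candidates, predicted_rewards, temperature=0.1):
    """Re-weight candidate (state, action) predictions by their expected reward.

    candidates       : list of candidate next inputs, each a (state, action) pair
    predicted_rewards: expected reward associated with each candidate
    temperature      : lower values bias more strongly toward high-reward candidates

    Returns one candidate, sampled with probability proportional to
    exp(reward / temperature), i.e. a softmax over expected reward.
    """
    rewards = np.asarray(predicted_rewards, dtype=np.float64)
    logits = (rewards - rewards.max()) / temperature   # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    idx = np.random.choice(len(candidates), p=probs)
    return candidates[idx]

# Toy usage: three possible next (state, action) pairs with different expected rewards.
candidates = [("s1", "left"), ("s2", "right"), ("s3", "stay")]
expected_rewards = [0.1, 0.9, 0.3]
print(perturb_predictions(candidates, expected_rewards))
```

With a very low temperature this collapses into greedy action selection, while a high temperature leaves the predictions almost untouched, so the same knob doubles as an exploration control.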
The authors actually mentioned that they also wanted to use it for game playing, which is my goal. Since then I have been very excited at the prospect of using MPF and MPF alone to do the reinforcement learning, and have started extending my existing GPU-HTM to include temporal pooling as well as the backwards pass and action-generating perturbations.
Possible benefits of this approach over using MPF solely as a feature extractor:
- Features at each level of the MPF hierarchy can be used, combined on the backwards pass with features passed down from the higher levels, to add additional context to actions.
- No need for experience replay, which is slow and inelegant – MPF is designed to work online from the beginning.
- Can be run easily on the GPU, using images (textures) to store weights and activations.
- Partial observability is easier to handle.
Possible negatives of this approach:
- Less material to work with – fewer papers. If something goes wrong, I am unlikely to find much help.
- Input encoding may be difficult. However, since my version of HTM operates on continuous cell and column states, this might not be as difficult as it first seems; a rough sketch of what I have in mind follows below.
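To illustrate what I mean by that last point (this is only a placeholder scheme I might try, not something from the paper): a downscaled game frame could feed the bottom region directly as real-valued column activations, skipping the sparse binary encoders that Numenta-style HTM usually needs.

```python
import numpy as np

def encode_frame(frame, grid_shape=(32, 32)):
    """Map a grayscale game frame onto a grid of continuous column inputs in [0, 1].

    frame      : 2-D uint8 array (e.g. a downscaled screen capture)
    grid_shape : number of columns along each axis (hypothetical region size)

    Each column receives the mean intensity of the patch of pixels it covers,
    so no binary/sparse encoder step is required.
    """
    h, w = frame.shape
    gh, gw = grid_shape
    ph, pw = h // gh, w // gw                     # pixels covered by each column
    patches = frame[:gh * ph, :gw * pw].reshape(gh, ph, gw, pw)
    return patches.mean(axis=(1, 3)) / 255.0      # continuous activations in [0, 1]

# Toy usage: a random 64x64 "frame" becomes a 32x32 grid of column activations.
frame = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
columns = encode_frame(frame)
print(columns.shape, columns.min(), columns.max())
```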
I have already written a GPU temporal pooler with one-step prediction, which should be sufficient for now. As with my spatial pooler, it operates on continuous cell/column states, unlike Numenta’s HTM.
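The pooler itself probably deserves a post of its own, so for now here is only a toy illustration of what I mean by one-step prediction over continuous column states. It is not how the GPU pooler works internally (just a linear predictor trained online with a delta rule, with made-up sizes and learning rate), but it shows the behaviour I am after: feed it a sequence, get back a guess at the next input.

```python
import numpy as np

class OneStepPredictor:
    """Minimal online one-step predictor over continuous column states.

    A CPU sketch only: a linear transition model trained with a delta rule,
    not the actual GPU temporal pooler. Sizes and learning rate are hypothetical.
    """

    def __init__(self, num_columns, learning_rate=0.05):
        self.W = np.zeros((num_columns, num_columns))  # learned transition weights
        self.lr = learning_rate
        self.prev = None                               # previous column states

    def step(self, columns):
        """Feed current column activations (values in [0, 1]);
        returns the prediction for the next time step."""
        columns = np.asarray(columns, dtype=np.float64)
        if self.prev is not None:
            predicted = self.W @ self.prev
            error = columns - predicted                # how wrong the last prediction was
            self.W += self.lr * np.outer(error, self.prev)
        self.prev = columns
        return np.clip(self.W @ columns, 0.0, 1.0)     # one-step-ahead prediction

# Toy usage: the predictor learns a repeating two-state sequence.
pooler = OneStepPredictor(num_columns=4)
a = np.array([1.0, 0.0, 1.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
for _ in range(200):
    pooler.step(a)
    prediction = pooler.step(b)    # after seeing b, it should predict something close to a
print(np.round(prediction, 2))
```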
I leave you with an image of the cells in an HTM region responding to a test image (rendered with VolView):
Until next time!