A new idea: HTFERL

Hello!

This post is indirectly related to my series of posts on Deepmind-style Atari playing. It is about an idea that I wanted to share for building massive reinforcement learners.

While working on my CHTM implementation, I started to think of ways that it could be simplified. Perhaps even to the point where it is no longer biologically plausible, while still keeping the functionality intact.

The closest thing I can think of in standard deep learning literature that can do similar things to HTM is the sparse recurrent autoencoder. HTM makes predictions about its own input, similar to a recurrent autoencoder. The sparsity comes in handy to minimize forgetting.

So I started experimenting with some autoencoders, one of which made its way into the spatial pooler of CHTM. Not many papers exist on sparse autoencoders with explicit lateral inhibition, so I played around with some ideas and came up with the following code (Python):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAutoencoder(object):
    """Sparse Autoencoder with Explicit Lateral Inhibition"""

    def __init__(self, numInputs, sdrSize, sparsity, minWeight, maxWeight):
        # Tied encoder/decoder weights and biases, initialized uniformly in [minWeight, maxWeight]
        self.weights = np.random.rand(sdrSize, numInputs) * (maxWeight - minWeight) + minWeight
        self.visibleBiases = np.random.rand(numInputs) * (maxWeight - minWeight) + minWeight
        self.hiddenBiases = np.random.rand(sdrSize) * (maxWeight - minWeight) + minWeight
        self.activations = np.zeros_like(self.hiddenBiases)
        # Start each unit's duty cycle at the target sparsity
        self.dutyCycles = np.full(sdrSize, sparsity)

    def generate(self, inputs, sparsity, dutyCycleDecay):
        # Number of hidden units allowed to be active (k-winners-take-all)
        localActivity = np.round(len(self.hiddenBiases) * sparsity)

        # Feed-forward activation
        sums = np.dot(self.weights, inputs) + self.hiddenBiases

        self.activations = sigmoid(sums)

        # Explicit lateral inhibition: a unit is only included in the SDR if
        # fewer than localActivity units have a higher activation sum
        sdr = np.zeros_like(self.activations)

        for i in range(len(sums)):
            numHigher = np.sum(sums > sums[i])

            if numHigher < localActivity:
                sdr[i] = 1.0

        # Running average of how often each unit fires (used for boosting)
        self.dutyCycles = (1.0 - dutyCycleDecay) * self.dutyCycles + dutyCycleDecay * sdr

        return sdr

    def learn(self, inputs, sdr, sparsity, alpha, beta):
        # Reconstruct the input from the SDR and compute the reconstruction error
        recon = self.reconstruct(sdr)

        errors = inputs - recon

        # Propagate the error back to the hidden units through the tied weights
        hiddenErrors = np.dot(self.weights, errors) / len(errors) * self.activations * (1.0 - self.activations)

        # Update the tied weights using both the decoder and encoder error terms
        # (an extra boosting term on the weights was tried here but is disabled)
        self.weights += alpha * 0.5 * (np.outer(sdr, errors) + np.outer(hiddenErrors, inputs))

        self.visibleBiases += alpha * errors

        # Boost units whose duty cycles have fallen below the target sparsity
        self.hiddenBiases += beta * (sparsity - self.dutyCycles)

        return recon

    def reconstruct(self, sdr):
        # Linear decoder: rebuild the input from the SDR using the tied weights
        return np.dot(self.weights.T, sdr) + self.visibleBiases

I have tested this autoencoder on some image reconstruction tasks, and it works as expected.
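
For reference, here is a rough sketch of how the class above can be driven in a training loop. The parameter values and the random stand-in "image" are illustrative placeholders, not the settings I actually used:

import numpy as np

numInputs = 64 * 32    # e.g. a small flattened grayscale image
sdrSize = 256
sparsity = 0.05

sae = SparseAutoencoder(numInputs, sdrSize, sparsity, -0.1, 0.1)

for step in range(1000):
    image = np.random.rand(numInputs)  # stand-in for a real input frame

    sdr = sae.generate(image, sparsity, dutyCycleDecay=0.01)
    recon = sae.learn(image, sdr, sparsity, alpha=0.01, beta=0.1)

    if step % 100 == 0:
        print(step, np.mean((image - recon) ** 2))  # monitor the reconstruction error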

Now we essentially have a simplified spatial pooler. Make it not fully connected, and you can build large layers of it that produce local SDRs. But how can we also encode temporal information, so that instead of reconstructing the current input we reconstruct the input at _t+1_?

One possible solution, which is mostly still just an idea at the moment, is to make recurrent connections at the SDR level. That is, connections between hidden nodes of the autoencoder. Hopefully this will allow us to predict one step ahead of time.
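
To make that a bit more concrete, here is a rough sketch of the kind of structure I have in mind. The class name, the fixed weight scale, and the simplified decoder-only learning rule are all placeholders for the idea, not a finished design:

import numpy as np

class RecurrentSparsePredictor(object):
    """Sketch only: the hidden sums also see the previous SDR through recurrent
    weights, and the decoder output is read as a prediction of the next input."""

    def __init__(self, numInputs, sdrSize, sparsity):
        scale = 0.1
        self.weights = (np.random.rand(sdrSize, numInputs) * 2.0 - 1.0) * scale
        self.recurrentWeights = (np.random.rand(sdrSize, sdrSize) * 2.0 - 1.0) * scale
        self.hiddenBiases = np.zeros(sdrSize)
        self.visibleBiases = np.zeros(numInputs)
        self.prevSdr = np.zeros(sdrSize)
        self.localActivity = max(1, int(round(sdrSize * sparsity)))

    def step(self, inputs):
        # Hidden sums combine the current input with the previous time step's SDR
        sums = (np.dot(self.weights, inputs)
                + np.dot(self.recurrentWeights, self.prevSdr)
                + self.hiddenBiases)

        # Same k-winners-take-all inhibition as the autoencoder above
        sdr = np.zeros(len(sums))
        sdr[np.argsort(sums)[-self.localActivity:]] = 1.0

        # The decoder output is interpreted as a prediction of the input at t+1
        prediction = np.dot(self.weights.T, sdr) + self.visibleBiases

        self.prevSdr = sdr
        return sdr, prediction

    def learn(self, nextInputs, sdr, prediction, alpha):
        # Train the decoder against the input that actually arrived at t+1
        # (learning the recurrent weights is omitted from this sketch)
        errors = nextInputs - prediction
        self.weights += alpha * np.outer(sdr, errors)
        self.visibleBiases += alpha * errors

The key change from the autoencoder above is the second term in the hidden sums: the previous SDR biases which units win the inhibition, so the decoder can be trained against the next input rather than the current one.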

So, assuming that works as intended, how can we scale this up? Naturally, we would want some sort of layered architecture, so we can stack these autoencoders as is standard in deep learning. But how do we make the predictions from each autoencoder take information from the layers above into account? In the end, we only want to predict the input at the lowest layer.

My proposal is to add recurrent connections to the next layer as well. So first we do an upward pass on the input so that the hidden representations (SDRs) are formed, and then we go back down and reconstruct each layer's SDR using information from the layer above.
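
In code, the pass structure might look roughly like the following. The layer objects and their encode/predict methods are hypothetical stand-ins for the recurrent autoencoders described above:

def up_down_pass(layers, inputs):
    # Upward pass: each layer encodes the SDR of the layer below
    sdrs = []
    x = inputs
    for layer in layers:
        x = layer.encode(x)    # hypothetical: forms this layer's SDR
        sdrs.append(x)

    # Downward pass: predict each layer's next SDR, feeding in the prediction
    # from the layer above as feedback (the top layer receives none)
    feedback = None
    predictions = [None] * len(layers)
    for i in reversed(range(len(layers))):
        predictions[i] = layers[i].predict(sdrs[i], feedback)  # hypothetical
        feedback = predictions[i]

    # The bottom layer's prediction is the one we ultimately care about:
    # the predicted input at t+1
    return predictions[0]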

If this plan works out, we would have a highly scalable way of predicting a sequence one step ahead of time. From that point, we can apply reinforcement learning in a way similar to CHTM, by only learning to predict when the temporal difference error is positive. Q values can be stored as a sort of free energy, similarly to the way CHTM does it.
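
As a sketch of the gating itself (everything here is a placeholder; the Q estimates would come from whatever the free-energy storage ends up looking like):

def gated_learn(predictor, stateAndAction, reward, q, qNext, gamma, alpha):
    # Standard one-step temporal difference error
    tdError = reward + gamma * qNext - q

    # Only reinforce the prediction of the action that was actually taken
    # when it turned out better than expected
    if tdError > 0.0:
        predictor.learn(stateAndAction, alpha)  # hypothetical learn signature

    return tdError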

If all this comes to pass, then the result would have temporal pooling, since the SDRs for spatial and temporal data are shared. This should allow very long sequences to be predicted efficiently.

I am excited to code this concept; here is the repository I have started working on: HTFERL

HTFERL stands for hierarchical temporal free-energy reinforcement learner.

If you are interested in joining this project, let me know! I could use a hand 🙂

Until next time!

My Attempt at Outperforming Deepmind’s Atari Results – UPDATE 12

Hello!

It has been some time since I last posted here. Since my last post, many changes have occurred to my CHTM architecture.

First off, the reinforcement learning algorithm is now based on CACLA (continuous actor-critic learning automaton), but the actor and critic are part of the same structure. The temporal prediction system only learns to predict the last action when the temporal difference error is positive. Read more about CACLA here: http://webdocs.cs.ualberta.ca/~vanhasse/rl_algs/Cacla.html
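
For reference, a bare-bones CACLA-style update looks something like the following. This is a generic linear-function-approximation sketch, not the CHTM-integrated version; the names and the Gaussian exploration are placeholder choices:

import numpy as np

class CaclaAgent(object):
    def __init__(self, numFeatures, numActions, actorAlpha, criticAlpha, gamma, noiseStd):
        self.criticWeights = np.zeros(numFeatures)
        self.actorWeights = np.zeros((numActions, numFeatures))
        self.actorAlpha = actorAlpha
        self.criticAlpha = criticAlpha
        self.gamma = gamma
        self.noiseStd = noiseStd

    def act(self, features):
        # Deterministic policy output plus Gaussian exploration noise
        mean = np.dot(self.actorWeights, features)
        return mean + np.random.randn(len(mean)) * self.noiseStd

    def learn(self, features, action, reward, nextFeatures):
        # Critic: standard TD(0) update of the state value estimate
        value = np.dot(self.criticWeights, features)
        nextValue = np.dot(self.criticWeights, nextFeatures)
        tdError = reward + self.gamma * nextValue - value

        self.criticWeights += self.criticAlpha * tdError * features

        # Actor (the CACLA rule): only when the TD error is positive, move the
        # policy output toward the action that was actually taken
        if tdError > 0.0:
            actorOut = np.dot(self.actorWeights, features)
            self.actorWeights += self.actorAlpha * np.outer(action - actorOut, features)

        return tdError

The important part for my purposes is the sign test on the TD error: the actor only moves toward an action when that action turned out better than the critic expected.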

Second, the Q values are now stored within the HTM cells. Since each cell *should* only be active in a unique state (taking partial observability into account), each cell can effectively store the Q value for that state. This is both more biologically plausible (I guess…) and far more efficient. It can now be thought of as a “smart, self-adapting” look-up table.
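
Conceptually, the cell-level storage works something like this sketch (the names and the simple averaging read-out are placeholder choices):

import numpy as np

class CellQStore(object):
    """Sketch of the per-cell Q idea: the active cells index the table, the
    state's Q estimate is the mean of their stored values, and the TD error is
    written back only to the cells that were active."""

    def __init__(self, numCells):
        self.cellQ = np.zeros(numCells)

    def value(self, activeCells):
        # activeCells is a binary vector marking the currently active cells
        count = np.sum(activeCells)
        if count == 0:
            return 0.0
        return np.dot(self.cellQ, activeCells) / count

    def update(self, activeCells, tdError, alpha):
        # Only the cells representing the current state get adjusted
        self.cellQ += alpha * tdError * activeCells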

Third, cells can now predict far more accurately with the addition of dendrite segments. In standard machine learning terms, this is a small perceptron with one hidden layer and an output layer where all weights are 1 (OR operation). This allows the cell to differentiate more complex patterns, limited only by the number of segments (hidden units).
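
In code, that idea reduces to something like this (the weights and threshold here are placeholders; the real version operates on a cell's lateral connections to other cells):

import numpy as np

def cell_predicts(segmentWeights, lateralInputs, threshold):
    # segmentWeights: (numSegments, numLateralInputs); lateralInputs: binary vector.
    # Each segment is a thresholded linear unit (the "hidden layer"), and the
    # cell predicts if ANY segment fires (the all-ones output layer, i.e. an OR).
    segmentSums = np.dot(segmentWeights, lateralInputs)
    return bool(np.any(segmentSums > threshold))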

Fourth, the spatial pooler has been reworked several times, and is now essentially a sparse autoencoder with explicit lateral inhibition. It still uses boosting to help out “dead” columns.

So, what about results? Well, I am not quite satisfied with the results yet, but they have been improving a lot. I am performing pole balancing with a twist: it uses vision data (64 x 32 pixels). I made a little plotter to plot my reward values, and this is one of the more recent runs:

[Plot: reward values increasing over the course of a recent run]

The system now looks like this when visualized with the volumetric renderer:

[Screenshot: volumetric renderer visualization showing non-uniform SDRs across the layers]

You may notice that the SDRs are no longer uniform across the layers. The new spatial pooler leaves columns off when all the inputs are 0. I am mostly doing this for debugging purposes at the moment, since it makes it easier to see what is going on.

So that’s it for now; I will hopefully have more interesting results soon, perhaps with a video. I want to start giving updates regularly again as well!