Hello!

This post is indirectly related to my series of posts on DeepMind-style Atari playing. It is about an idea I wanted to share for building massive reinforcement learners.

While working on my CHTM implementation, I started to think of ways it could be simplified, perhaps even to the point where it is no longer biologically plausible, while still keeping the functionality intact.

The closest thing I can think of in standard deep learning literature that can do similar things to HTM is the sparse recurrent autoencoder. HTM makes predictions about its own input, similar to a recurrent autoencoder. The sparsity comes in handy to minimize forgetting.

So I started experimenting with some autoencoders, one of which made its way into the spatial pooler of CHTM. Not many papers exist on sparse autoencoders with explicit lateral inhibition, so I played around with some ideas and came up with the following code (Python):

    import numpy as np


    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))


    class SparseAutoencoder(object):
        """Sparse Autoencoder with Explicit Lateral Inhibition"""

        def __init__(self, numInputs, sdrSize, sparsity, minWeight, maxWeight):
            # Weights and biases start uniformly in [minWeight, maxWeight)
            self.weights = np.random.rand(sdrSize, numInputs) * (maxWeight - minWeight) + minWeight
            self.visibleBiases = np.random.rand(numInputs) * (maxWeight - minWeight) + minWeight
            self.hiddenBiases = np.random.rand(sdrSize) * (maxWeight - minWeight) + minWeight
            self.activations = np.zeros_like(self.hiddenBiases)
            # Running average of how often each hidden unit is active
            self.dutyCycles = np.full(sdrSize, sparsity, dtype=float)

        def generate(self, inputs, sparsity, dutyCycleDecay):
            # inputs is expected to be a 1-D array of length numInputs
            localActivity = np.round(len(self.hiddenBiases) * sparsity)

            # Activation
            sums = np.dot(self.weights, inputs) + self.hiddenBiases
            self.activations = sigmoid(sums)

            # Lateral inhibition: a unit stays on only if fewer than
            # localActivity units have a strictly higher sum
            sdr = np.zeros(len(self.activations))

            for i in range(len(self.activations)):
                numHigher = 0.0

                for j in range(len(self.activations)):
                    if sums[j] > sums[i]:
                        numHigher += 1.0

                if numHigher < localActivity:
                    sdr[i] = 1.0

            self.dutyCycles = (1.0 - dutyCycleDecay) * self.dutyCycles + dutyCycleDecay * sdr

            return sdr

        def learn(self, inputs, sdr, sparsity, alpha, beta):
            # Reconstruct
            recon = self.reconstruct(sdr)

            errors = inputs - recon

            # Reconstruction error propagated back through the sigmoid
            hiddenErrors = np.dot(self.weights, errors) / len(errors) * self.activations * (1.0 - self.activations)

            # Average of the decoder and encoder gradients (tied weights).
            # A further term was commented out in the original:
            #   + beta * np.outer(sparsity - self.dutyCycles, inputs)
            self.weights += alpha * 0.5 * (np.outer(sdr, errors) + np.outer(hiddenErrors, inputs))

            self.visibleBiases += alpha * errors
            # Push each unit's duty cycle toward the target sparsity
            self.hiddenBiases += beta * (sparsity - self.dutyCycles)

            return recon

        def reconstruct(self, sdr):
            return np.dot(self.weights.T, sdr) + self.visibleBiases
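The inhibition rule in `generate` compares every hidden unit against every other one. As a side note, here is a minimal sketch of the same rule written as a single NumPy comparison. The function name `inhibit` is my own and is not part of the class above.

```python
import numpy as np


def inhibit(sums, sparsity):
    """Binary SDR: a unit is active when fewer than round(n * sparsity)
    units have a strictly higher sum (same rule as the nested loops)."""
    sums = np.asarray(sums, dtype=float)
    local_activity = np.round(len(sums) * sparsity)
    # num_higher[i] = number of units j with sums[j] > sums[i]
    num_higher = (sums[np.newaxis, :] > sums[:, np.newaxis]).sum(axis=1)
    return (num_higher < local_activity).astype(float)


sdr = inhibit([0.1, 0.9, 0.4, 0.7, 0.2], sparsity=0.4)
```

Note that tied sums are all allowed to win under this rule, so the SDR can occasionally have more active units than the target.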

This autoencoder has been tested on some image reconstruction tasks, where it functions properly.

Now we essentially have a simplified spatial pooler. Make it not-fully-connected, and you can make large layers of it for local SDRs. But how can we encode temporal information as well, so that instead of reconstructing the current input we reconstruct the input at _t+1_?

One possible solution, which is mostly still just an idea at the moment, is to make recurrent connections at the SDR level. That is, connections between hidden nodes of the autoencoder. Hopefully this will allow us to predict one step ahead of time.
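To make the idea a bit more concrete, here is a rough sketch of what recurrent connections at the SDR level could look like. This is only one possible reading and is untested as a learner: the class name, the `recurrent` matrix, the top-k inhibition, and the update rule for the recurrent weights are all assumptions of mine, not code from CHTM or HTFERL.

```python
import numpy as np


def k_winners(sums, k):
    """Binary SDR with the k largest sums set to 1 (assumed inhibition rule)."""
    sdr = np.zeros_like(sums)
    sdr[np.argsort(sums)[-k:]] = 1.0
    return sdr


class RecurrentSparseAutoencoder:
    """Hypothetical sketch: the previous SDR feeds back into the hidden sums."""

    def __init__(self, num_inputs, sdr_size, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Feed-forward weights, reused (tied) for decoding
        self.weights = rng.uniform(-0.1, 0.1, (sdr_size, num_inputs))
        # Recurrent weights: SDR at t-1 -> hidden sums at t
        self.recurrent = rng.uniform(-0.1, 0.1, (sdr_size, sdr_size))
        self.visible_biases = np.zeros(num_inputs)
        self.prev_sdr = np.zeros(sdr_size)

    def step(self, inputs, next_inputs=None, alpha=0.05):
        """Encode the input at t and decode a guess for t+1.
        If next_inputs is given, learn from the prediction error."""
        sums = self.weights @ inputs + self.recurrent @ self.prev_sdr
        sdr = k_winners(sums, self.k)
        prediction = self.weights.T @ sdr + self.visible_biases

        if next_inputs is not None:
            errors = next_inputs - prediction
            # Assumed rule: only active units receive a hidden error
            hidden_errors = (self.weights @ errors) * sdr
            self.weights += alpha * np.outer(sdr, errors)
            self.recurrent += alpha * np.outer(hidden_errors, self.prev_sdr)
            self.visible_biases += alpha * errors

        self.prev_sdr = sdr
        return sdr, prediction


# Toy usage: a repeating sequence of four one-hot frames
sequence = np.eye(4)
model = RecurrentSparseAutoencoder(num_inputs=4, sdr_size=16, k=3)
for epoch in range(50):
    for t in range(len(sequence)):
        sdr, prediction = model.step(sequence[t], sequence[(t + 1) % 4])
```

The only change from the plain autoencoder is that the target of the reconstruction is the next input rather than the current one, and that the hidden sums also see the previous SDR.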

Assuming that works as intended, how can we scale this up? Naturally we would want some sort of layered architecture. We can stack these autoencoders as is standard in deep learning. But how do we make the autoencoders' predictions take information from the layers above into account? In the end we want to predict the input at the lowest layer only.

My proposal is to add recurrent connections to the next layer as well. So first we do an upward pass on the input so that the hidden representations (SDRs) are formed, and then we go back down and start reconstructing the SDRs using information from the layers above.
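Here is a rough sketch of the control flow I have in mind, with learning left out entirely. Everything in it is an assumption made for illustration: the `feed_forward` and `feedback` matrices, the top-k inhibition, and the way the signal from above is simply added to the hidden sums on the way down.

```python
import numpy as np


def k_winners(sums, k):
    """Binary SDR with the k largest sums set to 1 (assumed inhibition rule)."""
    sdr = np.zeros_like(sums)
    sdr[np.argsort(sums)[-k:]] = 1.0
    return sdr


def make_layers(sizes, seed=0):
    """sizes = [num_inputs, sdr_size_1, sdr_size_2, ...] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    layers = []
    for i in range(1, len(sizes)):
        # The top layer has nothing above it, so its feedback matrix is empty
        above = sizes[i + 1] if i + 1 < len(sizes) else 0
        layers.append({
            "feed_forward": rng.uniform(-0.1, 0.1, (sizes[i], sizes[i - 1])),
            "feedback": rng.uniform(-0.1, 0.1, (sizes[i], above)),
        })
    return layers


def predict(layers, inputs, k):
    # Upward pass: form an SDR at every layer
    layer_inputs, sdrs = [], []
    x = inputs
    for layer in layers:
        layer_inputs.append(x)
        x = k_winners(layer["feed_forward"] @ x, k)
        sdrs.append(x)

    # Downward pass: re-form each SDR with the signal from the layer above
    from_above = np.zeros(0)
    for layer, x in zip(reversed(layers), reversed(layer_inputs)):
        sums = layer["feed_forward"] @ x + layer["feedback"] @ from_above
        from_above = k_winners(sums, k)

    # Only the lowest layer is decoded back to input space
    prediction = layers[0]["feed_forward"].T @ from_above
    return prediction, sdrs


layers = make_layers([8, 16, 12])
prediction, sdrs = predict(layers, np.ones(8), k=3)
```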

If this plan works out, we would have a highly scalable way of predicting a sequence one step ahead of time. From that point, we can apply reinforcement learning in a way similar to CHTM, by only learning to predict when the temporal difference error is positive. Q values can be stored as a sort of free energy, similar to the way CHTM does it.
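The gating step could look roughly like this. It is a sketch under assumptions: I use the standard one-step TD error, and the function names, `gamma`, and the decoder-style weight update are placeholders of mine rather than anything taken from CHTM. How the Q values themselves would be stored as free energy is not shown.

```python
import numpy as np


def td_error(reward, q_prev, q_next, gamma=0.99):
    """Standard one-step temporal difference error."""
    return reward + gamma * q_next - q_prev


def gated_update(weights, sdr, errors, td, alpha=0.1):
    """Apply the prediction update only when the TD error is positive."""
    if td > 0.0:
        weights += alpha * np.outer(sdr, errors)
    return weights


weights = np.zeros((2, 3))
sdr = np.array([1.0, 0.0])
errors = np.array([1.0, 2.0, 3.0])

# Things went better than expected: the prediction is reinforced
good = td_error(reward=1.0, q_prev=0.5, q_next=0.5, gamma=0.9)
gated_update(weights, sdr, errors, good)
after_good = weights.copy()

# Things went worse than expected: the weights are left alone
bad = td_error(reward=-1.0, q_prev=0.5, q_next=0.5, gamma=0.9)
gated_update(weights, sdr, errors, bad)
```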

If all this comes to pass, then the result would have temporal pooling, since the SDRs for spatial and temporal data are shared. This should allow the prediction of very large sequences efficiently.

I am excited to code this concept. Here is the repository I have started working on: HTFERL

HTFERL stands for hierarchical temporal free-energy reinforcement learner.

If you are interested in joining this project, let me know! I could use a hand 🙂

Until next time!