Hello again! Time for part 2 of my description of the latest CHTM! In this post I will discuss the temporal inference portion of the algorithm. Continuing from where I left off in the last post, we now have a sparse distributed representation of a layer’s input (represented by columns with high state values). Remember that I will use the notation _(t-1) to indicate values from the previous timestep. We now want to activate the cells within each column:
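Before diving in, here is a rough sketch of the data layout the pseudocode assumes. This is my own reconstruction from the code below (the _(t-1) values are just buffered copies from the previous timestep), not necessarily the real implementation:

struct Connection {
    int connectionIndex; // index into the layer-wide cells array
    float weight;        // lateral weight learned by the perceptron rule
};

struct Cell {
    float state;      // activation; also buffered as state_(t-1)
    float prediction; // also buffered as prediction_(t-1)
    Connection[] lateralConnections;
};

struct Column {
    float state;      // sparse activation from the spatial pooler (part 1)
    float prediction; // also buffered as prediction_(t-1)
    Cell[] cells;
};

struct Layer {
    Column[] columns;
    Cell[] cells; // flat view of all cells, indexed by connectionIndex
};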
void cellActivate(Layer l, float cellIntensity) {
    foreach (Column col in l.columns) {
        // Find the smallest prediction error among this column's cells.
        float minPredictionError = 1;

        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            minPredictionError = min(minPredictionError, predictionError);
        }

        // Activate each cell in proportion to how close it came to the best prediction.
        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            cell.state = exp((minPredictionError - predictionError) * cellIntensity) * col.state;
        }
    }
}
Here we are running a competitive process among the cells, activating the one that best predicted the column and suppressing the others. The resulting cell activations form a context for the next predictions; those predictions in turn form new contexts, and so on.
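To see the competition in action, here is a quick worked example (the numbers are mine, just for illustration): with cellIntensity = 4 and a fully active column (col.state = 1), cells whose previous predictions were 0.9, 0.5, and 0.1 have prediction errors of 0.1, 0.5, and 0.9, so minPredictionError = 0.1. Their new states come out to exp(0) = 1.0, exp(-1.6) ≈ 0.20, and exp(-3.2) ≈ 0.04: the best predictor keeps the full column activation while the rest fall off exponentially.

Next, we can form new predictions for each cell: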
void cellPredict(Layer l, float predictionIntensity, Radius cellRadius) {
    foreach (Column col in l.columns) {
        foreach (Cell cell in col.cells) {
            // Weighted sum over this cell's lateral connections.
            float sum = 0;

            foreach (Connection con in cell.lateralConnections)
                sum += con.weight * cells[con.connectionIndex].state;

            // Sigmoid scaled to [-1, 1], then clamped to be non-negative.
            cell.prediction = max(0, sigmoid(sum * predictionIntensity) * 2 - 1);
        }
    }
}
Here we are treating each cell as a perceptron with connections to all cells within a radius (this can include cells in its own column, and even the cell itself, but that is optional). The activation function is a sigmoid scaled into the [-1, 1] range and then clamped to be non-negative. Since sigmoid(0) * 2 - 1 = 0, a cell whose inputs are all inactive predicts exactly 0, which lets us skip the bias unit and still get accurate predictions.
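The code above assumes the standard logistic sigmoid; in case it isn’t clear, a minimal definition:

float sigmoid(float x) {
    return 1.0f / (1.0f + exp(-x));
}

From these cell predictions we derive a prediction for the entire column: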
void columnPredict(Layer l) {
    foreach (Column col in l.columns) {
        // The column's prediction is the maximum of its cells' predictions.
        float maxPrediction = 0;

        foreach (Cell cell in col.cells)
            maxPrediction = max(maxPrediction, cell.prediction);

        col.prediction = maxPrediction;
    }
}
The output of a column is simply the maximum cell prediction. Finally, we update the cells’ lateral connection weights using a simple perceptron learning rule:
void learnCells(Layer l, float learningRate) {
    foreach (Column col in l.columns) {
        // The error is shared by every cell in the column.
        float error = learningRate * (col.state - col.prediction_(t-1));

        foreach (Cell cell in col.cells)
            foreach (Connection con in cell.lateralConnections)
                con.weight += error * cells[con.connectionIndex].state_(t-1);
    }
}
The error is the same for every cell in a column: it is the difference between what we predicted for the column in the last timestep and what we actually got.
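Putting the pieces together, here is a rough sketch of what one timestep of a layer update might look like. spatialPool stands in for the spatial pooling step from the last post, and bufferStep is a hypothetical helper that copies the current state and prediction values into their _(t-1) slots for the next timestep:

void layerStep(Layer l, float cellIntensity, float predictionIntensity, Radius cellRadius, float learningRate) {
    // Spatial pooling from the last post sets col.state for each column
    // (stand-in name; see part 1 for the actual procedure).
    spatialPool(l);

    // Temporal inference, in the order described above.
    cellActivate(l, cellIntensity);
    cellPredict(l, predictionIntensity, cellRadius);
    columnPredict(l);

    // Learn from how well the previous predictions matched this timestep's input.
    learnCells(l, learningRate);

    // Copy current values into the _(t-1) buffers for the next timestep
    // (hypothetical helper).
    bufferStep(l);
}

That’s the basic algorithm for a single layer of CHTM! In the next post I will discuss how to use multiple layers to make more accurate predictions! Until then!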