Hello again! Time for part 2 of my description of the latest CHTM! In this post I will discuss the temporal inference portion of the algorithm. Continuing from where I left off in the last post, we now have a sparse distributed representation of a layer’s input (represented by columns with high state values). Remember that I will use the notation _(t-1) to indicate values from the previous timestep. We now want to activate the cells within each column:
void cellActivate(Layer l, float cellIntensity) {
    foreach (Column col in l.columns) {
        // Find the smallest prediction error among this column's cells
        float minPredictionError = 1;

        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            minPredictionError = min(minPredictionError, predictionError);
        }

        // The best-predicting cell receives the full column state;
        // the rest fall off exponentially with their extra error
        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            cell.state = exp((minPredictionError - predictionError) * cellIntensity) * col.state;
        }
    }
}
Here we are running a competitive process among the cells to activate the one that best predicted the column’s current state and deactivate the others. The winning cell forms a context for future predictions; those predictions in turn select the next winners, which form new contexts, and so on.
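To make the competition concrete, here is a small standalone example (a sketch in C++; all of the numbers, including the cellIntensity of 4, are made up for illustration) showing how three cells in a fully active column divide up its state:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    // One fully active column (state = 1) whose three cells predicted
    // 0.9, 0.4 and 0.0 for it on the previous timestep
    const float colState = 1.0f;
    const float cellIntensity = 4.0f; // sharpness of the competition
    const float predictionsPrev[] = {0.9f, 0.4f, 0.0f};

    // First pass: the smallest prediction error among the cells
    float minErr = 1.0f;
    for (float p : predictionsPrev)
        minErr = std::min(minErr, std::fabs(colState - p));

    // Second pass: the best predictor keeps the full column state,
    // worse predictors decay exponentially with their extra error
    for (float p : predictionsPrev) {
        float err = std::fabs(colState - p);
        std::printf("predicted %.1f -> cell state %.3f\n",
                    p, std::exp((minErr - err) * cellIntensity) * colState);
    }
}

The cell that predicted 0.9 keeps the full state of 1, while the cells that predicted 0.4 and 0.0 end up at roughly 0.135 and 0.027. Next, we can form new predictions for each cell: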
void cellPredict(Layer l, float predictionIntensity, Radius cellRadius) {
    foreach (Column col in l.columns) {
        foreach (Cell cell in col.cells) {
            // Weighted sum over the lateral connections, which were formed
            // at initialization to every cell within cellRadius
            float sum = 0;

            // l.cells: the layer's cells viewed as one flat array
            foreach (Connection con in cell.lateralConnections)
                sum += con.weight * l.cells[con.connectionIndex].state;

            // Sigmoid rescaled into [-1, 1], then clamped to be non-negative
            cell.prediction = max(0, sigmoid(sum * predictionIntensity) * 2 - 1);
        }
    }
}
In cellPredict we are treating each cell as a perceptron with connections to all cells within a radius (including, optionally, cells in the same column and the cell itself). The activation function is a sigmoid rescaled into the [-1, 1] range and then clamped to always be non-negative. This lets us leave out a bias unit and still get accurate predictions, since a zero input now maps to a prediction of exactly zero rather than the 0.5 a plain sigmoid would give.
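Since this clamped sigmoid is doing some quiet but important work, here is a quick numerical sanity check (a sketch; I am assuming sigmoid is the standard logistic function, which the post does not spell out):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Standard logistic sigmoid (an assumption; the post never defines it)
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// The activation from cellPredict: sigmoid rescaled to [-1, 1], clamped at 0
float cellActivation(float sum, float predictionIntensity) {
    return std::max(0.0f, sigmoid(sum * predictionIntensity) * 2.0f - 1.0f);
}

int main() {
    const float sums[] = {-2.0f, -1.0f, 0.0f, 1.0f, 2.0f};
    for (float sum : sums)
        std::printf("sum = %+.1f -> prediction %.3f\n",
                    sum, cellActivation(sum, 1.0f));
    // Prints 0 for every non-positive sum and rises smoothly toward 1 after,
    // so "no input" maps to "no prediction" without a bias unit
}

From these cell predictions we derive a prediction for the entire column: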
void columnPredict(Layer l) {
    foreach (Column col in l.columns) {
        float maxPrediction = 0;

        foreach (Cell cell in col.cells)
            maxPrediction = max(maxPrediction, cell.prediction);

        col.prediction = maxPrediction;
    }
}
The output of a column is simply the maximum cell prediction. Finally, we update the cell weights using a simple perceptron learning rule:
void learnCells(Layer l, float learningRate) {
    foreach (Column col in l.columns) {
        // One shared error for every cell in the column
        float error = learningRate * (col.state - col.prediction_(t-1));

        foreach (Cell cell in col.cells)
            foreach (Connection con in cell.lateralConnections)
                con.weight += error * l.cells[con.connectionIndex].state_(t-1);
    }
}
The error for all cells in a column is the same: it is the difference between what we predicted for this column the last timestep and what we actually got. For example, if a column ended up fully active (state of 1) after we predicted only 0.25 for it, then with a learning rate of 0.1 every lateral connection in that column is nudged by 0.1 × (1 − 0.25) = 0.075, scaled by its source cell’s state_(t-1). That’s the basic algorithm for a single layer of CHTM!
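To tie everything together, here is a compact C++ sketch of one full timestep of the layer. The struct layout, the flat l.cells view, and the rotation at the end (copying the current states and predictions into the _(t-1) slots, which the pseudocode’s notation implies but never shows) are my own guesses; the four phases follow the functions above:

#include <algorithm>
#include <cmath>
#include <vector>

struct Connection { int connectionIndex; float weight; };

struct Cell {
    float state = 0, statePrev = 0;           // state, state_(t-1)
    float prediction = 0, predictionPrev = 0; // prediction, prediction_(t-1)
    std::vector<Connection> lateralConnections;
};

struct Column {
    float state = 0, prediction = 0, predictionPrev = 0;
    std::vector<Cell> cells;
};

struct Layer {
    std::vector<Column> columns;
    std::vector<Cell*> cells; // flat view of every cell, filled at init,
                              // indexed by connectionIndex
};

static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One timestep: activate, predict, learn, then rotate the _(t-1) values.
// Assumes col.state was already set by the spatial step from the last post.
void stepLayer(Layer& l, float cellIntensity, float predictionIntensity,
               float learningRate) {
    // cellActivate: competition against last step's predictions
    for (Column& col : l.columns) {
        float minErr = 1;
        for (Cell& c : col.cells)
            minErr = std::min(minErr, std::fabs(col.state - c.predictionPrev));
        for (Cell& c : col.cells) {
            float err = std::fabs(col.state - c.predictionPrev);
            c.state = std::exp((minErr - err) * cellIntensity) * col.state;
        }
    }

    // cellPredict + columnPredict: new predictions from the new states
    for (Column& col : l.columns) {
        col.prediction = 0;
        for (Cell& c : col.cells) {
            float sum = 0;
            for (const Connection& con : c.lateralConnections)
                sum += con.weight * l.cells[con.connectionIndex]->state;
            c.prediction =
                std::max(0.0f, sigmoid(sum * predictionIntensity) * 2 - 1);
            col.prediction = std::max(col.prediction, c.prediction);
        }
    }

    // learnCells: one shared delta-rule error per column
    for (Column& col : l.columns) {
        float error = learningRate * (col.state - col.predictionPrev);
        for (Cell& c : col.cells)
            for (Connection& con : c.lateralConnections)
                con.weight += error * l.cells[con.connectionIndex]->statePrev;
    }

    // rotate: current values become the next step's _(t-1) values
    for (Column& col : l.columns) {
        col.predictionPrev = col.prediction;
        for (Cell& c : col.cells) {
            c.statePrev = c.state;
            c.predictionPrev = c.prediction;
        }
    }
}

In the next post I will discuss how to use multiple layers to make more accurate predictions! Until then!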