Hello again! Time for part 2 of my description of the latest CHTM! In this post I will discuss the temporal inference portion of the algorithm. Continuing from where I left off in the last post, we now have a sparse distributed representation of a layer’s input (represented by columns with high state values). Remember that I will use the notation _(t-1) to indicate values from the previous timestep. We now want to activate the cells within each column:
void cellActivate(Layer l, float cellIntensity) {
    foreach (Column col in l.columns) {
        // Find the smallest prediction error among this column's cells
        float minPredictionError = 1;

        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            minPredictionError = min(minPredictionError, predictionError);
        }

        // The best-predicting cell receives the full column state;
        // the rest fall off exponentially with their extra error
        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
            cell.state = exp((minPredictionError - predictionError) * cellIntensity) * col.state;
        }
    }
}
Here we are running a competitive process among the cells to activate the one that best predicted the column’s current state and deactivate the others. The winning cell forms a context for future predictions; those predictions in turn select the next winners, which form new contexts, and so on.
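To make the competition concrete, here is a small standalone example (a sketch in C++; all of the numbers, including the cellIntensity of 4, are made up for illustration) showing how three cells in a fully active column divide up its state:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    // One fully active column (state = 1) whose three cells predicted
    // 0.9, 0.4 and 0.0 for it on the previous timestep
    const float colState = 1.0f;
    const float cellIntensity = 4.0f; // sharpness of the competition
    const float predictionsPrev[] = {0.9f, 0.4f, 0.0f};

    // First pass: the smallest prediction error among the cells
    float minErr = 1.0f;
    for (float p : predictionsPrev)
        minErr = std::min(minErr, std::fabs(colState - p));

    // Second pass: the best predictor keeps the full column state,
    // worse predictors decay exponentially with their extra error
    for (float p : predictionsPrev) {
        float err = std::fabs(colState - p);
        std::printf("predicted %.1f -> cell state %.3f\n",
                    p, std::exp((minErr - err) * cellIntensity) * colState);
    }
}

The cell that predicted 0.9 keeps the full state of 1, while the cells that predicted 0.4 and 0.0 end up at roughly 0.135 and 0.027. Next, we can form new predictions for each cell: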
void cellPredict(Layer l, float predictionIntensity, Radius cellRadius) {
    foreach (Column col in l.columns) {
        foreach (Cell cell in col.cells) {
            // Weighted sum over the lateral connections, which were formed
            // at initialization to every cell within cellRadius
            float sum = 0;

            // l.cells: the layer's cells viewed as one flat array
            foreach (Connection con in cell.lateralConnections)
                sum += con.weight * l.cells[con.connectionIndex].state;

            // Sigmoid rescaled into [-1, 1], then clamped to be non-negative
            cell.prediction = max(0, sigmoid(sum * predictionIntensity) * 2 - 1);
        }
    }
}
In cellPredict we are treating each cell as a perceptron with connections to all cells within a radius (including, optionally, cells in the same column and the cell itself). The activation function is a sigmoid rescaled into the [-1, 1] range and then clamped to always be non-negative. This lets us leave out a bias unit and still get accurate predictions, since a zero input now maps to a prediction of exactly zero rather than the 0.5 a plain sigmoid would give.
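Since this clamped sigmoid is doing some quiet but important work, here is a quick numerical sanity check (a sketch; I am assuming sigmoid is the standard logistic function, which the post does not spell out):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Standard logistic sigmoid (an assumption; the post never defines it)
float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// The activation from cellPredict: sigmoid rescaled to [-1, 1], clamped at 0
float cellActivation(float sum, float predictionIntensity) {
    return std::max(0.0f, sigmoid(sum * predictionIntensity) * 2.0f - 1.0f);
}

int main() {
    const float sums[] = {-2.0f, -1.0f, 0.0f, 1.0f, 2.0f};
    for (float sum : sums)
        std::printf("sum = %+.1f -> prediction %.3f\n",
                    sum, cellActivation(sum, 1.0f));
    // Prints 0 for every non-positive sum and rises smoothly toward 1 after,
    // so "no input" maps to "no prediction" without a bias unit
}

From these cell predictions we derive a prediction for the entire column: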
void columnPredict(Layer l) {
    foreach (Column col in l.columns) {
        float maxPrediction = 0;

        foreach (Cell cell in col.cells)
            maxPrediction = max(maxPrediction, cell.prediction);

        col.prediction = maxPrediction;
    }
}
The output of a column is simply the maximum cell prediction. Finally, we update the cell weights using a simple perceptron learning rule:
void learnCells(Layer l, float learningRate) {
    foreach (Column col in l.columns) {
        // One shared error for every cell in the column
        float error = learningRate * (col.state - col.prediction_(t-1));

        foreach (Cell cell in col.cells)
            foreach (Connection con in cell.lateralConnections)
                con.weight += error * l.cells[con.connectionIndex].state_(t-1);
    }
}
The error for all cells in a column is the same: it is the difference between what we predicted for this column the last timestep and what we actually got. For example, if a column ended up fully active (state of 1) after we predicted only 0.25 for it, then with a learning rate of 0.1 every lateral connection in that column is nudged by 0.1 × (1 − 0.25) = 0.075, scaled by its source cell’s state_(t-1). That’s the basic algorithm for a single layer of CHTM!
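To tie everything together, here is a compact C++ sketch of one full timestep of the layer. The struct layout, the flat l.cells view, and the rotation at the end (copying the current states and predictions into the _(t-1) slots, which the pseudocode’s notation implies but never shows) are my own guesses; the four phases follow the functions above:

#include <algorithm>
#include <cmath>
#include <vector>

struct Connection { int connectionIndex; float weight; };

struct Cell {
    float state = 0, statePrev = 0;           // state, state_(t-1)
    float prediction = 0, predictionPrev = 0; // prediction, prediction_(t-1)
    std::vector<Connection> lateralConnections;
};

struct Column {
    float state = 0, prediction = 0, predictionPrev = 0;
    std::vector<Cell> cells;
};

struct Layer {
    std::vector<Column> columns;
    std::vector<Cell*> cells; // flat view of every cell, filled at init,
                              // indexed by connectionIndex
};

static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One timestep: activate, predict, learn, then rotate the _(t-1) values.
// Assumes col.state was already set by the spatial step from the last post.
void stepLayer(Layer& l, float cellIntensity, float predictionIntensity,
               float learningRate) {
    // cellActivate: competition against last step's predictions
    for (Column& col : l.columns) {
        float minErr = 1;
        for (Cell& c : col.cells)
            minErr = std::min(minErr, std::fabs(col.state - c.predictionPrev));
        for (Cell& c : col.cells) {
            float err = std::fabs(col.state - c.predictionPrev);
            c.state = std::exp((minErr - err) * cellIntensity) * col.state;
        }
    }

    // cellPredict + columnPredict: new predictions from the new states
    for (Column& col : l.columns) {
        col.prediction = 0;
        for (Cell& c : col.cells) {
            float sum = 0;
            for (const Connection& con : c.lateralConnections)
                sum += con.weight * l.cells[con.connectionIndex]->state;
            c.prediction =
                std::max(0.0f, sigmoid(sum * predictionIntensity) * 2 - 1);
            col.prediction = std::max(col.prediction, c.prediction);
        }
    }

    // learnCells: one shared delta-rule error per column
    for (Column& col : l.columns) {
        float error = learningRate * (col.state - col.predictionPrev);
        for (Cell& c : col.cells)
            for (Connection& con : c.lateralConnections)
                con.weight += error * l.cells[con.connectionIndex]->statePrev;
    }

    // rotate: current values become the next step's _(t-1) values
    for (Column& col : l.columns) {
        col.predictionPrev = col.prediction;
        for (Cell& c : col.cells) {
            c.statePrev = c.state;
            c.predictionPrev = c.prediction;
        }
    }
}

In the next post I will discuss how to use multiple layers to make more accurate predictions! Until then!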