Continuous Hierarchical Temporal Memory Temporal Inference

Hello again! Time for part 2 of my description of the latest CHTM! In this post I will discuss the temporal inference portion of the algorithm. Continuing from where I left off in the last post, we now have a sparse distributed representation of a layer’s input (represented by columns with high state values). Remember that I will use the notation _(t-1) to indicate values from the previous timestep. We now want to activate the cells within the column:

void cellActivate(Layer l, float cellIntensity) {
    foreach (Column col in l.columns) {
        float minPredictionError = 1;

        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));

            minPredictionError = min(minPredictionError, predictionError);
        }

        foreach (Cell cell in col.cells) {
            float predictionError = abs(col.state - cell.prediction_(t-1));
    
            cell.state = exp((minPredictionError - predictionError) * cellIntensity) * col.state;
        }
    }
}

Here we are running a competitive process among the cells to activate the one that predicted the column the best and deactivate the others. This process forms a context for future predictions, whose predictions will then again be used to form new contexts, and so on. Next, we can form new predictions for each cell:

void cellPredict(Layer l, float predictionIntensity, Radius cellRadius) {
    foreach (Column col in l.columns) {
        foreach (Cell cell in col.cells) {
            float sum = 0;

            foreach (Connection con in cell.lateralConnections)
                sum += con.weight * cells[con.connectionIndex].state;

            cell.prediction = max(0, sigmoid(sum * predictionIntensity) * 2 - 1);
        }
    }
}

Here we are treating each cell as a perceptron with connections to all cells within a radius (including cells in the same column as well as itself, but this is optional). The activation function is a sigmoid scaled into the [-1, 1] range, and then clamped to always be positive. This change allows us to not include a bias unit and still get accurate predictions. From these cell predictions we derive a prediction for the entire column:

void columnPredict(Layer l) {
    foreach (Column col in l.columns) {
        float maxPrediction = 0;

        foreach (Cell cell in col.cells)
            maxPrediction = max(maxPrediction, cell.prediction);

        col.prediction = maxPrediction;
    }
}

The output of a column is simply the maximum cell prediction. Finally, we update the cell weights using a simple perceptron learning rule:

void learnCells(Layer l, float learningRate) {
    foreach (Column col in l.columns) {
        float error = learningRate * (col.state - col.prediction_(t-1));

        foreach (Connection con in cell.lateralConnections)
            con.weight += error * cells[con.connectionIndex].state_(t-1);
    }
}

The error for all cells in a column is the same: It is the difference between what we predicted for this column the last timestep and what we actually got. That’s the basic algorithm for a single layer of CHTM! In the next post I will discuss how to use multiple layers to make more accurate predictions! Until then!

Continuous Hierarchical Temporal Memory (CHTM) Update

Hello,

It’s been a while since I have posted here again, but I have been working on my CHTM┬áreinforcement learner all along! I got it to perform pole balancing based on vision data of the pole alone recently, but only for a few seconds at a time. But that is a topic for another post. In this post, I want to discuss the changes and improvements that have occurred in CHTM – land.

For those unfamiliar, CHTM (continuous hierarchical temporal memory) is a generalization/simplification of the standard HTM developed by Numenta that uses familiar concepts from deep learning to reproduce the spatial pooling and temporal inference capabilities of standard HTM. The main change is that instead of using binary values for everything, we now use real-valued numbers for everything. This makes it simpler to code, and also provides some extra capabilities that the original did not have.

First let’s lay down the data structures we will be using in pseudocode.

struct Layer {
    Column[] columns;
};

struct Column {
    float activation;
    float state;
    float usage;

    Cell[] cells;

    Connection[] feedforwardConnections;
};

struct Cell {
    float prediction;
    float state;

    Connection[] lateralConnections;
};

struct Connection {
    float weight;
    float trace;

    int sourceIndex;
};

A layer consists of a 2D grid of columns. Each column has feedforward connections to an input source, taking a subsection of the input source (within a radius of the column). Each column also has a number of cells in it, which have lateral connections to other cells in the same layer within a radius.

CHTM, like the original, has two parts to it: Spatial pooling and temporal inference. Let’s start with spatial pooling. I will use the notation _(t-1) to describe values from a previous timestep.

In spatial pooling, when want to learn sparse distributed representations of the input such that the representations have minimal overlap. To do this, we go through each column (possibly in parallel) and calculate the activation value of each column:

void calculateActivations(Layer l, float[] input) {
    foreach (Column col in l.columns) {
        float sum = 0;
        
        foreach (Connection con in col.feedforwardConnections) {
            float difference = con.weight - input[con.sourceIndex];

            sum += difference * difference;
        }

        col.activation = -sum;
    }
}

Essentially what we are doing here is getting the negative distance between the input region and the feedforward connections vector. So the closer the column’s feedforward connections are to the input, the higher its activation value will be, capping out at 0.

The next step is the inhibition step:

void inhibit(Layer l, Radius inhibitionRadius, float localActivity, float stateIntensity, float usageDecay) {
    foreach (Column col in l.columns) {
        int numHigher = 0;     

        foreach (Column competitor in inhibitionRadius) {
            if (competitor.activation > col.activation)
                numHigher++;
        }

        col.state = sigmoid((localActivity - numHigher) * stateIntensity);

        col.usage = (1 - usageDecay) * (1 - col.state) * col.usage + col.state;
    }
}

Here column states are set such that they are a sparsified version of the activation values. Along with that we update a usage parameter, which is a value that goes to 1 when a column’s state is 1 and decays when it is below 1. This way we can keep track of which columns are underutilized.

Next comes the learning portion of the spatial pooler:

void learnColumns(Layer l, float learningRate, float boostThreshold) {
    foreach (Column col in l.columns) {
        float learnScalar = learningRate * min(1, max(0, boostThreshold - col.usage) / boostThreshold);

        foreach (Connection con in col.feedforwardConnections)
            con.weight += learnScalar * (input[con.sourceIndex] - con.weight);
    }
}

Here we move underutilized columns towards the current input vector. This way we maximize the amount of information the columns can represent.

Next up: The temporal inference component!