z = (0.3679)(-0.3) + (0.0002)(0.4) + (0.0183)(-0.2) + (0.0015)(0.6) + 0.1
= -0.1120
Now you compute p = 1.0 / (1.0 + exp(-z)):
p = 1.0 / (1.0 + exp(0.1120)) = 0.4720
If the p value is greater than 0.5, the predicted class is 1; if the p value is less than 0.5, the predicted class is 0. Because p = 0.4720 is (just barely) less than 0.5, the predicted class for this example is 0.
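Expressed in code, the prediction is a weighted sum of kernel similarities pushed through the logistic sigmoid function. The following is a minimal sketch, assuming a Kernel function like the one sketched later in the article and the bias value stored in the last cell of the alphas array, as the demo program does; the helper name ComputeP is illustrative:

// Sketch: probability that input x has class 1 (ComputeP is an illustrative name)
static double ComputeP(double[] x, double[][] trainData,
  double[] alphas, double sigma)
{
  int numTrain = trainData.Length;
  double z = 0.0;
  for (int j = 0; j < numTrain; ++j)
    z += alphas[j] * Kernel(trainData[j], x, sigma);  // weighted kernel similarities
  z += alphas[numTrain];  // the bias is stored in the last cell of alphas
  return 1.0 / (1.0 + Math.Exp(-z));  // logistic sigmoid
}
// A return value greater than 0.5 means predicted class 1; otherwise class 0.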
Training a KLR Model
Training a KLR model is the process of using training data to find
the alpha values and the bias value. Expressed in very high-level
pseudo-code, the KLR training algorithm is:
compute K(td[i], td[j]) for all i, j
loop maxIter times
  for-each curr training item, i
    for-each j: sum += alphas[j] * K(i, j)
    sum += bias
    y = 1.0 / (1.0 + exp(-sum))
    t = target class (0 or 1)
    for-each j:
      alpha[j] += eta * (t - y) * K(i, j)
    bias += eta * (t - y) * 1.0
end-loop
The key statement in the demo code is alphas[j] += eta * (t - y) * kernelMatrix[i][j], which updates the alpha value for the training data item at index [j] based on the current training data item at index [i]. Here, t is the known, correct target class, 0 or 1, and y is a calculated probability that the item at [i] has class 1.
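Putting the pseudo-code and this key statement together, a minimal sketch of the training loop looks like the following. It is a sketch rather than the demo's exact listing, and it assumes the kernelMatrix, alphas, indices, eta, maxIter, numTrain and numFeatures variables that are set up in the demo code presented later, with the bias stored in the last cell of the alphas array and the class label stored in the last cell of each training item:

// Sketch of KLR training, following the pseudo-code above (not the exact demo listing)
int iter = 0;
while (iter < maxIter)
{
  Shuffle(indices);  // visit the training items in a random order
  foreach (int i in indices)
  {
    // compute y, the predicted probability that training item i has class 1
    double sum = 0.0;
    for (int j = 0; j < numTrain; ++j)
      sum += alphas[j] * kernelMatrix[i][j];
    sum += alphas[numTrain];  // bias stored in the last cell of alphas
    double y = 1.0 / (1.0 + Math.Exp(-sum));
    double t = trainData[i][numFeatures];  // known target class, 0 or 1

    // update each alpha, and the bias, so that y moves toward t
    for (int j = 0; j < numTrain; ++j)
      alphas[j] += eta * (t - y) * kernelMatrix[i][j];
    alphas[numTrain] += eta * (t - y) * 1.0;
  }
  ++iter;
}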
The Shuffle method is a helper that scrambles the order of the training items using the Fisher-Yates mini-algorithm.
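A minimal sketch of such a Shuffle helper, using the class-scope Random object named rnd that the demo program instantiates, looks like this:

// Fisher-Yates shuffle: rearranges the values in arr into a random order
static void Shuffle(int[] arr)
{
  for (int i = 0; i < arr.Length; ++i)
  {
    int ri = rnd.Next(i, arr.Length);  // random index in [i, arr.Length)
    int tmp = arr[ri];
    arr[ri] = arr[i];
    arr[i] = tmp;
  }
}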
For example, suppose an alpha value for a training item is currently 0.1234, the target class is 1, and the computed probability is 0.60. The current prediction would be correct, but you'd like the p value to be even closer to 1. Suppose the similarity between the two items is K(i, j) = 0.70 and the learning rate eta is 0.10. The new alpha value would be:
alpha = 0.1234 + 0.10 * (1 - 0.60) * 0.70
      = 0.1234 + 0.0280
      = 0.1514
Because alpha is a multiplier value in the probability calculation,
the new, slightly larger value of alpha will increase p a little bit, making the prediction more accurate.
The Demo Program
To code the demo program, I launched Visual Studio and created a new C# console application named KernelLogistic. I used Visual Studio 2015, but the demo program has no significant .NET Framework dependencies, so any recent version of Visual Studio will work.
After the template code loaded into the editor window, I right-clicked on file Program.cs in the Solution Explorer window, renamed the file to KernelLogisticProgram.cs and then allowed Visual Studio to automatically rename class Program for me. At the top of the template-generated code, I deleted all unnecessary using statements, leaving just the one that references the top-level System namespace. Then I instantiated a Random object:
using System;
namespace KernelLogistic {
  class KernelLogisticProgram {
    static Random rnd = new Random(0);
    static void Main(string[] args)
    {
      Console.WriteLine("Begin KLR demo");
      int numFeatures = 2;
For simplicity, I coded the demo using a static method approach rather than object-oriented programming, and removed all normal error checking. The Main method sets up the 21 training items and the 4 test items like so:
double[][] trainData = new double[21][];
trainData[0] = new double[] { 2.0, 3.0, 0 };
...
trainData[20] = new double[] { 5.0, 6.0, 1 };
double[][] testData = new double[4][];
testData[0] = new double[] { 1.5, 4.5, 0 };
...
testData[3] = new double[] { 5.5, 5.5, 1 };
In a non-demo scenario, you'd likely read data from a text file (a loader is sketched after the following code). Next, the alpha values are initialized:
int numTrain = trainData.Length;
int numTest = testData.Length;
double[] alphas = new double[numTrain + 1];
for (int i = 0; i < alphas.Length; ++i)
  alphas[i] = 0.0;
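As mentioned above, a non-demo version of the program would likely read the data from a text file. Here is a minimal sketch of such a loader, assuming one comma-delimited item per line with the class label in the last position; the method name and file format are illustrative assumptions rather than part of the demo:

// Sketch: load comma-delimited items such as "2.0,3.0,0" from a text file
static double[][] LoadData(string fileName)
{
  string[] lines = System.IO.File.ReadAllLines(fileName);
  double[][] result = new double[lines.Length][];
  for (int i = 0; i < lines.Length; ++i)
  {
    string[] tokens = lines[i].Split(',');
    result[i] = new double[tokens.Length];
    for (int j = 0; j < tokens.Length; ++j)
      result[i][j] = double.Parse(tokens[j]);
  }
  return result;
}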
When coding machine learning systems, there are usually several ways to deal with bias values. Here, I store the KLR bias in the last cell of the alphas array. An alternative design is to create a separate standalone variable. Next, the kernel similarities between all pairs of training items are computed:
double[][] kernelMatrix = new double[numTrain][];
for (int i = 0; i < kernelMatrix.Length; ++i)
  kernelMatrix[i] = new double[numTrain];
double sigma = 1.0;
for (int i = 0; i < numTrain; ++i) {
  for (int j = 0; j < numTrain; ++j) {
    double k = Kernel(trainData[i], trainData[j], sigma);
    kernelMatrix[i][j] = kernelMatrix[j][i] = k;
  }
}
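The Kernel function computes a radial basis function (RBF) similarity. A minimal sketch that is consistent with the similarity values used earlier (with sigma = 1.0, values such as 0.3679 and 0.0183), assuming the last cell of each data vector holds the class label and is skipped, is:

// Sketch of an RBF kernel: K(v1, v2) = exp(-||v1 - v2||^2 / (2 * sigma^2))
// Assumes the last cell of each vector holds the class label and is skipped
static double Kernel(double[] v1, double[] v2, double sigma)
{
  double num = 0.0;
  for (int i = 0; i < v1.Length - 1; ++i)
    num += (v1[i] - v2[i]) * (v1[i] - v2[i]);  // squared Euclidean distance
  double denom = 2.0 * sigma * sigma;
  return Math.Exp(-num / denom);
}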
Because there are only 21 data items, I sacrifice efficiency for simplicity. I could’ve reduced the number of kernel calculations by using the facts that K(v1, v2) = K(v2, v1) and K(v, v) = 1. Next, the demo program prepares for training:
double eta = 0.001;
int iter = 0;
int maxIter = 1000;
int[] indices = new int[numTrain];
for (int i = 0; i < indices.Length; ++i)
indices[i] = i;
The values of eta and maxIter were determined by trial and error. The idea behind the array named indices is that when training, it's important to visit the training items in a random order on each pass, to avoid getting into a situation where training stalls or oscillates back and forth. The main training loop begins:
...
}