
training data is generated, the demo creates a new 4-(10,10,10)-3 DNN and trains it using the back-propagation algorithm. During training, the current mean squared error and classification accuracy are displayed every 200 iterations.

The error slowly decreases and the accuracy slowly increases, as you'd expect. After training completes, the final accuracy of the DNN model is 93.45 percent, which means that 0.9345 * 2000 = 1869 items were correctly classified and therefore 131 items were incorrectly classified. The demo code that generates the output begins with:
using System;
namespace DeepNetTrain
{
  class DeepNetTrainProgram {
    static void Main(string[] args) {
      Console.WriteLine("Begin deep net demo");
      int numInput = 4;
      int[] numHidden = new int[] { 10, 10, 10 };
      int numOutput = 3;
...
The demo program uses only plain C# with no namespaces except for System. First, the DNN used to generate the simulated training data is prepared. The number of hidden layers, 3, is passed implicitly as the number of items in the numHidden array. An alternative design is to pass the number of hidden layers explicitly. Next, the training data is generated using helper method MakeData:
int numDataItems = 2000;
Console.WriteLine("Generating " + numDataItems +
  " artificial training data items ");
double[][] trainData = MakeData(numDataItems,
  numInput, numHidden, numOutput, 5);
Console.WriteLine("Done. Training data is: ");
ShowMatrix(trainData, 3, 2, true);
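The implementation of MakeData isn't shown in this excerpt. The general idea, sketched below under stated assumptions (the generator network's weights are taken to be randomly initialized, and a method named ComputeOutputs is assumed; neither detail comes from the article's listing), is to push random input vectors through a generator DNN and one-hot encode the index of the largest output as the class label:

static double[][] MakeData(int numItems, int numInput,
  int[] numHidden, int numOutput, int seed)
{
  // Sketch only; the article's actual MakeData may differ in its details.
  Random rnd = new Random(seed);
  // Assumes the generator is left with small random weights after construction.
  DeepNet generator = new DeepNet(numInput, numHidden, numOutput);

  double[][] result = new double[numItems][];
  for (int r = 0; r < numItems; ++r) {
    result[r] = new double[numInput + numOutput];  // inputs followed by one-hot label

    double[] inputs = new double[numInput];
    for (int i = 0; i < numInput; ++i)
      inputs[i] = 8.0 * rnd.NextDouble() - 4.0;  // arbitrary illustrative range

    double[] outputs = generator.ComputeOutputs(inputs);  // assumed method name
    int maxIndex = 0;
    for (int k = 1; k < numOutput; ++k)
      if (outputs[k] > outputs[maxIndex]) maxIndex = k;

    for (int i = 0; i < numInput; ++i)
      result[r][i] = inputs[i];
    result[r][numInput + maxIndex] = 1.0;  // one-hot encoded class label
  }
  return result;
}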
The 5 passed to MakeData is a seed value for a random object so that demo runs will be reproducible. The value of 5 was used only because it gave a nice demo. The call to helper ShowMatrix displays the first 3 rows and the last row of the generated data, with 2 decimal places, showing indices (true). Next, the DNN is created and training is prepared:
Console.WriteLine("Creating a 4-(10,10,10)-3 DNN"); DeepNet dn = new DeepNet(numInput, numHidden, numOutput); int maxEpochs = 2000;
double learnRate = 0.001;
double momentum = 0.01;
The demo uses a program-defined DeepNet class. The back-propagation algorithm is iterative, so a maximum number of iterations, 2,000 in this case, must be specified. The learning rate parameter controls how much the weights and bias values are adjusted each time a training item is processed. A small learning rate could result in training being too slow (hours, days or more), but a large learning rate could lead to wildly oscillating results that never stabilize. Picking a good learning rate is a matter of trial and error and is a major challenge when working with DNNs. The momentum factor is somewhat like an auxiliary learning rate, and typically speeds up training when a small learning rate is used.

The demo program calling code concludes with:
...
double[] wts = dn.Train(trainData, maxEpochs,
  learnRate, momentum, 10);
Console.WriteLine("Training complete");
double trainError = dn.Error(trainData, false);
double trainAcc = dn.Accuracy(trainData, false);
Console.WriteLine("Final model MS error = " +
  trainError.ToString("F4"));
Console.WriteLine("Final model accuracy = " +
  trainAcc.ToString("F4"));
Console.WriteLine("End demo ");
}

The Train method uses the back-propagation algorithm to find values for the weights and biases so that the difference between computed output values and correct output values is minimized. The values of both the weights and biases are returned by Train. The argument of 10 passed to Train means to display progress messages every 2,000 / 10 = 200 iterations. It's important to monitor progress because bad things can, and often do, happen when training a neural network.

After training completes, the final error and accuracy of the model are calculated and displayed using the final weights and bias values, which are still inside the DNN. The weights and biases could have been explicitly reloaded by executing the statement dn.SetWeights(wts), but it's not necessary in this case. The "false" arguments passed to methods Error and Accuracy mean to not display diagnostic messages.

Deep Neural Network Gradients and Weights
Each weight and bias in a DNN has an associated gradient value. A gradient is a calculus derivative of the error function and is just a value, such as -1.53, where the sign of the gradient tells you if the associated weight or bias should be increased or decreased to reduce error, and the magnitude of the gradient is proportional to how much the weight or bias should change. For example, suppose one of the weights, w, in a DNN has a value of +4.36, and after a training item is processed, the gradient for the weight, g, is calculated to be +2.50. If the learning rate, lr, is set to 0.10, then the new weight value is:

w = w + (lr * g)
  = 4.36 + (0.10 * 2.50)
  = 4.36 + 0.25
  = 4.61

So, training a DNN really boils down to finding the gradients for each weight and bias value. As it turns out, calculating the gradients for the weights connecting the last hidden layer nodes to the output layer nodes, and the gradients for the output node biases, is relatively easy even though the underlying math is extraordinarily profound. Expressed in code, the first step is to compute what are called the output node signals, one for each output node:

for (int k = 0; k < nOutput; ++k) {
  errorSignal = tValues[k] - oNodes[k];
  derivative = (1 - oNodes[k]) * oNodes[k];
  oSignals[k] = errorSignal * derivative;
}

Local variable errorSignal is the difference between the target value (the correct node value from the training data) and the computed output node value. The details can be very tricky. For example, the demo code uses (target - output), but some references use (output - target), which affects whether the associated weight update statement should add or subtract when modifying weights.

Local variable derivative is a calculus derivative (not the same as the gradient, which is also a derivative) of the output activation function, which in this case is the softmax function. In other words, if you use something other than softmax, you'll have to modify the calculation of the derivative local variable.

After the output node signals have been computed, they can be used to compute the gradients for the hidden-to-output weights:
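The code for that step continues beyond this excerpt. A minimal sketch of the idea, assuming nHidden holds the number of nodes in the last hidden layer, hNodes holds those node values, and hoGrads and obGrads hold the hidden-to-output weight gradients and output bias gradients (all four names are assumptions, not identifiers taken from the article's listing), is:

// Sketch: the gradient of a hidden-to-output weight is the output node signal
// times the value of the hidden node that feeds the weight.
for (int j = 0; j < nHidden; ++j)
  for (int k = 0; k < nOutput; ++k)
    hoGrads[j][k] = oSignals[k] * hNodes[j];

// Sketch: the gradient of an output bias is just the output node signal,
// because a bias acts like a weight whose input is a constant 1.0.
for (int k = 0; k < nOutput; ++k)
  obGrads[k] = oSignals[k] * 1.0;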

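Once a gradient is known, the update rule shown earlier (w = w + lr * g) is applied to the corresponding weight, and the momentum factor adds a fraction of the previous update. A sketch of this step, assuming hoWeights holds the hidden-to-output weights and hoPrevDeltas holds the deltas from the previous update (both names are assumptions), is:

for (int j = 0; j < nHidden; ++j) {
  for (int k = 0; k < nOutput; ++k) {
    double delta = learnRate * hoGrads[j][k];
    hoWeights[j][k] += delta;                          // add, because errorSignal was (target - output)
    hoWeights[j][k] += momentum * hoPrevDeltas[j][k];  // momentum term reuses the previous delta
    hoPrevDeltas[j][k] = delta;                        // save for the next training item
  }
}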

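Because the demo's output activation is softmax, the derivative factor (1 - oNodes[k]) * oNodes[k] is the diagonal term of the softmax derivative. For reference, a standalone, numerically stable softmax (a generic sketch, not code from the article's DeepNet class) looks like:

static double[] Softmax(double[] sums)
{
  // Subtract the largest sum before exponentiating to avoid arithmetic overflow.
  double max = sums[0];
  for (int i = 1; i < sums.Length; ++i)
    if (sums[i] > max) max = sums[i];

  double scale = 0.0;
  for (int i = 0; i < sums.Length; ++i)
    scale += Math.Exp(sums[i] - max);

  double[] result = new double[sums.Length];
  for (int i = 0; i < sums.Length; ++i)
    result[i] = Math.Exp(sums[i] - max) / scale;
  return result;  // all values are positive and sum to 1.0
}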

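Finally, recall that dn.Accuracy(trainData, false) produced the 93.45 percent figure. For a classifier with one-hot encoded targets, accuracy is typically computed by checking whether the index of the largest computed output matches the index of the 1.0 in the target. A generic sketch follows; ComputeOutputs, nInput and nOutput are assumed member names, not necessarily those in the article's implementation:

public double Accuracy(double[][] data, bool verbose)
{
  // Sketch: fraction of items where the predicted class matches the target class.
  // (The verbose diagnostic output is omitted from this sketch.)
  int numCorrect = 0;
  for (int r = 0; r < data.Length; ++r) {
    double[] inputs = new double[this.nInput];
    double[] targets = new double[this.nOutput];
    Array.Copy(data[r], 0, inputs, 0, this.nInput);
    Array.Copy(data[r], this.nInput, targets, 0, this.nOutput);

    double[] outputs = this.ComputeOutputs(inputs);  // assumed method name
    if (ArgMax(outputs) == ArgMax(targets)) ++numCorrect;
  }
  return (numCorrect * 1.0) / data.Length;
}

private static int ArgMax(double[] vector)
{
  int result = 0;
  for (int i = 1; i < vector.Length; ++i)
    if (vector[i] > vector[result]) result = i;
  return result;
}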





















































