Although you can print whatever information you wish inside the main training loop, the built-in ProgressPrinter object is a very convenient way to monitor training. Training is performed with these statements:
print("Starting training \
") for i in range(0, max_iter):
currBatch = reader_train.next_minibatch(batch_size, input_map = my_input_map)
trainer.train_minibatch(currBatch)
pp.update_with_trainer(trainer) print("\
Training complete")
In each training iteration, the next_minibatch function pulls a batch of training items (5 in the demo), and the train_minibatch function uses SGD to update the current values of the weights and biases.
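For reference, here's a minimal sketch of how the pp object used in the loop might have been set up; the update frequency of 500 is an assumed value, not taken from the demo listing:

# Hypothetical ProgressPrinter setup; the frequency value is assumed
from cntk.logging import ProgressPrinter
pp = ProgressPrinter(500)  # print average loss and error every 500 mini-batches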
Testing the Network
After a neural network has been trained, you should use the trained model on the holdout test data. The idea is that given enough training time and combinations of learning rate and batch size, you can eventually get close to 100 percent accuracy on your training data. However, excessive training can over-fit and lead to a model that predicts very poorly on new data.
print("\
Evaluating test data \
")
reader_test = create_reader(test_file, False, input_dim,
output_dim)
numTestItems = 30
allTest = reader_test.next_minibatch(numTestItems,
input_map = my_input_map)
test_error = trainer.test_minibatch(allTest) print("Classification error on the 30 test items = %f"
% test_error)
Here the next_minibatch function fetches all 30 test items at once. Notice that you can reuse the my_input_map object for the test data because the mapping to input_Var and label_Var is the same as for the training data.
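For context, a hedged sketch of what the my_input_map dictionary might look like is shown below. The stream names features and labels are assumptions based on a typical create_reader definition, not the demo's exact code:

# Hypothetical mapping from network variables to reader streams;
# the stream names 'features' and 'labels' are assumed
my_input_map = {
  input_Var : reader_train.streams.features,
  label_Var : reader_train.streams.labels
}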
Making Predictions
Ultimately, the purpose of a neural network model is to make predictions for new, previously unseen data.
unknown = np.array([[6.9, 3.1, 4.6, 1.3]], dtype=np.float32)
print("\
Predicting Iris species for features: ") my_print(unknown[0], 1) # 1 decimal
predicted = nnet.eval( {input_Var: unknown} ) print("Prediction is: ")
my_print(predicted[0], 3) # 3 decimals
The variable named unknown is an array-of-arrays-style numpy matrix, which is the form a CNTK neural network requires. The eval function accepts the input values and runs them through the trained model using the neural network input-output process; the resulting three probabilities (0.263, 0.682, 0.055) are then displayed.
In some situations it’s useful to iterate through all test items and use the eval function to see exactly which items were incorrectly predicted. You can also write code that uses the numpy.argmax function to determine the largest value in the output probabilities and explicitly print “correct” or “wrong.”
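A hedged sketch of such a checking loop follows. Here, test_x and test_y are assumed to be numpy arrays holding the 30 test items' features and one-hot labels (the demo actually reads its data from a CNTK-format file), so treat this as an illustration rather than the demo's code:

# Hypothetical per-item check; test_x and test_y are assumed numpy arrays
import numpy as np

for i in range(len(test_x)):
  probs = nnet.eval({input_Var: test_x[i:i+1]})[0]  # three output probabilities
  pred = np.argmax(probs)                           # index of the largest probability
  actual = np.argmax(test_y[i])                     # index of the 1 in the one-hot label
  print("Item %2d : %s" % (i, "correct" if pred == actual else "wrong"))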
Exporting Weights and Biases
The demo program concludes by fetching the trained model's weights and biases, displaying them in the shell, and saving them to a text file. The idea is that you can train a neural network using CNTK, and then use the trained model's weights and biases in another system, such as a C# program, to make predictions.
The weights and bias values for the hidden layer are displayed like this:
print("\
Trained model input-to-hidden weights: \
") print(hLayer.hidLayer.W.value)
print("\
Trained model hidden node biases: \
") print(hLayer.hidLayer.b.value)
Recall that a CNTK network layer is a named object (hLayer), and that an optional name property was passed in when the layer was created (hidLayer). The tersely named W property of a named layer returns an array-of-arrays-style matrix holding the input-to-hidden weights. Similarly, the b property gives you the biases. The weights and biases for the output layer are obtained in the same way:
print("\
Trained model hidden-to-output weights: \
") print(oLayer.outLayer.W.value)
print("\
Trained model output node biases: \
") print(oLayer.outLayer.b.value)
The values of the (4 * 2) + (2 * 3) = 14 weights, and the (2 + 3) = 5 biases, are saved to a text file, and function do_demo concludes, like so:
...
save_weights("weights.txt", hLayer.hidLayer.W.value, hLayer.hidLayer.b.value, oLayer.outLayer.W.value, oLayer.outLayer.b.value)
return 0 # success
The program-defined save_weights function writes one value per line. The order in which the values are written (input-to-hidden weights, then hidden biases, then hidden-to-output weights, then output biases) is arbitrary, so any system that uses the values from the weights file must use the same order.
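A hedged sketch of what such a save_weights helper could look like follows; the six-decimal formatting and the use of numpy.ravel are assumptions, not the article's code:

# Hypothetical save_weights helper: writes every value, one per line,
# in the order the caller passes the arrays
import numpy as np

def save_weights(fn, ihWeights, hBiases, hoWeights, oBiases):
  with open(fn, "w") as f:
    for arr in (ihWeights, hBiases, hoWeights, oBiases):
      for v in np.ravel(arr):     # flatten each matrix or vector
        f.write("%0.6f\n" % v)    # one value per line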
Wrapping Up
If you’re new to neural networks, the number of decisions you have to make when using CNTK might seem a bit overwhelming. You need to decide how many hidden nodes to use, pick a hidden layer activation function, a learning optimization algorithm, a training error function, a training weight-initialization algorithm, a batch size, a learning rate and a maximum number of iterations.
However, in most cases, you can use the demo program presented in this article as a template, and experiment mostly with the number of hidden nodes, the maximum number of iterations, and the learning rate. In other words, you can safely use tanh hidden layer activation, cross-entropy for training error, Glorot initialization for weights and biases, and a training mini-batch size that is roughly 5 percent to 10 percent of the number of training items. The one exception: even though SGD is the most commonly used training optimization algorithm, I suggest using the Adam algorithm instead.
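A hedged sketch of swapping SGD for Adam follows. The loss and error function names (tr_loss, tr_error) and the schedule values are assumptions, and the learner signature may differ slightly across CNTK versions:

# Hypothetical Adam learner setup; schedule values and tr_loss/tr_error are assumed
import cntk as C

learner = C.adam(nnet.parameters,
  lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch),
  momentum=C.momentum_schedule(0.90))
trainer = C.Trainer(nnet, (tr_loss, tr_error), [learner])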
Once you become familiar with CNTK basics, you can use the library to build very powerful, advanced, deep neural network architectures such as convolutional neural networks (CNNs) for image recognition and long short-term memory recurrent neural networks (LSTM RNNs) for the analysis of natural language data.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Chris Lee and Sayan Pathak