to help prevent model overfitting. For example, to add dropout to the first hidden layer, you could modify the demo code like this:
h1 = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer1')(X)
d1 = C.layers.Dropout(0.50, name='drop1')(h1)
h2 = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
  name='hidLayer2')(d1)
h3 = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
  name='hidLayer3')(h2)
oLayer = C.layers.Dense(output_dim, activation=None,
  name='outLayer')(h3)
Many of my colleagues prefer to always use Sequential, even for deep neural networks that only have a few hidden layers. I prefer manual chaining, but this is just a matter of style.
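For reference, a minimal sketch of the same three-hidden-layer architecture expressed with Sequential might look like the following (the layer names and the hidden_dim, output_dim and X variables are assumed from the demo; this is an illustration, not the demo's actual code):
# Hedged sketch: the manually chained network rebuilt with Sequential
with C.layers.default_options(activation=C.ops.tanh):  # default activation for the Dense layers
  nnet = C.layers.Sequential([
    C.layers.Dense(hidden_dim, name='hidLayer1'),
    C.layers.Dense(hidden_dim, name='hidLayer2'),
    C.layers.Dense(hidden_dim, name='hidLayer3'),
    C.layers.Dense(output_dim, activation=None, name='outLayer')])(X)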
Training the Network
After creating a neural network and model, the demo program creates a Learner object and a Trainer object:
print("Creating a Trainer \n")
tr_loss = C.cross_entropy_with_softmax(nnet, Y)
tr_clas = C.classification_error(nnet, Y)
learn_rate = 0.01
learner = C.sgd(nnet.parameters, learn_rate)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
You can think of a Learner as an algorithm and a Trainer as an object that uses the Learner algorithm. The tr_loss (“training loss”) object defines how to measure error between network-computed output values and known correct output values in the training data. For classification, cross entropy is almost always used, but CNTK supports several alternatives. The “with_softmax” part of the function name indicates that the function expects raw output node values rather than values normalized with softmax. This is why the output layer doesn’t use an activation function.
The tr_clas (“training classification error”) object defines how the number of correct and incorrect predictions is calculated during training. CNTK defines a classification error (percentage of incorrect predictions) library function rather than the classification accuracy function used by some other libraries. So, there are two forms of error being calculated during training. The tr_loss error is used to adjust the weights and biases. The tr_clas error is used to monitor prediction accuracy.
The Learner object uses the SGD algorithm with a constant learning rate set to 0.01. SGD is the simplest training algorithm but it’s rarely the best-performing one. CNTK supports a large number of learner algorithms, some of which are very complex. As a rule of thumb, I recommend starting with SGD and only trying more exotic algorithms if training fails. The Adam algorithm (Adam isn’t an acronym) is usually my second choice.
Notice the unusual syntax for creating a Trainer object. The two loss function objects are passed as a Python tuple, indicated by the parentheses, but the Learner object is passed as a Python list, indicated by square brackets. You can pass multiple Learner objects to a Trainer, though the demo program passes just one.
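For example, a hedged sketch of substituting the Adam learner for plain SGD might look like the following (the learning rate, momentum and schedule choices here are illustrative assumptions, not values from the demo):
# Hedged sketch: an Adam learner in place of SGD
lr_sched = C.learning_rate_schedule(0.01, C.UnitType.minibatch)  # assumed per-minibatch rate
mom_sched = C.momentum_schedule(0.90)  # assumed momentum value
learner = C.adam(nnet.parameters, lr=lr_sched, momentum=mom_sched)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])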
The code that actually performs training is:
for i in range(0, max_iter):
  curr_batch = rdr.next_minibatch(batch_size,
    input_map=my_input_map)
  trainer.train_minibatch(curr_batch)
  if i % 1000 == 0:
    mcee = trainer.previous_minibatch_loss_average
    pmea = trainer.previous_minibatch_evaluation_average
    macc = (1.0 - pmea) * 100
    print("batch %6d: mean loss = %0.4f, \
mean accuracy = %0.2f%%" % (i, mcee, macc))
It’s important to monitor training progress because training often fails. Here, the average cross-entropy error on the just-used batch of 10 training items is displayed every 1,000 iterations. The demo displays the average classification accuracy (percentage of correct predictions on the current 10 items), which I think is a more natural metric than classification error (percentage of incorrect predictions).
Saving the Trained Model
Because there are only 150 training items, the demo neural network can be trained in just a few seconds. But in non-demo scenarios, training a very deep neural network can take hours, days or even longer. After training, you’ll want to save your model so you won’t have to retrain from scratch. Saving and loading a trained CNTK model is very easy. To save, you can add code like this to the demo program:
mdl = ".\\Models\\seed_dnn.model"
model.save(mdl, format=C.ModelFormat.CNTKv2)
The first argument passed to the save function is just a filename, possibly including a path. There’s no required file extension, but using “.model” makes sense. The format parameter has the default value ModelFormat.CNTKv2, so it could’ve been omitted. An alternative is to use the new Open Neural Network Exchange (ONNX) format.
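For example, a sketch of saving in ONNX format (the file name here is just illustrative) might be:
model.save(".\\Models\\seed_dnn.onnx", format=C.ModelFormat.ONNX)  # ONNX instead of CNTKv2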
Recall that the demo program created both an nnet object (with no softmax on the output) and a model object (with softmax). You’ll normally want to save the softmax version of a trained model, but you can save the non-softmax object if you wish.
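A minimal sketch of the relationship between the two objects, assuming the model object was created by applying softmax to the nnet output, is:
model = C.ops.softmax(nnet)  # assumed: model adds softmax to the raw nnet output nodes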
Once a model has been saved, you can load it into memory like so:
model = C.ops.functions.Function.load(".\\Models\\seed_dnn.model")
And then the model can be used as if it had just been trained. Notice that there’s a bit of asymmetry in the calls to save and load— save is a method on a Function object and load is a static method from the Function class.
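For example, a hedged sketch of making a prediction with the loaded model might look like the following (the number of input features and the feature values are placeholder assumptions, not data from the article):
import numpy as np
# Placeholder input; the number of features must match the network's input dimension
unknown = np.array([[0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]], dtype=np.float32)
pred_probs = model.eval(unknown)    # softmax output probabilities
pred_class = np.argmax(pred_probs)  # index of the most likely class
print("Predicted class:", pred_class)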
Wrapping Up
Many classification problems can be handled using a simple feed-forward neural network (FNN) with a single hidden layer. In theory, given certain assumptions, an FNN can handle any problem a deep neural network can handle. However, in practice, sometimes a deep neural network is easier to train than an FNN. The mathematical basis for these ideas is called the universal approximation theorem (or sometimes the Cybenko Theorem).
If you’re new to neural network classification, the number of decisions you have to make can seem intimidating. You must decide on the number of hidden layers, the number of nodes in each layer, an initialization scheme and activation function for each hidden layer, a training algorithm, and the training algorithm parameters such as learning rate and momentum term. However, with practice you’ll quickly develop a set of rules of thumb for the types of problems with which you deal.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Chris Lee, Ricky Loynd, Kenneth Tran