second hidden layer. Then, the outputs of the second hidden layer are sent to the output layer. The two hidden layers use ReLU (rectified linear units) activation, which, for image classification, often works better than standard tanh activation.
Notice that there’s no activation applied to the output nodes. This is a quirk of CNTK because the CNTK training function expects raw, un-activated values. The dnn object is just a convenience alias. The model object has softmax activation so it can be used after training to make predictions. Because Python assigns by reference, training the dnn object also trains the model object.
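Because model wraps the trained dnn with softmax, it is the object you’d call after training to classify an image. The following is a minimal sketch of how that could look; the all-zeros unknown array is just a placeholder for a 1 x 784 image and is not part of the demo:

# hypothetical prediction after training (not part of the demo listing)
unknown = np.zeros(shape=(1, 784), dtype=np.float32)  # placeholder image
pred_probs = model.eval(unknown)    # softmax output: 10 pseudo-probabilities
pred_digit = np.argmax(pred_probs)  # index of the largest probability
print("Predicted digit = %d" % pred_digit)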
Training the Neural Network
The neural network is prepared for training with:
tr_loss = C.cross_entropy_with_softmax(dnn, Y)
tr_eror = C.classification_error(dnn, Y)
max_iter = 10000
batch_size = 50
learn_rate = 0.01
learner = C.sgd(dnn.parameters, learn_rate)
trainer = C.Trainer(dnn, (tr_loss, tr_eror), [learner])
The training loss (tr_loss) object tells CNTK how to measure error when training. The cross-entropy error is usually the best choice for classification problems. The training classification error (tr_eror) object can be used to automatically compute the percentage of incorrect predictions during training or after training. Specifying a loss function is required, but specifying a classification error function is optional.
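To make the loss concrete, here’s a minimal plain-NumPy sketch of the computation that cross_entropy_with_softmax performs for a single training item. The raw output (logit) values and the one-hot target below are made-up illustration values, not values from the demo:

logits = np.array([2.0, 1.0, 0.1])  # un-activated output node values
target = np.array([1.0, 0.0, 0.0])  # one-hot encoded correct class
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
loss = -np.sum(target * np.log(probs))           # cross-entropy
print(loss)  # about 0.4170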
The values for the maximum number of training iterations, the number of items in a batch to train at a time, and the learning rate are all free parameters that must be determined by trial and error. You can think of the learner object as an algorithm, and the trainer object as the object that uses the learner to find good values for the neural network’s weights and biases. The stochastic gradient descent (sgd) learner is the most primitive algorithm but works well for simple problems. Alternatives include adaptive moment estimation (adam) and root mean square propagation (rmsprop).
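Swapping in a different learner changes only one statement. For example, here’s a sketch using the adam learner; it assumes CNTK 2.4 accepts plain float values for the learning rate and momentum (as it does for sgd), and the 0.90 momentum value is just an illustrative guess, not a tuned setting:

# hypothetical alternative to sgd -- same trainer wiring, different learner
learner = C.adam(dnn.parameters, lr=learn_rate, momentum=0.90)  # 0.90 is an assumption
trainer = C.Trainer(dnn, (tr_loss, tr_eror), [learner])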
A reader object for the training data is created with these statements:
rdr = create_reader(train_file, input_dim, output_dim,
  rnd_order=True, m_swps=C.io.INFINITELY_REPEAT)
mnist_input_map = {
  X : rdr.streams.x_src,
  Y : rdr.streams.y_src
}
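The reader assumes the data files are in CNTK’s CTF format, with tag names that match the 'digit' and 'pixels' fields used in create_reader. The sketch below shows roughly what one line of the training file would look like for a "2" digit; the exact pixel values shown are made up, the tag ordering is an assumption, and the "..." stands for the rest of the 784 values:

|digit 0 0 1 0 0 0 0 0 0 0 |pixels 0 0 0 ... 116 255 84 ... 0 0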
Figure 3 Complete Demo Program Listing

# mnist_dnn.py
# MNIST using a 2-hidden layer DNN (not a CNN)
# Anaconda 4.1.1 (Python 3.5.2), CNTK 2.4

import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, m_swps):
  x_strm = C.io.StreamDef(field='pixels', shape=input_dim,
    is_sparse=False)
  y_strm = C.io.StreamDef(field='digit', shape=output_dim,
    is_sparse=False)
  streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
  deserial = C.io.CTFDeserializer(path, streams)
  mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order,
    max_sweeps=m_swps)
  return mb_src

# ===================================================================

def main():
  print("\nBegin MNIST classification using a DNN \n")

  train_file = ".\\Data\\mnist_train_1000_cntk.txt"
  test_file = ".\\Data\\mnist_test_100_cntk.txt"

  C.cntk_py.set_fixed_random_seed(1)
  input_dim = 784   # 28 x 28 pixels
  hidden_dim = 400
  output_dim = 10   # 0 to 9

  X = C.ops.input_variable(input_dim, dtype=np.float32)
  Y = C.ops.input_variable(output_dim)  # float32 is default

  print("Creating a 784-(400-400)-10 ReLU classifier")
  with C.layers.default_options(init=\
    C.initializer.uniform(scale=0.01)):
    h_layer1 = C.layers.Dense(hidden_dim, activation=C.ops.relu,
      name='hidLayer1')(X/255)
    h_layer2 = C.layers.Dense(hidden_dim, activation=C.ops.relu,
      name='hidLayer2')(h_layer1)
    o_layer = C.layers.Dense(output_dim, activation=None,
      name='outLayer')(h_layer2)
  dnn = o_layer               # train this
  model = C.ops.softmax(dnn)  # use for prediction

  tr_loss = C.cross_entropy_with_softmax(dnn, Y)
  tr_eror = C.classification_error(dnn, Y)
  max_iter = 10000   # num batches, not epochs
  batch_size = 50
  learn_rate = 0.01
  learner = C.sgd(dnn.parameters, learn_rate)
  trainer = C.Trainer(dnn, (tr_loss, tr_eror), [learner])

  # 3. create reader for train data
  rdr = create_reader(train_file, input_dim, output_dim,
    rnd_order=True, m_swps=C.io.INFINITELY_REPEAT)
  mnist_input_map = {
    X : rdr.streams.x_src,
    Y : rdr.streams.y_src
  }

  # 4. train
  print("\nStarting training \n")
  for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, \
      input_map=mnist_input_map)
    trainer.train_minibatch(curr_batch)
    if i % int(max_iter/10) == 0:
      mcee = trainer.previous_minibatch_loss_average
      macc = (1.0 - trainer.previous_minibatch_evaluation_average) \
        * 100
      print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " \
        % (i, mcee, macc))
  print("\nTraining complete \n")

  # 5. evaluate model on test data
  rdr = create_reader(test_file, input_dim, output_dim,
    rnd_order=False, m_swps=1)
  mnist_input_map = {
    X : rdr.streams.x_src,
    Y : rdr.streams.y_src
  }
  num_test = 100
  test_mb = rdr.next_minibatch(num_test, input_map=mnist_input_map)
  test_acc = (1.0 - trainer.test_minibatch(test_mb)) * 100
  print("Model accuracy on the %d test items = %0.2f%%" \
    % (num_test, test_acc))

  print("\nEnd MNIST classification using a DNN \n")

if __name__ == "__main__":
  main()