built-in CNTK reader functions, or you can use data in non-CTF format and write a custom reader function. The demo program uses the CTF data format approach. File trainData_cntk.txt looks like:
|attribs 5.1 3.5 1.4 0.2 |species 1 0 0 ...
|attribs 7.0 3.2 4.7 1.4 |species 0 1 0 ...
|attribs 6.9 3.1 5.4 2.1 |species 0 0 1
You specify the feature (predictor) values by using the “|” character followed by a string identifier, and the label values in the same way. You can use whatever you like for identifiers.
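To make the format concrete, here is how the first line above decomposes. (The attribute order is the standard Fisher layout, and the one-hot order setosa, versicolor, virginica follows from the sample lines; this is an illustrative annotation, not code from the demo.)

# "|attribs 5.1 3.5 1.4 0.2 |species 1 0 0" is read as:
features = [5.1, 3.5, 1.4, 0.2]  # sepal length, sepal width, petal length, petal width
label    = [1, 0, 0]             # one-hot encoding of Iris setosa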
To create the training data, I go to the Wikipedia entry for Fisher’s Iris Data, copy and paste all 150 items into Notepad, select the first 40 of each species, and then do a bit of edit-replace. I use the leftover 10 of each species in the same way to create the testData_cntk.txt file. The create_reader function that uses the data files is defined as:
def create_reader(path, is_training, input_dim, output_dim):
  return MinibatchSource(CTFDeserializer(path, StreamDefs(
    features = StreamDef(field='attribs', shape=input_dim,
      is_sparse=False),
    labels = StreamDef(field='species', shape=output_dim,
      is_sparse=False)
  )), randomize = is_training,
    max_sweeps = INFINITELY_REPEAT if is_training else 1)
You can think of this function as boilerplate for CTF files. The only thing you’ll need to edit is the string identifiers (“attribs” and “species” here) used to identify features and labels.
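If you want to sanity-check a reader before training, a short sketch like the following pulls one small mini-batch and inspects it. (This is illustrative only; it assumes the CNTK imports shown earlier in the article, and the variable names x and y are mine, not the demo's.)

rdr = create_reader("trainData_cntk.txt", False, 4, 3)
x = C.ops.input(4, np.float32)
y = C.ops.input(3, np.float32)
mb = rdr.next_minibatch(5, input_map={
  x : rdr.streams.features,
  y : rdr.streams.labels })
print(mb[x].num_samples)  # should print 5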
Creating a Neural Network
The definition of function do_demo begins with:
def do_demo():
  input_dim = 4
  hidden_dim = 2
  output_dim = 3
  train_file = "trainData_cntk.txt"
  test_file = "testData_cntk.txt"
  input_Var = C.ops.input(input_dim, np.float32)
  label_Var = C.ops.input(output_dim, np.float32)
...
The meanings and values of the first five variables should be clear to you. Variables input_Var and label_Var are created using the built-in function named input, located in the cntk.ops package. You can think of these variables as numeric matrices, plus some special properties needed by CNTK.
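To get a feel for those special properties, a quick illustrative check (not part of the demo) shows the kind of metadata an input variable carries:

v = C.ops.input(4, np.float32)
print(v.shape)         # (4,)
print(v.dtype)         # numpy float32
print(v.dynamic_axes)  # the implicit batch axis CNTK adds automatically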
The neural network is created with these statements:
print("Creating a 4-2-3 tanh softmax NN for Iris data ") with default_options(init = glorot_uniform()):
hLayer = Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(input_Var)
oLayer = Dense(output_dim, activation=C.ops.softmax, name='outLayer')(hLayer)
nnet = oLayer
The Dense function creates a fully connected layer of nodes. You pass in the number of nodes and an activation function. The name parameter is optional in general, but is needed if you want to extract the weights and biases associated with a layer. Notice that the input values for a layer aren't passed as an argument to the Dense function; instead, the object returned by Dense is itself a function, and you apply it to the input by appending the input object, in parentheses, to the call.
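Because each layer was given a name, its weights and biases can be located later, typically after training. Here's a minimal sketch of one way to do that (assumed usage, not code from the demo):

hid = nnet.find_by_name('hidLayer')  # the hidden Dense layer created above
for p in hid.parameters:
  print(p.name, p.shape)  # expect a 4x2 weight matrix W and a length-2 bias b
  print(p.value)          # the current numeric values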
When creating a neural network layer, you should specify how the values for the associated weights and biases are initialized, using the init parameter of the Dense function. The demo initializes weights and biases using the Glorot mini-algorithm (also called Xavier initialization), implemented in function glorot_uniform. There are several alternative initialization functions in the cntk.initializer module.
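For example, to override the default for just the hidden layer and use a normal-distribution initializer instead, you could pass init directly to Dense (an illustrative variation, not what the demo does):

from cntk.initializer import normal
hLayer = Dense(hidden_dim, activation=C.ops.tanh,
  init=normal(0.01), name='hidLayer')(input_Var)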
The statement nnet = oLayer creates an alias for the output layer named oLayer. The idea is that the output layer represents a single layer, but also the output of the entire neural network.
Training the Neural Network
After training and test data have been set up, and a neural network has been created, the next step is to train the network. The demo program creates a trainer with these statements:
print("Creating a cross entropy mini-batch Trainer \
") ce = C.cross_entropy_with_softmax(nnet, label_Var)
pe = C.classification_error(nnet, label_Var)
fixed_lr = 0.05
lr_per_batch = learning_rate_schedule(fixed_lr,
UnitType.minibatch)
learner = C.sgd(nnet.parameters, lr_per_batch) trainer = C.Trainer(nnet, (ce, pe), [learner])
The most common approach for measuring training error is to use what’s called cross-entropy error, also known as log loss. The main alternative to cross-entropy error for numeric problems similar to the Iris demo is the squared_error function.
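To see what cross-entropy error measures, suppose the network outputs probabilities (0.20, 0.70, 0.10) for an item whose true species is encoded (0, 1, 0). A hand computation (illustrative only):

import numpy as np
pred   = np.array([0.20, 0.70, 0.10])  # softmax output probabilities
actual = np.array([0.0, 1.0, 0.0])     # one-hot target
ce = -np.sum(actual * np.log(pred))    # reduces to -ln(0.70)
print(ce)                              # approximately 0.357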
After training has completed, you're more interested in classification accuracy than in cross-entropy error; you want to know how many correct predictions the model makes. The demo uses the built-in classification_error function.
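Because the Trainer was handed classification_error as its evaluation function, the built-in test_minibatch method returns the average fraction of wrong predictions on a batch, so accuracy is one minus that value. A sketch with hypothetical test arrays (not the demo's exact code):

err = trainer.test_minibatch({ input_Var : test_features,
  label_Var : test_labels })  # test_features, test_labels: hypothetical numpy arrays
print("Accuracy on batch = ", 1.0 - err)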
There are several optimization algorithms that can be used to minimize error during training. The most basic is called stochastic gradient descent (SGD), which is often called back-propagation. Alternative algorithms supported by CNTK include SGD with momentum, Nesterov and Adam (adaptive moment estimation).
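For instance, the plain sgd learner shown above could be swapped for SGD with momentum along these lines (an assumed variation; the demo uses plain sgd):

mom = C.momentum_schedule(0.90)
learner = C.momentum_sgd(nnet.parameters, lr_per_batch, mom)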
The mini-batch form of SGD reads in one subset of the training items at a time, calculates the calculus gradients, and then updates all weights and bias values by a small increment called the learning rate. Training is often highly sensitive to the values used for the learning rate. After a CNTK trainer object has been created, the demo prepares training with these statements:
max_iter = 5000
batch_size = 5
progress_freq = 1000
reader_train = create_reader(train_file, True, input_dim, output_dim)
my_input_map = {
  input_Var : reader_train.streams.features,
  label_Var : reader_train.streams.labels
}
pp = ProgressPrinter(progress_freq)
The SGD algorithm is iterative, so you must specify a maximum number of iterations. Note that the value for the mini-batch size should be between 1 and the number of items in the training data.
The reader object for the trainer object is created by a call to create_reader. The True argument that’s passed to create_reader tells the function that the reader is going to be used for training data rather than test data and, therefore, that the data items should be processed in random order, which is important to avoid training stagnation.
The my_input_map object is a Python two-item dictionary. It's used to tell the reader object where the feature data resides (input_Var) and where the label data resides (label_Var). Although you