A loss (error) function is required so the Trainer object knows how to adjust weights and biases to reduce error. CNTK 2.4 has nine loss functions, but the simple squared_error is almost always suitable for a regression problem. The number of iterations corresponds to the number of weight-and-bias update operations and must be determined by trial and error.
The Trainer object requires a Learner object. You can think of a Learner as an algorithm. CNTK supports eight learning algorithms. For regression problems, I typically get good results using either basic stochastic gradient descent or the more sophisticated Adam (“adaptive momentum estimation”).
The batch size is used by CNTK to determine how often to perform weight and bias updates. The demo sets the batch size to 11. Therefore, the 308 items will be grouped into 308 / 11 = 28 randomly selected batches. Each batch is analyzed and then updates are performed. The learning rate controls the magnitude of the weight and bias adjustments. Determining good values for the batch size, the maximum number of iterations, and the learning rate is often the biggest challenge when creating a neural network prediction model.
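For reference, the training objects can be set up along the following lines. This is a minimal sketch, assuming cntk has been imported as C and that X, Y and model are the input variables and network defined earlier; the learning rate and momentum values shown are illustrative assumptions, not the only reasonable choices:

tr_loss = C.squared_error(model, Y)  # loss for a regression problem
max_iter = 50000     # number of weight/bias update iterations
batch_size = 11      # items per mini-batch
learn_rate = 0.005   # illustrative value; tune by trial and error
learner = C.adam(model.parameters, lr=learn_rate, momentum=0.90)
trainer = C.Trainer(model, (tr_loss), [learner])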
The demo calls the program-defined create_reader function to, well, create a reader object. And an input_map is created that tells the reader where the feature values are and where the value-to-predict is:
rdr = create_reader(train_file, input_dim, output_dim,
  rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
hydro_input_map = {
  X : rdr.streams.x_src,
  Y : rdr.streams.y_src
}
The rnd_order parameter ensures that the data items will be processed differently on each pass, which is important to prevent training from stalling out. The INFINITELY_REPEAT argument allows training over multiple passes through the 308-item data set.
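The create_reader function is a thin wrapper around CNTK's built-in MinibatchSource machinery. Here's a minimal sketch, assuming the training file is in CTF format and that its field tags are named predictors and resistance (the tag names in your data file may differ):

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
  # map the CTF fields to the named streams used by the input_map
  x_strm = C.io.StreamDef(field='predictors', shape=input_dim,
    is_sparse=False)
  y_strm = C.io.StreamDef(field='resistance', shape=output_dim,
    is_sparse=False)
  streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
  deserial = C.io.CTFDeserializer(path, streams)
  mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order,
    max_sweeps=sweeps)
  return mb_src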
After preparation, the model is trained like so:
for i in range(0, max_iter):
  curr_batch = rdr.next_minibatch(batch_size,
    input_map=hydro_input_map)
  trainer.train_minibatch(curr_batch)
  if i % int(max_iter/10) == 0:
    mcee = trainer.previous_minibatch_loss_average
    print("batch %6d: mean squared error = %8.4f" % \
      (i, mcee))
The next_minibatch function pulls 11 items from the data. The train_minibatch function uses the Adam algorithm to update weights and biases based on the squared error between computed hull resistance values and actual resistance values. The squared error on the current 11-item batch is displayed every 50,000 / 10 = 5,000 batches so you can visually monitor training progress: you want to see loss/error values that generally decrease.
Using the Model
After the model has been trained, the demo program makes some predictions. First, the predictor values for two arbitrary items from the normalized data set are selected (items 99 and 238) and placed into an array-of-arrays style matrix:
inpts = np.array(
  [[0.520000, 0.785714, 0.550000, 0.405512,
    0.648352, 0.000000],
   [1.000000, 1.000000, 0.550000, 0.562992,
    0.461538, 1.000000]], dtype=np.float32)
Next, the corresponding normalized actual hull resistance values are placed into an array:
actuals = np.array([0.003044, 0.825028], dtype=np.float32)
Then, the predictor values are used to compute the predicted values using the model.eval function, and predicted and actual values are displayed:
for i in range(len(inpts)):
  print("\nInput: ", inpts[i])
  pred = model.eval(inpts[i])
  print("predicted resistance: %0.4f" % pred[0][0])
  print("actual resistance: %0.4f" % actuals[i])
print("End yacht hull regression ")
Notice that the predicted hull resistance value is returned as an array-of-arrays matrix with a single value. Therefore, the value itself is at [0][0] (row 0, column 0). Dealing with shapes of CNTK vectors and matrices is a significant syntax challenge. When working with CNTK I spend a lot of time printing objects and displaying their shape, along the lines of print(something.shape).
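For example, a quick shape check on a single prediction (a tiny illustrative snippet, not part of the demo) shows why the double index is needed:

pred = model.eval(inpts[0])
print(pred.shape)   # (1, 1) -- one row, one column
print(pred[0][0])   # the single predicted (normalized) resistance value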
Wrapping Up
When creating a neural network regression model, there’s no predefined accuracy metric. If you want to compute prediction accuracy you must define what it means for a predicted value to be close enough to the corresponding actual value in order to be considered correct. Typically, you’d specify a percentage/proportion, such as 0.10, and evaluate a predicted value as correct if it’s within that percentage of the actual value.
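A program-defined accuracy function along these lines captures that idea (a sketch only; the function and parameter names are mine, not from the demo):

import numpy as np

def accuracy(model, data_x, data_y, pct_close):
  # a prediction counts as correct if it is within pct_close
  # (for example, 0.10) of the corresponding actual value
  num_correct = 0; num_wrong = 0
  for i in range(len(data_x)):
    pred = model.eval(data_x[i])[0][0]  # predicted (normalized) resistance
    actual = data_y[i]
    if np.abs(pred - actual) < np.abs(pct_close * actual):
      num_correct += 1
    else:
      num_wrong += 1
  return (num_correct * 100.0) / (num_correct + num_wrong)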
Because the demo model works with normalized data, if you use the model to make a prediction for new, previously unseen predictor values, you have to normalize them using the same min-max values that were used on the training data. Similarly, a predicted hull resistance value, pv, is normalized, so you’d have to de-normalize by computing pv * (max - min) + min.
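In code, the round trip looks something like this sketch, where the min and max values are placeholders standing in for the values computed from the raw training data:

# placeholder per-column mins/maxs from the raw training data (not actual values)
col_mins = np.array([-5.0, 0.53, 4.34, 2.81, 2.73, 0.125], dtype=np.float32)
col_maxs = np.array([ 0.0, 0.60, 5.14, 5.35, 3.64, 0.450], dtype=np.float32)
res_min, res_max = 0.01, 62.42   # placeholder resistance min and max

raw_x = np.array([-2.3, 0.568, 4.78, 3.99, 3.17, 0.275], dtype=np.float32)
norm_x = (raw_x - col_mins) / (col_maxs - col_mins)    # min-max normalize input
pv = model.eval(norm_x)[0][0]                          # normalized prediction
pred_resistance = pv * (res_max - res_min) + res_min   # de-normalize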
The term “regression” can have several different meanings. In this article the term refers to a problem scenario where the goal is to predict a single numeric value (hull resistance). The classical statistics linear regression technique is much simpler than neural network regression, but usually much less accurate. The machine learning logistic regression technique predicts a single numeric value between 0.0 and 1.0, which is interpreted as a probability and then used to predict a categorical value such as “male” (p < 0.5) or “female” (p > 0.5).
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Chris Lee, Ricky Loynd and Ken Tran