MSDN Magazine, March 2019

The Sequential approach is much simpler, but notice you don’t have direct control over the weight and bias initialization algorithms. The tremendous flexibility you get when using PyTorch is an advantage once you become familiar with the library.
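To make the contrast concrete, here is a minimal sketch rather than the demo's actual code. The layer sizes (13 inputs, 10 hidden nodes, 1 output), the class name Net, the tanh activation and the choice of Xavier uniform initialization are all assumptions made only for illustration.

```python
import torch as T

# Sequential style: compact, but each layer keeps PyTorch's default initialization.
seq_net = T.nn.Sequential(
  T.nn.Linear(13, 10),
  T.nn.Tanh(),
  T.nn.Linear(10, 1))

# Class style: more code, but the initialization is stated explicitly.
# All sizes and initializers here are hypothetical choices.
class Net(T.nn.Module):
  def __init__(self):
    super().__init__()
    self.hid = T.nn.Linear(13, 10)
    self.oupt = T.nn.Linear(10, 1)
    T.nn.init.xavier_uniform_(self.hid.weight)
    T.nn.init.zeros_(self.hid.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid(x))
    return self.oupt(z)  # No activation on the output node for regression

net = Net()
print(seq_net(T.zeros(2, 13)).shape, net(T.zeros(2, 13)).shape)
```

Both networks accept the same input shape; the difference is only in how much of the setup you write yourself.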
Training the Model
Training the model begins with these seven statements:
net = net.train()  # Set training mode
bat_size = 10
loss_func = T.nn.MSELoss()  # Mean squared error
optimizer = T.optim.SGD(net.parameters(), lr=0.01)
n_items = len(train_x)
batches_per_epoch = n_items // bat_size
max_batches = 1000 * batches_per_epoch
A PyTorch network has two modes: train and eval. The default mode is train, but in my opinion it’s good practice to set the mode explicitly. The batch (often called mini-batch) size is a hyperparameter. For a regression problem, mean squared error is the most common loss function. Stochastic gradient descent (SGD) is the most rudimentary optimization technique; in many situations the Adam algorithm gives better results.
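As a rough sketch of what switching optimizers looks like, the following lines create both an SGD and an Adam optimizer and check the mean squared error calculation by hand. The one-layer stand-in network and the Adam learning rate of 0.01 are assumptions for illustration; a good learning rate has to be found by experiment.

```python
import torch as T

net = T.nn.Linear(13, 1)  # Stand-in model, not the demo network

loss_func = T.nn.MSELoss()
sgd_opt = T.optim.SGD(net.parameters(), lr=0.01)    # As in the demo
adam_opt = T.optim.Adam(net.parameters(), lr=0.01)  # Possible alternative

# MSELoss averages the squared differences:
# ((1 - 0)^2 + (3 - 1)^2) / 2 = (1 + 4) / 2 = 2.5
oupt = T.tensor([[1.0], [3.0]])
target = T.tensor([[0.0], [1.0]])
loss_val = loss_func(oupt, target).item()
print("loss = %0.4f" % loss_val)
```

Only the optimizer construction changes; the zero_grad, backward and step calls in a training loop stay the same.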
The demo program uses a simple approach for batching training items. The demo has 404 training items, so with a batch size of 10, visiting each training item once on average (this is usually called an epoch in machine learning terminology) requires 404 // 10 = 40 batches. Therefore, to train the equivalent of 1,000 epochs, the demo program needs 1000 * 40 = 40,000 batches.
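The batch arithmetic can be checked in a few lines. This is only a sketch of the calculation described above; the name n_epochs is hypothetical, because the demo hard-codes the value 1000.

```python
n_items = 404    # Training items in the demo
bat_size = 10
n_epochs = 1000  # Hypothetical name for the demo's hard-coded 1000

batches_per_epoch = n_items // bat_size     # Integer division: 404 // 10
max_batches = n_epochs * batches_per_epoch
leftover = n_items - batches_per_epoch * bat_size

print(batches_per_epoch, max_batches, leftover)  # 40 40000 4
```

Because integer division discards the remainder, 40 batches of 10 draw 400 items, slightly fewer than the 404 available. That is why an epoch here is only an approximate notion.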
The core training statements are:
for b in range(max_batches):
  curr_bat = np.random.choice(n_items, bat_size,
    replace=False)
  X = T.Tensor(train_x[curr_bat])
  Y = T.Tensor(train_y[curr_bat]).view(bat_size,1)
  optimizer.zero_grad()
  oupt = net(X)
  loss_obj = loss_func(oupt, Y)
  loss_obj.backward()  # Compute gradients
  optimizer.step()     # Update weights and biases
The choice function selects 10 random indices from the 404 available training items. The items are converted from NumPy arrays to PyTorch tensors. You can think of a tensor as a multidimensional array that can be efficiently processed by a GPU (even though the demo doesn’t take advantage of a GPU). The oddly named view function reshapes the one-dimensional target values into a two-dimensional tensor. Converting NumPy arrays to PyTorch tensors, and dealing with array and tensor shapes, is a major challenge when working with PyTorch.
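The shape handling is easier to follow when the shapes are printed. The sketch below uses stand-in arrays with the demo's dimensions (404 items, and 13 predictors as in the demo's input example); the array contents and the name Y_flat are assumptions for illustration.

```python
import numpy as np
import torch as T

# Stand-in training data with the demo's shapes
train_x = np.zeros((404, 13), dtype=np.float32)
train_y = np.arange(404, dtype=np.float32)  # One-dimensional targets

bat_size = 10
# replace=False means the 10 indices are all different
curr_bat = np.random.choice(404, bat_size, replace=False)

X = T.Tensor(train_x[curr_bat])       # Shape (10, 13)
Y_flat = T.Tensor(train_y[curr_bat])  # Shape (10,)
Y = Y_flat.view(bat_size, 1)          # Shape (10, 1)

print(X.shape, Y_flat.shape, Y.shape)
```

A network with a single output node would typically return a (10, 1) tensor for this batch, so reshaping the targets to (10, 1) lets the loss function compare the two element by element.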
Once every 4,000 batches the demo program displays the value of the mean squared error loss for the current batch of 10 training items, and the prediction accuracy of the model, using the current weights and biases on the entire 404-item training dataset:
  if b % (max_batches // 10) == 0:
    print("batch = %6d" % b, end="")
    print(" batch loss = %7.4f" % loss_obj.item(), end="")
    net = net.eval()
    acc = accuracy(net, train_x, train_y, 0.15)
    net = net.train()
    print(" accuracy = %0.2f%%" % acc)
The “//” operator is integer division in Python. Before calling the program-defined accuracy function, the demo sets the network into eval mode. Technically, this isn’t necessary because train and eval modes only give different results if the network uses dropout or batch normalization layers.
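To see why the mode matters when dropout is present, consider this small sketch. The dropout layer is hypothetical, because the demo network has none; it is used here only to show the two modes producing different results.

```python
import torch as T

T.manual_seed(1)
drop = T.nn.Dropout(p=0.5)  # Hypothetical layer, not part of the demo
x = T.ones(1, 8)

drop.train()
train_out = drop(x)  # Random elements zeroed, survivors scaled by 1/(1-p) = 2

same_obj = drop.eval()  # eval() returns the module itself
eval_out = drop(x)      # Dropout does nothing in eval mode

print(train_out)
print(eval_out)
print(same_obj is drop, drop.training)
```

In train mode every output value is either 0 or 2, while in eval mode the input passes through unchanged. A network without such layers behaves the same in both modes.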
Evaluating and Using the Trained Model
After training completes, the demo program evaluates the prediction accuracy of the model on the test dataset:
net = net.eval()  # Set eval mode
acc = accuracy(net, test_x, test_y, 0.15)
print("Accuracy on test data = %0.2f%%" % acc)
The eval function returns a reference to the model on which it’s applied; it could have been called without the assignment statement. In most situations, after training a model you want to save the model for later use. Saving a trained PyTorch model is a bit outside the scope of this article, but you can find several examples in the PyTorch documentation.
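The program-defined accuracy function is not shown in this section. As a hedged sketch of the idea, the code below assumes the 0.15 argument is a closeness tolerance, so that a prediction counts as correct when it is within 15 percent of the true value. The function name accuracy_from_preds is hypothetical, and it takes predictions directly instead of a network so that it needs only NumPy.

```python
import numpy as np

def accuracy_from_preds(preds, targets, pct_close):
  # Hypothetical helper: percentage of predictions that fall
  # within pct_close (as a fraction) of the corresponding target
  preds = np.asarray(preds, dtype=np.float32).reshape(-1)
  targets = np.asarray(targets, dtype=np.float32).reshape(-1)
  n_correct = np.sum(np.abs(preds - targets) < np.abs(pct_close * targets))
  return n_correct * 100.0 / len(targets)

preds = [2.10, 3.00, 1.00, 4.40]    # Made-up predictions
targets = [2.00, 2.00, 1.10, 4.00]  # Made-up true values
acc = accuracy_from_preds(preds, targets, 0.15)
print("Accuracy = %0.2f%%" % acc)  # Accuracy = 75.00%
```

Three of the four made-up predictions are within 15 percent of their targets; the second one (3.00 versus 2.00) is not.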
The whole point of training a regression model is to use it to make a prediction. The demo program makes a prediction using the first data item from the 102 test items:
raw_inpt = np.array([[0.09266, 34, 6.09, 0, 0.433, 6.495, 18.4, 5.4917, 7, 329, 16.1, 383.61, 8.67]], dtype=np.float32)
norm_inpt = np.array([[0.000970, 0.340000, 0.198148, -1, 0.098765, 0.562177, 0.159629, 0.396666, 0.260870, 0.270992, 0.372340, 0.966488, 0.191501]], dtype=np.float32)
When you have new data, you must remember to normalize the predictor values in the same way that the training data was normalized. For min-max normalization, that means you need to save the min and max values for every variable that was normalized.
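A minimal sketch of reusing saved min and max values follows. The training column values are made up; the assumption that the predictor ranged from 0 to 100 in the training data is chosen only because it is consistent with the raw value 34 mapping to 0.340000 in the demo input above.

```python
import numpy as np

# Hypothetical training values for one predictor column
train_col = np.array([0.0, 12.5, 34.0, 80.0, 100.0], dtype=np.float32)

# Save these two numbers when the training data is normalized
col_min = float(train_col.min())
col_max = float(train_col.max())

def min_max(x, lo, hi):
  # Min-max normalization: maps lo to 0.0 and hi to 1.0
  return (x - lo) / (hi - lo)

new_raw = 34.0  # New, previously unseen raw value
new_norm = min_max(new_raw, col_min, col_max)
print("%0.6f" % new_norm)  # 0.340000
```

The key point is that col_min and col_max come from the training data, not from the new item, so they must be stored alongside the trained model.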
The demo concludes by making and displaying the prediction:
...
X = T.Tensor(norm_inpt)
y = net(X)
print("Predicted = $%0.2f" % (y.item()*10000))
if __name__=="__main__": main()
The predicted value is returned as a tensor with a single value. The item function is used to access the value so it can be displayed.
Wrapping Up
The PyTorch library is somewhat less mature than alternatives such as TensorFlow, Keras and CNTK, especially with regard to example code. But among my colleagues, the use of PyTorch is growing very quickly. I expect this trend to continue, and high-quality examples will become increasingly available to you.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several key Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Chris Lee, Ricky Loynd
Test Run
