The demo code explicitly initializes the hidden node and output node weights using the Xavier Uniform (also known as Glorot Uniform) algorithm, and initializes the biases to zero. This is the default mechanism so explicit initialization could've been omitted. But in my opinion, it's good practice to explicitly initialize because the default initialization scheme could change in the future.
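A minimal sketch of what such explicit initialization might look like, assuming the demo's Net class defines layers named hid1, hid2 and oupt with a 4-8-8-1 architecture:

# Sketch of explicit weight/bias initialization, assuming the demo's
# Net class with layers hid1, hid2 and oupt (4-8-8-1 architecture)
import torch as T

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 8)
    self.hid2 = T.nn.Linear(8, 8)
    self.oupt = T.nn.Linear(8, 1)
    T.nn.init.xavier_uniform_(self.hid1.weight)  # Glorot uniform
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)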
The demo code specifies the hidden layer and output layer activation functions in the forward function:
def forward(self, x):
  z = T.tanh(self.hid1(x))
  z = T.tanh(self.hid2(z))
  z = T.sigmoid(self.oupt(z))
  return z
For relatively shallow neural networks, the tanh activation function often works well for hidden layer nodes, but for deep neural networks, ReLU (rectified linear unit) activation is generally preferred. The output node has logistic sigmoid activation, which forces the output value to be between 0.0 and 1.0.
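For example, a ReLU-based version of the forward function (a sketch, not part of the demo) would look like:

# Sketch: forward pass using ReLU hidden activation instead of tanh
def forward(self, x):
  z = T.relu(self.hid1(x))
  z = T.relu(self.hid2(z))
  z = T.sigmoid(self.oupt(z))
  return z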
The demo program uses a program-defined class, Net, to define the layer architecture and the input-output mechanism. An alternative is to create the network by using the Sequential function, for example:
net = T.nn.Sequential(
  T.nn.Linear(4,8),
  T.nn.Tanh(),
  T.nn.Linear(8,8),
  T.nn.Tanh(),
  T.nn.Linear(8,1),
  T.nn.Sigmoid())
Because PyTorch works at a relatively low level of abstraction, there are several different ways to implement each part of a prediction system. This gives you a lot of flexibility, but increases the difficulty of trying to understand code examples.
Training the Model
Training the model/network is prepared with these eight statements:
net = net.train()  # set training mode
lrn_rate = 0.01
bat_size = 16
loss_func = T.nn.BCELoss()
optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
max_epochs = 100
n_items = len(train_x)
batcher = Batcher(n_items, bat_size)
The learning rate (0.01), batch size (16), and max epochs (100) must be determined by trial and error. For binary classification with a single logistic sigmoid output node, you can use either binary cross entropy or mean squared error loss, but not cross entropy (which is used for multiclass classification). The demo uses a program-defined class Batcher to serve up the indices of 16 training items at a time. An alternative approach is to use the built-in Dataset and DataLoader objects in the torch.utils.data module.
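For instance, a roughly equivalent batching setup using the built-in objects might look like this sketch, where train_x and train_y are assumed to be NumPy arrays holding the training predictors and labels:

# Sketch: batching with the built-in Dataset/DataLoader objects;
# train_x and train_y are assumed NumPy arrays of training data
import torch as T
from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(T.Tensor(train_x), T.Tensor(train_y))
train_ldr = DataLoader(train_ds, batch_size=16, shuffle=True)
for (batch_x, batch_y) in train_ldr:
  pass  # feed each batch of inputs and targets to the network here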
An epoch is one complete pass through all training items. Because there are 1,097 training items and each batch holds 16 items, there are 1097 / 16 ≈ 68 weight and bias update operations per epoch. During training, the prediction accuracy of the model is computed and displayed every 10 epochs using a program-defined function named akkuracy. The akkuracy function operates at the Tensor level using efficient aggregate operations. During the development of the demo, I used a function named accuracy that uses a less efficient approach.
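The demo's exact implementation isn't shown here, but a Tensor-level accuracy function in the same spirit might look like this sketch, where the parameter names and the 0.5 decision threshold are assumptions:

# Sketch: a Tensor-level accuracy function in the spirit of akkuracy;
# the names and the 0.5 threshold are assumptions
def akkuracy(model, data_x, data_y):
  oupt = model(data_x)            # compute all outputs at once
  pred_y = (oupt >= 0.5).float()  # threshold to 0.0 or 1.0
  n_correct = T.sum(data_y == pred_y)
  return n_correct.item() / len(data_y)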
Making a Prediction
After the model is trained, the demo uses it to make a prediction for a new, previously unseen banknote. First, the four pairs of min-max values for each predictor variable in the training data are placed into a matrix:
train_min_max = np.array([
  [-7.0421, 6.8248],
  [-13.7731, 12.9516],
  [-5.2861, 17.9274],
  [-7.8719, 2.1625]], dtype=np.float32)
Recall that the first predictor variable is image variance. So, in the 1,097 training items, the smallest variance is -7.0421 and the largest variance is 6.8248.
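In the demo these values are hard-coded, but they could also be computed programmatically; a sketch, where raw_train_x is a hypothetical (n, 4) NumPy array of unnormalized training predictors:

# Sketch: computing the min-max matrix from the raw training data;
# raw_train_x is a hypothetical (n, 4) array of unnormalized values
train_min_max = np.stack((raw_train_x.min(axis=0),
  raw_train_x.max(axis=0)), axis=1).astype(np.float32)
# row i holds (min, max) for predictor i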
The unknown banknote is set to arbitrary values (1.2345, 2.3456, 3.4567, 4.5678) and then min-max normalized, like so:
unknown_raw = np.array([[1.2345, 2.3456, 3.4567, 4.5678]],
  dtype=np.float32)
unknown_norm = np.zeros(shape=(1,4), dtype=np.float32)
for i in range(4):
  x = unknown_raw[0][i]
  mn = train_min_max[i][0]  # min
  mx = train_min_max[i][1]  # max
  unknown_norm[0][i] = (x - mn) / (mx - mn)
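The same normalization can also be written without an explicit loop by using NumPy broadcasting, as in this sketch:

# Sketch: vectorized min-max normalization equivalent to the loop above
mins = train_min_max[:, 0]  # shape (4,)
maxs = train_min_max[:, 1]
unknown_norm = (unknown_raw - mins) / (maxs - mins)  # broadcasts over the row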
A PyTorch network expects two-dimensional input (though there are some exceptions), so the demo sets up input with one row and four columns. The prediction is made with these statements:
unknown = T.Tensor(unknown_norm)  # to Tensor
raw_out = net(unknown)            # a Tensor
pred_prob = raw_out.item()        # scalar, [0.0, 1.0]
The network requires a Tensor object, so the NumPy matrix is converted to a Tensor. A quirk of PyTorch is that if a Tensor holds a single value, that value can be extracted using the Tensor.item method.
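The probability can then be mapped to a class label. A common convention, not shown in the demo code above, is to threshold at 0.5; note that which label (0 or 1) corresponds to an authentic banknote depends on how the training labels were encoded:

# Sketch: mapping the output probability to a class label; the 0.5
# threshold is a common convention, and the meaning of class 0 vs.
# class 1 depends on how the training labels were encoded
if pred_prob >= 0.5:
  print("Predicted class = 1 (p = %0.4f)" % pred_prob)
else:
  print("Predicted class = 0 (p = %0.4f)" % pred_prob)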
Wrapping Up
The field of neural machine learning is advancing with tremendous speed. Significant new algorithms and neural architectures are appearing every few months. At the time this article was written, three neural network code libraries appeared to be distancing themselves from the dozens of libraries available. PyTorch and TensorFlow are starting to be the most commonly used libraries where some customization or flexibility is needed. The Keras library is becoming the library of choice for situations where a relatively straightforward neural network can be used. But it's too early to predict which of these libraries (if any) will become de facto standards.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several key Microsoft products, including Azure and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts for reviewing this article: Chris Lee, Ricky Loynd