Page 24 - MSDN Magazine, July 2018

To summarize, an ML model is all the information needed to accept input data and generate an output prediction. In the case of a neural network, this information consists of the number of input, hidden and output nodes, the values of the weights and biases, and the types of activation functions used on the hidden and output layer nodes.
OK, but where do the values of the weights and the biases come from? They’re determined by training the model. Training means applying an optimization algorithm, such as back-propagation, to a set of data that has known input values and known, correct output values, in order to minimize the difference between the model’s computed output values and the known, correct output values.
There are many other kinds of ML models, such as decision trees and naive Bayes, but the general principles are the same. When using a neural network code library such as Microsoft CNTK or Google Keras/TensorFlow, the program that trains an ML model will save the model to disk. For example, CNTK and Keras code resembles:
mp = ".\\Models\\iris_nn.model"
model.save(mp, format=C.ModelFormat.CNTKv2) # CNTK
model.save(".\\Models\\iris_model.h5") # Keras
ML libraries also have functions to load a saved model. For example:
mp = ".\\Models\\iris_nn.model"
model = C.ops.functions.Function.load(mp) # CNTK
model = load_model(".\\Models\\iris_model.h5") # Keras
Most neural network libraries have a way to save just a model’s weights and biases values to file (as opposed to the entire model).
Deploying a Standard ML Model to an IoT Device
The image in Figure 1 shows an example of what training an ML model looks like. I used Visual Studio Code as the editor and the Python language API interface to the CNTK v2.4 library. Creating a trained ML model can take days or weeks of effort, and typically requires a lot of processing power and memory. Therefore, model training is usually performed on powerful machines, often with one or more GPUs. Additionally, as the size and complexity of a neural network increases, the number of weights and biases increases dramatically, and so the file size of a saved model also increases greatly.
For example, the 4-5-3 iris model described in the previous section has only (4 * 5) + 5 + (5 * 3) + 3 = 43 weights and biases. But an image classification model with millions of input pixel values and hundreds of hidden processing nodes can have hundreds of millions, or even billions, of weights and biases. Notice that the values of all 43 weights and biases of the iris example are shown in Figure 1.
[Figure 2: diagram of the neural network input-output mechanism for the 4-5-3 iris network, showing the input layer values (6.1, 3.1, 5.1, 1.1), the hidden layer nodes, and the output layer values (0.0321, 0.6458, 0.3221), annotated with the weights and biases.]
So, suppose you have a trained ML model. You want to deploy the model to a small, weak, IoT device. The simplest solution is to install onto the IoT device the same neural network library software you used to train the model. Then you can copy the saved trained model file to the IoT device and write code to load the model and make a prediction. Easy!
Unfortunately, this approach will work only in relatively rare situations where your IoT device is quite powerful—perhaps along the lines of a desktop PC or laptop. Also, neural network libraries such as CNTK and Keras/TensorFlow were designed to train models quickly and efficiently, but in general they were not necessarily designed for optimal performance when performing input-output with a trained model. In short, the easy solution for deploying a trained ML model to an IoT device on the edge is rarely feasible.
The Custom Code Solution
Based on my experience and conversations with colleagues, the most common way to deploy a trained ML model to an IoT device on the edge is to write custom C/C++ code on the device. The idea is that C/C++ is almost universally available on IoT devices, and C/C++ is typically fast and compact. The demo program in Figure 3 illustrates the concept.
The demo program starts by using the gcc C/C++ tool to compile file test.c into an executable on the target device. Here, the target device is just my desktop PC, but there are C/C++ compilers for almost every kind of IoT/CPU device. When run, the demo program displays the values of the weights and biases of the iris flower example, then uses input values of (6.1, 3.1, 5.1, 1.1) and computes and displays the output values (0.0321, 0.6458, 0.3221). If you compare Figure 3 with Figures 1 and 2, you’ll see the inputs, weights and biases, and outputs are the same (subject to rounding error).
Demo program test.c implements only the neural network input-output process. The program starts by setting up a struct data structure to hold the number of nodes in each layer, values for the hidden and output layer nodes, and values of the weights and biases:
Figure 2 The Neural Network Input-Output Mechanism

The 43 trained weights and biases of the iris model are:

input-to-hidden weights:
[[ 0.2680 -0.3782 -0.3828  0.1143  0.1269]
 [ 0.3954 -0.4367 -0.4332  0.3880  0.3814]
 [-0.5503  0.6453  0.6394 -0.6454 -0.6300]
 [-0.3220  0.4035  0.4163 -0.3074 -0.3112]]
hidden biases:
[ 0.1164 -0.1567 -0.1604  0.0810  0.0822]
hidden-to-output weights:
[[ 0.7552 -0.0001 -0.7706]
 [-0.7297 -0.2048  0.9301]
 [-0.6733 -0.2512  0.9167]
 [ 0.9367 -0.4276 -0.5134]
 [ 0.9381 -0.3728 -0.5667]]
output biases:
[-0.0466  0.4528 -0.4062]

































































































