flower of an unknown species. The eight arrows connecting each of the four input nodes to the two hidden processing nodes represent numeric constants called weights. If nodes are 0-base indexed with [0] at the top, then the input-to-hidden weight from input[0] to hidden[0] is 0.6100, and so on.

Similarly, the six arrows connecting the two hidden nodes to the three output nodes are hidden-to-output weights. The two small arrows pointing into the two hidden nodes are special weights called biases, and each of the three output nodes also has a bias value.

Figure 2 Neural Network Input-Output Mechanism

The first step in the neural network input-output mechanism is to compute the values of the hidden nodes. The value in each hidden node is the hyperbolic tangent of the sum of products of input values and associated weights, plus the bias. For example:

hidden[0] = tanh( (6.9)(0.6100) + (3.1)(0.7152) +
                  (4.6)(-1.0855) + (1.3)(-1.0687) + 0.1468 )
          = tanh(0.1903) = 0.1882

The value of the hidden[1] node is calculated in the same way. The hyperbolic tangent function, abbreviated tanh, is called the hidden layer activation function. The tanh function accepts any value, from negative infinity to positive infinity, and returns a value between -1.0 and +1.0. There are several choices of activation functions supported by CNTK. The three most common are tanh, logistic sigmoid and rectified linear unit (ReLU).
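As a quick sanity check, the hidden[0] calculation can be reproduced with a few lines of plain Python. This is a minimal sketch, not part of the demo program; the input values, weights and bias are the ones shown in Figure 2:

import math

inputs = [6.9, 3.1, 4.6, 1.3]                 # input node values
weights = [0.6100, 0.7152, -1.0855, -1.0687]  # input-to-hidden weights for hidden[0]
bias = 0.1468                                 # hidden[0] bias

# hidden node value = tanh of (sum of products of inputs and weights, plus bias)
h0 = math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)
print(h0)  # 0.1882 (approximately)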
Computing the output node values is similar to the process used to compute hidden nodes, but a different activation function, called softmax, is used. The first step is to compute the sum of products plus bias for all three output nodes:
pre-output[0] = (0.1882)(3.2200) + (0.9999)(-0.8545) + 0.1859 = -0.0625
pre-output[1] = (0.1882)(-0.7311) + (0.9999)(0.3553) + 0.6735 = 0.8912
pre-output[2] = (0.1882)(-4.1944) + (0.9999)(0.0244) + (-0.8595) = -1.6246
The softmax value of one of a set of three values is the exp function applied to the value, divided by the sum of the exp function applied to all three values. So the final output node values are computed as:
output[0] = exp(-0.0625) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246)) = 0.263
output[1] = exp(0.8912) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246)) = 0.682
output[2] = exp(-1.6246) / (exp(-0.0625) + exp(0.8912) + exp(-1.6246)) = 0.055
The purpose of softmax is to coerce the preliminary output values so they sum to 1.0 and can be interpreted as probabilities.
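To make the arithmetic concrete, here is a short Python sketch, not part of the demo program, that reproduces the pre-output and softmax calculations. The hidden node values, hidden-to-output weights and output node biases are the ones shown above:

import math

hidden = [0.1882, 0.9999]                  # hidden node values
ho_weights = [[3.2200, -0.7311, -4.1944],  # weights from hidden[0]
              [-0.8545, 0.3553, 0.0244]]   # weights from hidden[1]
o_biases = [0.1859, 0.6735, -0.8595]       # output node biases

# sum of products plus bias for each output node
pre_output = [hidden[0] * ho_weights[0][j] + hidden[1] * ho_weights[1][j]
              + o_biases[j] for j in range(3)]  # [-0.0625, 0.8912, -1.6246]

# softmax: exp of each value, divided by the sum of all the exp values
divisor = sum(math.exp(p) for p in pre_output)
output = [math.exp(p) / divisor for p in pre_output]
print(output)  # approximately [0.263, 0.682, 0.055], summing to 1.0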
OK, but where do the values of the weights and biases come from? To get the values of the weights and biases, you must train the network using a set of data that has known input values and known, correct, output values. The idea is to use an optimization algorithm that finds the values for the weights and biases that minimize the difference between the computed output values and the correct output values.
Demo Program Structure
The overall structure of the demo program is shown in Figure 3. The demo program has a function named main that acts as an entry point. The main function sets the seed of the global random number generator to 0 so that results will be reproducible, and then
calls function do_demo that does all the work.
Helper function my_print displays a numeric vector using a
specified number of decimals. The point here is that CNTK is just a library, and you must mix program-defined Python code with calls to the various CNTK functions. Helper function create_reader returns a special CNTK object that can be used to read data from a data file that uses the special CTF (CNTK text format) formatting protocol.
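The bodies of these helpers appear in the full listing in Figure 4. For orientation, here is a plausible sketch of the two; the stream names 'features' and 'labels' passed to StreamDef are assumptions that must match the tags in the CTF data file:

import cntk as C

def my_print(arr, dec):
  # display a numeric vector using a specified number of decimals
  fmt = "%0." + str(dec) + "f"
  print(" ".join(fmt % x for x in arr))

def create_reader(path, is_training, input_dim, output_dim):
  # wrap a CTF-format data file in a CNTK MinibatchSource object
  return C.io.MinibatchSource(C.io.CTFDeserializer(path, C.io.StreamDefs(
    features=C.io.StreamDef(field='features', shape=input_dim),
    labels=C.io.StreamDef(field='labels', shape=output_dim))),
    randomize=is_training,
    max_sweeps=C.io.INFINITELY_REPEAT if is_training else 1)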
Helper function save_weights accepts a filename, a matrix of input-to-hidden weights, an array of hidden node biases, a matrix of hidden-to-output weights, and an array of output node biases, and writes those values to a text file so they can be used by other systems.
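Because save_weights is ordinary Python file I/O, any number of implementations will work. A minimal sketch, assuming one value per line (the exact file layout here is a guess, not necessarily the article's):

def save_weights(fn, ihWeights, hBiases, hoWeights, oBiases):
  # write all weights, then biases, one value per line (assumed layout)
  with open(fn, "w") as f:
    for row in ihWeights:
      for v in row:
        f.write(str(v) + "\n")
    for v in hBiases:
      f.write(str(v) + "\n")
    for row in hoWeights:
      for v in row:
        f.write(str(v) + "\n")
    for v in oBiases:
      f.write(str(v) + "\n")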
The complete listing for the demo program, with a few minor edits, is presented in Figure 4. I use an indent of two space characters instead of the more common four, to save space. Also, all normal error-checking code has been removed.
The demo program begins by importing the required Python packages and modules. I’ll describe the modules as they’re used in the demo code.
Setting Up the Data
There are two basic ways to read data for use by CNTK functions. You can format your files using the special CTF format and then use a built-in CNTK reader object, such as the one returned by the create_reader helper function.
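For example, a CTF-format line for the demo's input item might look like this; the tag names 'features' and 'labels', and the one-hot species encoding, are assumptions:

|features 6.9 3.1 4.6 1.3 |labels 0 1 0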
Figure 3 Demo Program Structure

# iris_demo.py
import numpy as np
import cntk as C
...
def my_print(arr, dec): ...
def create_reader(path, is_training, input_dim, output_dim): ...
def save_weights(fn, ihWeights, hBiases, hoWeights, oBiases): ...
def do_demo(): ...
def main():
  print("\nBegin Iris demo (CNTK 2.0) \n")
  np.random.seed(0)
  do_demo()  # all the work is done in do_demo()
if __name__ == "__main__":
  main()