Page 55 - MSDN Magazine, October 2019
new to the Python ecosystem, you can think of a Python .whl file as somewhat similar to a Windows .msi file.) I opened a command shell, navigated to the directory holding the .whl file and entered the command:
pip install torch-1.0.0-cp36-cp36m-win_amd64.whl
Understanding the Data
The Banknote Authentication dataset has 1,372 items. The raw data looks like:
3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
...
-2.5419, -0.65804, 2.6842, 1.1952, 1
The first four values on each line are the predictor values. The last value on each line is either 0 (authentic) or 1 (forgery). The predictor values are from a digitized image of each banknote and include variance, skewness, kurtosis and entropy. All the predictors are numeric. If the data had a categorical predictor such as color, those values could’ve been converted to numeric values using either 1-of-(N-1) or one-hot encoding.
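To make the two encoding schemes concrete, here's a small sketch using a hypothetical three-valued color predictor (the banknote dataset itself has no categorical predictors; the values below are illustrative only):

```python
# Hypothetical categorical predictor with three values.
# One-hot encoding uses N columns, one per category value.
one_hot = {
  "red":   [1, 0, 0],
  "green": [0, 1, 0],
  "blue":  [0, 0, 1],
}

# 1-of-(N-1) encoding uses N-1 columns; one category value
# (here "blue") is represented by all zeros.
one_of_n_minus_1 = {
  "red":   [1, 0],
  "green": [0, 1],
  "blue":  [0, 0],
}

print(one_hot["green"])           # [0, 1, 0]
print(one_of_n_minus_1["blue"])   # [0, 0]
```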
[Figure 2: Banknote Data (Partial), Kurtosis and Entropy Predictors. Kurtosis is plotted on the x-axis and entropy on the y-axis; authentic and forgery items are shown with different markers.]
Because there are four predictor variables, it isn’t possible to easily visualize the dataset, but you can get a rough idea of the data from the graph in Figure 2. The graph shows the kurtosis and entropy values for the first 100 of the 1,372 data items. Notice that simple linear prediction algorithms would likely perform poorly on this data because it isn’t linearly separable.
The first step in preparing the raw data is to randomly split the dataset into a training set and a test set. I used 80 percent (1,097 items) for training and the remaining 20 percent (275 items) for testing. Next, when using a neural network, it’s advisable to normalize numeric predictors so that values with large magnitudes don’t overwhelm values with small magnitudes. I used min-max normalization on the four predictor variables in the training set.
For each predictor column, I computed the min value and the max value, and then for every value x, normalized as (x - min) / (max - min). After min-max normalization, all values will be between 0.0 and 1.0, where 0.0 maps to the smallest value, and 1.0 maps to the largest value. I saved the min-max values for each column and then normalized the test data using those values. Note that you should normalize test data using the training set min-max values rather than normalize each dataset independently.
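The column-wise procedure can be sketched as follows. This is a minimal illustration with made-up two-column data, not the actual banknote values; the key point is that the test data reuses the training set's min-max statistics:

```python
import numpy as np

def min_max_normalize(cols, mins, maxs):
  # map each value x to (x - min) / (max - min), using the supplied
  # per-column min and max (computed from the training data only)
  return (cols - mins) / (maxs - mins)

# made-up training data, two predictor columns
train = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 20.0]], dtype=np.float32)
mins = train.min(axis=0)   # [1.0, 10.0]
maxs = train.max(axis=0)   # [5.0, 30.0]

norm_train = min_max_normalize(train, mins, maxs)  # all values in [0.0, 1.0]

# test data is normalized with the TRAINING set's min-max values
test = np.array([[2.0, 25.0]], dtype=np.float32)
norm_test = min_max_normalize(test, mins, maxs)
print(norm_test)  # [[0.25 0.75]]
```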
During normalization I replaced the comma separators used in the raw data by tab characters. I saved the training and test data in a subdirectory named Data. The demo program code that loads the two datasets into memory is:
train_file = ".\\Data\\banknote_norm_train.txt"
test_file = ".\\Data\\banknote_norm_test.txt"
train_x = np.loadtxt(train_file, delimiter='\t', usecols=[0,1,2,3], dtype=np.float32)
train_y = np.loadtxt(train_file, delimiter='\t', usecols=[4], dtype=np.float32, ndmin=2)
test_x = np.loadtxt(test_file, delimiter='\t', usecols=[0,1,2,3], dtype=np.float32)
test_y = np.loadtxt(test_file, delimiter='\t', usecols=[4], dtype=np.float32, ndmin=2)
Notice that PyTorch wants the Y data (authentic or forgery) in a two-dimensional array, even when the data is one-dimensional (conceptually a vector of 0 and 1 values). The default data type for PyTorch neural networks is 32 bits because the precision gained by using 64 bits usually isn’t worth the memory and performance penalty incurred.
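The effect of ndmin=2 can be demonstrated with a small in-memory example (using io.StringIO in place of a file, with made-up rows in the same tab-separated format):

```python
import io
import numpy as np

# three tab-separated rows mimicking the normalized banknote format
data = "0.1\t0.2\t0.3\t0.4\t0\n0.5\t0.6\t0.7\t0.8\t1\n0.2\t0.3\t0.4\t0.5\t0\n"

y1 = np.loadtxt(io.StringIO(data), delimiter='\t', usecols=[4],
  dtype=np.float32)
y2 = np.loadtxt(io.StringIO(data), delimiter='\t', usecols=[4],
  dtype=np.float32, ndmin=2)

print(y1.shape)  # (3,)   -- a 1-D vector
print(y2.shape)  # (3, 1) -- the 2-D column array PyTorch expects
```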
The Demo Program
The complete demo program, with a few minor edits to save space, is presented in Figure 3. I indent with two spaces rather than the usual four spaces to save space. Note that Python uses the "\" character for line continuation. I used Notepad to edit my program. Most of my colleagues prefer a more sophisticated editor, but I like the raw simplicity of Notepad.
The demo program starts by importing the NumPy and PyTorch packages and assigning shortcut aliases. An alternative to importing the entire PyTorch package is to import just the necessary modules, for example, import torch.optim as opt.
Defining the Neural Network Architecture
The demo defines a 4-(8-8)-1 neural network model with these statements:
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 8)  # 4-(8-8)-1
    self.hid2 = T.nn.Linear(8, 8)
    self.oupt = T.nn.Linear(8, 1)
...
The number of input nodes, four in this case, is determined by the data. For binary classification, by far the most common approach is to use a single output node where a value less than 0.5 maps to class zero (authentic) and a value greater than 0.5 maps to class one (forgery). The number of hidden layers (two in the demo) and the number of nodes in each hidden layer (eight in the demo) are hyperparameters that must be determined by trial and error.
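The forward pass is elided in the snippet above. Assuming a sigmoid activation on the single output node (a common choice for binary classifiers, but an assumption here rather than something shown in this excerpt), the 0.5 thresholding works like this, illustrated with plain NumPy and made-up raw output values:

```python
import numpy as np

def sigmoid(z):
  # squash raw output-node values into the range (0.0, 1.0)
  return 1.0 / (1.0 + np.exp(-z))

# hypothetical raw output-node values for three items
raw = np.array([-1.2, 0.3, 2.0])
probs = sigmoid(raw)                   # e.g. approx [0.23, 0.57, 0.88]
classes = (probs >= 0.5).astype(int)   # 0 = authentic, 1 = forgery
print(classes)  # [0 1 1]
```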
Figure 2 Partial Banknote Authentication Data