Page 35 - MSDN Magazine, February 2018
Figure 1 Wheat Seed Variety Prediction Demo
The demo program concludes by making a prediction for an unknown wheat seed. The seven input values are (17.6, 15.9, 0.8, 6.2, 3.5, 4.1, 6.1). The computed raw output node values are (1.0530, 2.5276, -3.6578) and the associated output node probability values are (0.1859, 0.8124, 0.0017). Because the middle value is largest, the output maps to (0, 1, 0), which is variety Rosa.
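The step from raw output values to probabilities can be illustrated with a minimal sketch. It assumes the probabilities come from the standard softmax function, which the numbers quoted above are consistent with. The helper name softmax and the variable names are illustrative and are not taken from the demo program.

```python
import numpy as np

def softmax(raw):
    """Convert raw output node values to probabilities that sum to 1."""
    raw = np.asarray(raw, dtype=np.float64)
    shifted = raw - np.max(raw)      # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Raw output node values quoted in the text above.
raw_outputs = [1.0530, 2.5276, -3.6578]
probs = softmax(raw_outputs)

# The largest probability determines the predicted class, written one-hot.
predicted = int(np.argmax(probs))
one_hot = [1 if i == predicted else 0 for i in range(len(probs))]

print(np.round(probs, 4))   # approximately 0.1859, 0.8124, 0.0017
print(one_hot)              # the middle position is set, which the text maps to Rosa
```

Subtracting the maximum before exponentiating does not change the result; it only guards against overflow when raw values are large.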
recommend the Anaconda distribution) which contains the core Python language and required Python packages, and then you install CNTK as an additional Python package. In other words, CNTK is not a standalone install.
At the time of this writing, the current version of CNTK is v2.3. Because CNTK is under vigorous development, by the time you read this, there could well be a newer version. I used the Anaconda distribution version 4.1.1 (which contains Python version 3.5.2, NumPy version 1.11.1, and SciPy version 0.17.1). After installing Anaconda, I installed the CPU-only version of CNTK using the pip utility program. Installing CNTK can be a bit tricky if you’re careless with versioning compatibility, but the CNTK documentation describes the installation process in detail.
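Because version compatibility is the usual source of trouble, a quick check of what is actually installed can help. The following is a minimal sketch, not part of the demo program. The helper name installed_version is hypothetical, and the sketch assumes each package exposes a __version__ attribute, which NumPy, SciPy, and CNTK v2 do.

```python
import importlib

def installed_version(package_name):
    """Return the package's version string, or None if it can't be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Fall back to "unknown" for packages without a __version__ attribute.
    return getattr(module, "__version__", "unknown")

for name in ("numpy", "scipy", "cntk"):
    version = installed_version(name)
    print(name, version if version is not None else "not installed")
```

Comparing the printed versions against the ones listed in the CNTK installation documentation is a reasonable first step when an install misbehaves.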
Understanding the Data
Creating most machine learning systems starts with the time-consuming and often annoying process of setting up the training and test data files. The raw wheat seeds data set can be found at bit.ly/2idhoRK. The raw 210-item tab-delimited data looks like this:
14.11 14.1 0.8911 5.42 3.302 2.7 5 1
16.63 15.46 0.8747 6.053 3.465 2.04 5.877 1
I wrote a utility program to generate a file in a format that can be easily handled by CNTK. The resulting 210-item file looks like:
|properties 14.1100 14.1000 ... 5.0000 |variety 1 0 0
|properties 16.6300 15.4600 ... 5.8770 |variety 1 0 0
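A conversion of this kind can be sketched in a few lines of Python. This is not the author's utility program, which isn't shown here. The function name to_ctf_line and the assumption that the class label is the last value on each raw line, numbered 1 through 3, are inferred from the two sample rows above.

```python
def to_ctf_line(raw_line, num_classes=3):
    """Convert one raw data line to the tagged format shown above.

    Assumes seven numeric predictors followed by an integer class
    label in the range 1..num_classes (an assumption based on the samples).
    """
    tokens = raw_line.split()            # splits on tabs or spaces
    features = [float(t) for t in tokens[:-1]]
    label = int(tokens[-1])

    # Format every predictor to exactly four decimals.
    feature_text = " ".join("{:.4f}".format(v) for v in features)

    # 1-of-N (one-hot) encode the class label: 1 -> "1 0 0", 2 -> "0 1 0", ...
    one_hot = ["1" if i == label - 1 else "0" for i in range(num_classes)]

    return "|properties " + feature_text + " |variety " + " ".join(one_hot)

raw_lines = [
    "14.11\t14.1\t0.8911\t5.42\t3.302\t2.7\t5\t1",
    "16.63\t15.46\t0.8747\t6.053\t3.465\t2.04\t5.877\t1",
]
for line in raw_lines:
    print(to_ctf_line(line))
```

To convert a whole file, the same function could be applied to each non-empty line read from the raw data file and the results written to a new text file.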
The utility program added a leading "|properties" tag to identify the location of the features, and a "|variety" tag to identify the location of the class to predict. The raw class values were 1-of-N encoded (sometimes called one-hot encoding), tabs were replaced by single blank space characters, and all predictor values were formatted to exactly four decimals. In most situations you’ll want to normalize numeric predictor values so they all have roughly the same range. I didn’t normalize this data, in order to keep this article a bit simpler. Two common forms of normalization are z-score normalization and min-max
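As a rough illustration of the two normalization forms named above, here is a minimal NumPy sketch applied column by column to the two sample rows from the raw data. The demo itself doesn't normalize, so this is not code from the article. The function names z_score and min_max are illustrative.

```python
import numpy as np

# The seven predictor values from the two raw sample rows (class label dropped).
rows = np.array([
    [14.11, 14.10, 0.8911, 5.420, 3.302, 2.70, 5.000],
    [16.63, 15.46, 0.8747, 6.053, 3.465, 2.04, 5.877],
])

def z_score(data):
    """Per column: subtract the mean, then divide by the standard deviation."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)           # population standard deviation (ddof=0)
    return (data - mean) / std

def min_max(data):
    """Per column: rescale so the minimum maps to 0 and the maximum to 1."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    return (data - lo) / (hi - lo)

print(z_score(rows))
print(min_max(rows))
```

Both functions divide by zero if a column is constant, so real data would need a guard for that case. In practice the statistics would be computed from the training data and then reused to normalize the test data and any new inputs.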
This article assumes you have intermediate or better programming skills with a C-family language, and a basic familiarity with neural networks. But regardless of your background, you should be able to follow along without too much trouble. The complete source code for the seeds_dnn.py program is presented in this article. The code, and the associated training and test data files, are also available in the file download that accompanies this article.
Installing CNTK v2
Because CNTK v2 is relatively new, you may not be familiar with the installation process. Briefly, you first install a Python language distribution (I strongly
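[Figure 2 diagram ("nnet model"): seven input nodes (17.6, 15.9, 0.8, 6.2, 3.5, 4.1, 6.1), hidden layers, and three output nodes, each shown with a raw value and a probability: (1.5262, 0.2888), (2.4201, 0.7060), (-2.4804, 0.0053)]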
Figure 2 Deep Neural Network Structure