Page 52 - MSDN Magazine, July 2018

Test Run | James McCaffrey
Introduction to DNN Image Classification Using CNTK
Image classification involves determining what category an input image belongs to, for example identifying a photograph as one containing “apples” or “oranges” or “bananas.” The two most common approaches for image classification are using a standard deep neural network (DNN) or using a convolutional neural network (CNN). In this article I’ll explain the DNN approach, using the CNTK library.
Take a look at Figure 1 to see where this article is headed. The demo program creates an image classification model for a subset of the Modified National Institute of Standards and Technology (MNIST) dataset. The demo training dataset consists of 1,000 images of handwritten digits. Each image is 28 pixels high by 28 pixels wide (784 pixels total) and represents a digit, 0 through 9.
The demo program creates a standard neural network with 784 input nodes (one for each pixel), two hidden processing layers (each with 400 nodes) and 10 output nodes (one for each possible digit). The model is trained using 10,000 iterations. The loss (also known as training error) slowly decreases and the prediction accuracy slowly increases, indicating training is working.
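The 784-400-400-10 architecture described above can be sketched, independently of CNTK, as a plain NumPy forward pass. The layer sizes come from the article; the tanh hidden activations and softmax output layer are assumptions that match typical CNTK image-classification demos, and the random weights are for illustration only:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax: subtract the max before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, weights, biases):
    # x: (784,) pixel vector; tanh on hidden layers, softmax on output
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ w + b)
    return softmax(h @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)
sizes = [784, 400, 400, 10]  # input, two hidden layers, output
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

probs = forward(rng.random(784), weights, biases)
print(probs.shape)  # (10,) -- one probability per digit, summing to 1
```

Training adjusts the weights and biases so that the output node corresponding to the correct digit receives the highest probability.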
After training completes, the demo applies the trained model to a test dataset of 100 items. The model’s accuracy is 84.00 percent, so 84 of the 100 test images were correctly classified.
This article assumes you have intermediate or better programming skill with a C-family language, but doesn’t assume you know much about CNTK or neural networks. The demo is coded using Python, but even if you don’t know Python, you should be able to follow along without too much difficulty. The code for the demo program is presented in its entirety in this article. The two data files used are available in the download that accompanies this article.
Understanding the Data
The full MNIST dataset consists of 60,000 images for training and 10,000 images for testing. Somewhat unusually, the training set is contained in two files, one that holds all the pixel values and one that holds the associated label values (0 through 9). The test images are also contained in two files.
Additionally, the four source files are stored in a proprietary binary format. When working with deep neural networks, getting the data into a usable form is almost always time-consuming and difficult. Figure 2 shows the contents of the first training image. The key point is that each image has 784 pixels, and each pixel is a value between 00h (0 decimal) and FFh (255 decimal).
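The binary format in question is the publicly documented MNIST IDX layout: each image file begins with four big-endian 32-bit integers (a magic number, the image count, rows, and columns), followed by the raw pixel bytes. A minimal sketch of parsing that header follows; the sample buffer here is fabricated for illustration and is not one of the actual MNIST files:

```python
import struct

def read_idx_image_header(buf):
    # MNIST image files begin with four big-endian 32-bit ints:
    # magic number (2051 for images), image count, rows, columns
    magic, count, rows, cols = struct.unpack(">iiii", buf[:16])
    if magic != 2051:
        raise ValueError("not an MNIST image file")
    return count, rows, cols

# fabricated header describing 1,000 images of 28x28 pixels
sample = struct.pack(">iiii", 2051, 1000, 28, 28)
print(read_idx_image_header(sample))  # (1000, 28, 28)
```

After the header, the pixel data is simply count * rows * cols unsigned bytes, one per pixel, which is why a conversion utility is needed before CNTK can consume the data as text.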
Before writing the demo program, I wrote a utility program to read the binary source files and write a subset of their contents to text files that can be easily consumed by a CNTK reader object. File mnist_train_1000_cntk.txt looks like:
|digit 0 0 0 0 0 1 0 0 0 0 |pixels 0 .. 170 52 .. 0
|digit 0 1 0 0 0 0 0 0 0 0 |pixels 0 .. 254 66 .. 0
etc.
Getting the raw MNIST binary data into CNTK format isn’t trivial. The source code for my utility program can be found at: bit.ly/2ErcCbw. There are 1,000 lines of data and each represents one image. The tags “|digit” and “|pixels” indicate the start of the value-to-predict and the predictor values. The digit label is one-hot encoded where the position of the 1 bit indicates the digit. Therefore, in the preceding code, the first two images represent a “5” and a “1.” Each line of data has 784 pixel values, each of which is between 0 and 255.
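To make the one-hot encoding concrete, here is a sketch of how such a line can be decoded by hand. The parsing logic is my own illustration, not the author's utility program; in the demo itself, CNTK's reader objects handle this format directly. The toy line uses only four pixel values to keep the example short:

```python
def parse_ctf_line(line):
    # split "|digit d0 .. d9 |pixels p0 .. p783" into its two fields
    _, digit_part, pixel_part = line.split("|")
    one_hot = [int(t) for t in digit_part.split()[1:]]
    pixels = [int(t) for t in pixel_part.split()[1:]]
    digit = one_hot.index(1)  # position of the 1 bit is the digit
    return digit, pixels

# toy line: a "5" (1 bit in position 5) with four pixel values
line = "|digit 0 0 0 0 0 1 0 0 0 0 |pixels 0 170 52 0"
digit, pixels = parse_ctf_line(line)
print(digit, pixels)  # 5 [0, 170, 52, 0]
```

Because the 1 bit sits at index 5 of the label vector, the line encodes the digit “5,” exactly as described above.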
Code download available at msdn.com/magazine/0718magcode.
Figure 1 Image Classification Using a DNN with CNTK