Page 26 - MSDN Magazine, July 2017
MACHINE LEARNING
Introduction to the Microsoft CNTK v2.0 Library
James McCaffrey
The Microsoft Cognitive Toolkit (CNTK) is a powerful, open source library that can be used to create machine learning prediction models. In particular, CNTK can create deep neural networks, which are at the forefront of artificial intelligence efforts such as Cortana and self-driving automobiles.
CNTK version 2.0 is very different from version 1. At the time I’m writing this article, version 2.0 is in Release Candidate mode. By the time you read this, there will likely be some minor changes to the code base, but I’m confident they won’t significantly affect the demo code presented here.
In this article, I’ll explain how to install CNTK v2.0, and how to create, train and make predictions with a simple neural network. A good way to see where this article is headed is to take a look at the screenshot in Figure 1.
The CNTK library is written in C++ for performance reasons, but v2.0 has a new Python language API, which is now the preferred way to use the library. I invoke the iris_demo.py program by typing the following in an ordinary Windows 10 command shell:
> python iris_demo.py 2>nul
The second argument, 2>nul, redirects the standard error stream to the null device, which suppresses error messages. I do this only to avoid displaying the approximately 12 lines of CNTK build information that would otherwise be shown.
The goal of the demo program is to create a neural network that can predict the species of an iris flower, using the well-known Iris Data Set. The raw data items look like this:
5.0 3.5 1.3 0.3 setosa
5.5 2.6 4.4 1.2 versicolor
6.7 3.1 5.6 2.4 virginica
There are 150 data items, 50 of each of three species: setosa, versicolor and virginica. The first four values on each line are the predictor values, often called attributes or features. The item-to-predict is often called the class or the label. The first two feature values are a flower’s sepal length and width (a sepal is a leaf-like structure). The next two values are the petal length and width.
Neural networks work only with numeric values, so the data files used by the demo encode species as setosa = (1,0,0), versicolor = (0,1,0) and virginica = (0,0,1).
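The encoding just described can be sketched in a few lines of Python with NumPy. This is only an illustration of the idea, not the demo's actual data-preparation code; the function name encode_item and the assumption that each raw line is whitespace-delimited are mine.

```python
import numpy as np

# Species order follows the encoding given in the text:
# setosa = (1,0,0), versicolor = (0,1,0), virginica = (0,0,1).
SPECIES = ["setosa", "versicolor", "virginica"]

def encode_item(line):
    """Split one raw data line into four features and a one-hot label.

    Hypothetical helper: assumes whitespace-delimited fields with the
    species name last.
    """
    tokens = line.split()
    features = np.array([float(t) for t in tokens[:4]], dtype=np.float32)
    label = np.zeros(len(SPECIES), dtype=np.float32)
    label[SPECIES.index(tokens[4])] = 1.0  # set the slot for this species
    return features, label

raw = [
    "5.0 3.5 1.3 0.3 setosa",
    "5.5 2.6 4.4 1.2 versicolor",
    "6.7 3.1 5.6 2.4 virginica",
]
for line in raw:
    features, label = encode_item(line)
    print(features, label)
```

Each species name becomes a vector with a single 1 in the position that identifies the species, which lines up with the three output nodes of the network.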
The demo program creates a 4-2-3 neural network; that is, a network with four input nodes for the feature values, two hidden processing nodes and three output nodes for the label values. The numbers of input and output nodes for a neural network classifier are determined by the structure of your data, but the number of hidden processing nodes is a free parameter and must be determined by trial and error.
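To make the 4-2-3 shape concrete, here is a minimal sketch of such a network's input-to-output computation in plain NumPy. It does not use the CNTK API, and several details are my assumptions rather than anything stated above: tanh activation on the hidden nodes, softmax on the output nodes, and small random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_input, n_hidden, n_output = 4, 2, 3  # the 4-2-3 architecture

# Untrained, randomly initialized weights and zero biases (illustrative only).
w_ih = rng.normal(scale=0.1, size=(n_input, n_hidden))   # input-to-hidden
b_h = np.zeros(n_hidden)
w_ho = rng.normal(scale=0.1, size=(n_hidden, n_output))  # hidden-to-output
b_o = np.zeros(n_output)

def forward(x):
    """Compute three output values that sum to 1.0 for one input item."""
    hidden = np.tanh(x @ w_ih + b_h)      # assumed tanh hidden activation
    logits = hidden @ w_ho + b_o
    exps = np.exp(logits - logits.max())  # numerically stable softmax
    return exps / exps.sum()

x = np.array([5.0, 3.5, 1.3, 0.3])  # one setosa item from the raw data
probs = forward(x)
print(probs, probs.sum())

# Total adjustable values: (4 * 2) + 2 + (2 * 3) + 3 = 19
n_params = w_ih.size + b_h.size + w_ho.size + b_o.size
print(n_params)
```

Because the weights are random, the three outputs are meaningless until the network is trained. The sketch only shows how the node counts translate into array shapes and how changing the number of hidden nodes changes the number of values to be learned.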
You can think of a neural network as a complex mathematical prediction equation. Neural network training is the process of
Disclaimer: CNTK version 2.0 is in Release Candidate mode. All information is subject to change.
This article discusses:
• Installing CNTK v2.0
• Understanding neural networks
• The structure of the demo program
• Creating, training and testing a neural network
• Measuring error and accuracy
• Making predictions
Technologies discussed:
Microsoft Cognitive Toolkit, Python, NumPy
Code download available at:
msdn.com/magazine/0717magcode