Page 56 - MSDN Magazine, May 2019

Test Run | James McCaffrey: Weighted k-NN Classification Using C#
The goal of a machine learning (ML) classification problem is to predict a discrete value. For example, you might want to predict the political leaning (conservative, moderate, liberal) of a person based on their age, annual health care expenses, sex, years of education and so on. There are many different classification techniques, including neural networks, decision trees and logistic regression. In this article I show how to implement the weighted k-nearest neighbors algorithm, using the C# language.
The best way to understand where this article is headed is to take a look at the demo run in Figure 1 and a graph of the associated data in Figure 2. The demo program sets up 33 dummy data items. Each item represents a person’s age, annual health care expenses and an arbitrary class to predict (0, 1, 2), which you can think of as political leaning, for concreteness. The data has only two predictor variables so it can be displayed in a graph, but k-NN works with any number of predictors.
The data has been normalized. When using k-NN, it’s important to normalize the data so that large values, such as an annual expense of $30,000, don’t overwhelm small values, such as an age of 25. The goal of the demo is to predict the class of a new person who has normalized age and expenses of (0.35, 0.38). The demo sets k = 6 and finds the six labeled data items that are closest to the item-to-classify.
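A common way to normalize is min-max normalization, which rescales each predictor column into [0.0, 1.0]. The sketch below illustrates the idea; the method name `MinMaxNormalize` and the in-place design are my own, not taken from the demo program:

```csharp
using System;

class NormalizeDemo
{
  // Min-max normalize each column of a jagged matrix into [0.0, 1.0], in place.
  // Hypothetical helper for illustration; assumes each column has max > min.
  static void MinMaxNormalize(double[][] data)
  {
    int nCols = data[0].Length;
    for (int j = 0; j < nCols; ++j)
    {
      double min = double.MaxValue, max = double.MinValue;
      for (int i = 0; i < data.Length; ++i)
      {
        if (data[i][j] < min) min = data[i][j];
        if (data[i][j] > max) max = data[i][j];
      }
      for (int i = 0; i < data.Length; ++i)
        data[i][j] = (data[i][j] - min) / (max - min);
    }
  }

  static void Main()
  {
    // (age, annual expense) for three people -- made-up values
    double[][] raw = {
      new double[] { 25.0, 30000.0 },
      new double[] { 45.0, 10000.0 },
      new double[] { 65.0, 50000.0 }
    };
    MinMaxNormalize(raw);
    Console.WriteLine("{0:F2} {1:F2}", raw[0][0], raw[0][1]);  // 0.00 0.50
  }
}
```

After normalization, an age of 25 and an expense of $30,000 both live on the same [0, 1] scale, so neither dominates the distance calculation.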
After identifying the six closest labeled data items, the demo uses a weighted voting technique to reach a decision.
The weighted voting values for classes (0, 1, 2) are (0.3879, 0.4911, 0.1209), so the item at (0.35, 0.38) would be classified as class 1.
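A standard weighting scheme, consistent with the demo's behavior, is inverse-distance voting: each of the k neighbors votes for its class with weight 1/d, and the weights are normalized so they sum to 1.0. Here is a sketch under that assumption; the distances and labels below are made up for illustration and are not the demo's actual values:

```csharp
using System;

class WeightedVoteDemo
{
  // Inverse-distance weighted voting over the k nearest neighbors.
  // dists[i] is the distance to neighbor i; labels[i] is its class (0..numClasses-1).
  // Assumes no distance is exactly zero.
  static double[] WeightedVote(double[] dists, int[] labels, int numClasses)
  {
    double[] votes = new double[numClasses];
    double sum = 0.0;
    for (int i = 0; i < dists.Length; ++i)
    {
      double w = 1.0 / dists[i];
      votes[labels[i]] += w;
      sum += w;
    }
    for (int c = 0; c < numClasses; ++c)
      votes[c] /= sum;  // normalize so the votes sum to 1.0
    return votes;
  }

  static void Main()
  {
    // Made-up distances and class labels for six neighbors
    double[] dists = { 0.08, 0.10, 0.12, 0.15, 0.20, 0.25 };
    int[] labels = { 1, 0, 1, 2, 0, 1 };
    double[] votes = WeightedVote(dists, labels, 3);
    int predicted = 0;
    for (int c = 1; c < votes.Length; ++c)
      if (votes[c] > votes[predicted]) predicted = c;
    Console.WriteLine("predicted class = " + predicted);  // predicted class = 1
  }
}
```

Because closer neighbors get larger weights, a nearby neighbor of class 1 can outvote two more distant neighbors of class 0, which a plain majority vote would not capture.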
This article assumes you have intermediate or better programming skills with C# or a language such as Java or Python, but doesn’t assume you know anything about the weighted k-NN algorithm. The complete demo code and the associated data are presented in this article. The source code and the data are also available in the accompanying download. All normal error checking has been removed to keep the main ideas as clear as possible.
Understanding the Weighted k-NN Algorithm
The weighted k-NN algorithm is one of the simplest of all ML techniques, but there are a few tricky implementation details. The first question most developers have is, “How do you determine the value of k?” The somewhat unsatisfying answer is that the value of k must be determined by trial and error.
When using k-NN, you must compute the distances from the item-to-classify to all the labeled data items. There are many distance metrics; Euclidean distance is the most common choice for numeric data.
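Euclidean distance is the square root of the sum of squared differences across the predictor values. A minimal sketch (the method name `Distance` is my own):

```csharp
using System;

class DistanceDemo
{
  // Euclidean distance between two items with the same number of predictors.
  static double Distance(double[] a, double[] b)
  {
    double sum = 0.0;
    for (int i = 0; i < a.Length; ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.Sqrt(sum);
  }

  static void Main()
  {
    double[] unknown = { 0.35, 0.38 };   // the item-to-classify
    double[] labeled = { 0.32, 0.42 };   // a made-up labeled data item
    Console.WriteLine(Distance(unknown, labeled).ToString("F4"));  // 0.0500
  }
}
```

Because the data is normalized, both predictors contribute on the same scale to the computed distance.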
Code download available at msdn.com/magazine/0519magcode.
Figure 1 Weighted k-NN Classification Demo Run