Page 54 - MSDN Magazine, June 2019
Test Run: James McCaffrey
Simplified Naive Bayes Classification Using C#
The goal of a naive Bayes classification problem is to predict a discrete value. For example, you might want to predict the authenticity of a gemstone based on its color, size and shape (0 = fake, 1 = authentic). In this article I show how to implement a simplified naive Bayes classification algorithm using the C# language.
The best way to understand where this article is headed is to take a look at the demo run in Figure 1. The demo program sets up 40 dummy data items. Each item has three predictor values: color (Aqua, Blue, Cyan, Dune), size (Small, Large), and shape (Pointed, Rounded, Twisted), followed by a binary class to predict (0 or 1).
The demo sets up an item to predict: (Cyan, Small, Twisted). Naive Bayes classification is based on probabilities, which in turn are based on counts in the data. The demo scans the 40-item dataset and computes and displays six joint counts: items that have both Cyan and 0 (3 items), Cyan and 1 (5), Small and 0 (18), Small and 1 (15), Twisted and 0 (5), Twisted and 1 (3). The demo also counts the number of class 0 items (24) and class 1 items (16).
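The joint counting step can be sketched in C# as follows. The helper name CountJoint and the four-item dataset are illustrative stand-ins, not code or data from the demo program, which uses 40 items:

```csharp
using System;

public class JointCounter
{
  // Returns a table where joint[i, c] = number of rows in which
  // predictor i equals item[i] AND the class label is c.
  public static int[,] CountJoint(string[][] data, string[] item)
  {
    int[,] joint = new int[item.Length, 2];
    foreach (string[] row in data)
    {
      int c = int.Parse(row[row.Length - 1]);  // class label is the last column
      for (int i = 0; i < item.Length; ++i)
        if (row[i] == item[i]) ++joint[i, c];
    }
    return joint;
  }

  public static void Main()
  {
    // A tiny hypothetical dataset; columns are color, size, shape, class.
    string[][] data = new string[][] {
      new[] { "Cyan", "Small", "Twisted", "0" },
      new[] { "Blue", "Small", "Rounded", "1" },
      new[] { "Cyan", "Large", "Twisted", "1" },
      new[] { "Aqua", "Small", "Pointed", "0" }
    };
    string[] item = { "Cyan", "Small", "Twisted" };  // item to predict

    int[,] joint = CountJoint(data, item);
    for (int i = 0; i < item.Length; ++i)
      Console.WriteLine($"{item[i]} & 0 = {joint[i, 0]}   {item[i]} & 1 = {joint[i, 1]}");
  }
}
```

The class counts are computed the same way, by tallying the last column of each row.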
In my opinion, naive Bayes classification is best explained by walking through a concrete example like this one.
Using the count information, the demo calculates intermediate values called evidence terms (0.0082, 0.0131) and then uses the evidence terms to calculate pseudo-probabilities (0.3855, 0.6145) that correspond to predictions of class 0 and class 1. Because the second pseudo-probability is larger, the conclusion is that (Cyan, Small, Twisted) is class 1.
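The arithmetic behind these numbers can be reproduced with a short C# sketch. Here each evidence term is the product of the joint counts divided by (class count + number of predictors), times the class prior; this formulation reproduces the demo's displayed values exactly, though the method name Compute is my own and the full demo program appears later in the article:

```csharp
using System;

public class NaiveBayesEvidence
{
  // Returns { evidence0, evidence1, pseudoProb0, pseudoProb1 }.
  public static double[] Compute(double[] joint0, double[] joint1,
    double count0, double count1, double total)
  {
    int nx = joint0.Length;          // number of predictor variables (3)
    double e0 = count0 / total;      // start with the class 0 prior
    for (int i = 0; i < nx; ++i)
      e0 *= joint0[i] / (count0 + nx);
    double e1 = count1 / total;      // class 1 prior
    for (int i = 0; i < nx; ++i)
      e1 *= joint1[i] / (count1 + nx);
    return new double[] { e0, e1, e0 / (e0 + e1), e1 / (e0 + e1) };
  }

  public static void Main()
  {
    // Joint counts from the demo for the item (Cyan, Small, Twisted).
    double[] joint0 = { 3, 18, 5 };  // Cyan&0, Small&0, Twisted&0
    double[] joint1 = { 5, 15, 3 };  // Cyan&1, Small&1, Twisted&1
    double[] r = Compute(joint0, joint1, 24, 16, 40);

    Console.WriteLine($"evidence terms: {r[0]:F4} {r[1]:F4}");  // 0.0082 0.0131
    Console.WriteLine($"pseudo-probs:   {r[2]:F4} {r[3]:F4}");  // 0.3855 0.6145
  }
}
```

Adding the number of predictors to each denominator keeps the counting scheme from being dominated by any single small count and mirrors the smoothed values the demo displays.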
This article assumes you have intermediate or better programming skill with C# or another C-family language such as Java, or with Python, but doesn't assume you know anything about naive Bayes classification. The complete demo code and the associated data are presented in this article. The source code and the data are also available in the accompanying download. All normal error checking has been removed to keep the main ideas as clear as possible.
Understanding Naive Bayes Classification
Naive Bayes classification is both simple and complicated. Implementation is relatively simple, but the underlying math ideas are very complex. There are many different math equations that define naive Bayes classification. Three examples are shown in Figure 2. The letter P means probability; Ck is one of k classes (0, 1, 2, . . .); the pipe symbol is read "given that"; and X is a vector of input values such as (Cyan, Small, Twisted). Unfortunately, these equations don't provide much help when implementing naive Bayes classification until after you understand the technique.
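Figure 2's equations aren't reproduced here, but the canonical statement of the naive Bayes rule, written with the same symbols, is:

```latex
P(C_k \mid X) \;=\; \frac{P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)}{P(X)}
```

The "naive" part is the assumption that the predictor values x_i are conditionally independent given the class, which is what lets the likelihood factor into the simple product of per-predictor terms that the demo's joint counts estimate.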
Code download available at msdn.com/magazine/0619magcode.
Figure 1 Simplified Naive Bayes Classification Demo Run