In my opinion, naive Bayes classification is best explained using a concrete example. The first step is to scan through the source data and compute joint counts. If there are nx predictor variables (three in the demo) and nc classes (two in the demo), then there are nx * nc joint counts to compute. Notice that the counting process means that predictor data must be discrete rather than numeric.
After calculating joint counts, 1 is added to each count. This is called Laplacian smoothing and is done to prevent any joint count from being 0, which would zero out the final results. For the demo data the smoothed joint counts are:
cyan    and 0:  2 + 1 =  3
cyan    and 1:  4 + 1 =  5
small   and 0: 17 + 1 = 18
small   and 1: 14 + 1 = 15
twisted and 0:  4 + 1 =  5
twisted and 1:  2 + 1 =  3

Next, the raw counts of class 0 items and class 1 items are calculated. Because these counts will always be greater than zero, no smoothing factor is needed. For the demo data, the class counts are:

0: 24
1: 16

Next, an evidence term for each class is calculated. For class 0, the evidence term is:

Z(0) = (3 / (24 + 3)) * (18 / (24 + 3)) * (5 / (24 + 3)) * (24 / 40)
     = 3/27 * 18/27 * 5/27 * 24/40
     = 0.1111 * 0.6667 * 0.1852 * 0.6000
     = 0.0082

The first three terms of the calculation for Z(0) use the smoothed joint counts for class 0, each divided by the class count for 0 (24) plus the number of predictor variables (nx = 3) to compensate for the three additions of 1 due to the Laplacian smoothing. The fourth term is P(class 0). The calculation of the class 1 evidence term follows the same pattern:

Z(1) = (5 / (16 + 3)) * (15 / (16 + 3)) * (3 / (16 + 3)) * (16 / 40)
     = 5/19 * 15/19 * 3/19 * 16/40
     = 0.2632 * 0.7895 * 0.1579 * 0.4000
     = 0.0131

The last step is to compute pseudo-probabilities:

P(class 0) = Z(0) / (Z(0) + Z(1))
           = 0.0082 / (0.0082 + 0.0131)
           = 0.3855

P(class 1) = Z(1) / (Z(0) + Z(1))
           = 0.0131 / (0.0082 + 0.0131)
           = 0.6145

The denominator sum is called the evidence and is used to normalize the evidence terms so that they sum to 1.0 and can be loosely interpreted as probabilities. Note that if you're just interested in prediction, you can simply use the largest evidence term and skip the evidence normalization step.

[Figure 2: Three Forms of Naive Bayes Classification Math]

Naive Bayes classification is both simple and complicated. Implementation is relatively simple, but the underlying math ideas are very complex.
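For reference, the evidence computation illustrated above can be written in a general form. This is a sketch using the article's quantities, where count(xj, c) is a raw joint count, count(c) is a raw class count, N is the number of data items (40 in the demo) and nx is the number of predictor variables (3 in the demo):

$$
Z(c) = \frac{\text{count}(c)}{N} \prod_{j=1}^{n_x} \frac{\text{count}(x_j, c) + 1}{\text{count}(c) + n_x}
\qquad
P(c \mid X) \approx \frac{Z(c)}{Z(0) + Z(1)}
$$

Plugging in the class 0 values gives (24/40) * (3/27) * (18/27) * (5/27) = 0.0082, which matches the worked example.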
The Demo Program
The complete demo program, with a few minor edits to save space, is presented in Figure 3. To create the program, I launched Visual Studio and created a new console application named NaiveBayes. I used Visual Studio 2017, but the demo has no significant .NET Framework dependencies, so any version of Visual Studio will work fine.
After the template code loaded, in the editor window I removed all unneeded namespace references and added a reference to the System.IO namespace. In the Solution Explorer window, I right-clicked on file Program.cs, renamed it to the more descriptive NaiveBayesProgram.cs, and allowed Visual Studio to automatically rename class Program.
After building the project, I used Notepad to create the 40-item dummy data file with the contents shown in Figure 4, and saved it as BayesData.txt in the project root directory.
Loading Data into Memory
The demo uses a program-defined method named LoadData to read the data file into memory as an array-of-arrays style matrix of type string. Method LoadData assumes the class values are the last value on each line of data. An alternative design is to read the predictor values into a string matrix and read the 0-1 class values into an array of type integer.
Method LoadData calls a helper method named MatrixString, which creates an array-of-arrays string matrix with the specified number of rows (40) and columns (4). An alternative design is to programmatically compute the number of rows and columns, but in my opinion the hardcoded approach is simpler and better.
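The following is a minimal sketch of what MatrixString and LoadData might look like. The method names come from the article, but the signatures, the comma delimiter and the parsing details are my assumptions:

// Sketch of the data-loading helpers; these would be static methods
// of class NaiveBayesProgram and require a using System.IO; directive.
static string[][] MatrixString(int rows, int cols)
{
  // Allocate an array-of-arrays string matrix
  string[][] result = new string[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new string[cols];
  return result;
}

static string[][] LoadData(string fn, int rows, int cols, char delimit)
{
  // Read the data file line by line; each line holds the predictor
  // values followed by the 0-1 class value as the last token
  string[][] result = MatrixString(rows, cols);
  using (StreamReader sr = new StreamReader(fn))
  {
    string line; int i = 0;
    while ((line = sr.ReadLine()) != null && i < rows)
    {
      string[] tokens = line.Split(delimit);
      for (int j = 0; j < cols; ++j)
        result[i][j] = tokens[j];
      ++i;
    }
  }
  return result;
}

A call might look like string[][] data = LoadData("BayesData.txt", 40, 4, ','), where the comma delimiter and the relative file path are assumptions about the data file format.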
Program Logic
All of the program control logic is contained in the Main method. The joint counts are stored in an array-of-arrays integer matrix.
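To make the description concrete, here is a condensed sketch of the kind of logic Main might contain, using the quantities from the worked example. The variable names (jointCts, yCts, eTerms) and the item being classified are illustrative assumptions, not the article's exact code:

// Inside Main of class NaiveBayesProgram; requires using System;
int nx = 3;   // number of predictor variables
int nc = 2;   // number of classes
int N = 40;   // number of data items
string[][] data = LoadData("BayesData.txt", N, nx + 1, ',');
string[] X = new string[] { "cyan", "small", "twisted" };  // item to classify (assumed)

int[][] jointCts = new int[nx][];  // joint counts, [predictor][class]
for (int i = 0; i < nx; ++i)
  jointCts[i] = new int[nc];
int[] yCts = new int[nc];          // raw class counts

for (int i = 0; i < N; ++i)
{
  int y = int.Parse(data[i][nx]);  // class value is last on each line
  ++yCts[y];
  for (int j = 0; j < nx; ++j)
    if (data[i][j] == X[j])        // predictor value matches item to classify
      ++jointCts[j][y];
}

for (int i = 0; i < nx; ++i)       // Laplacian smoothing: add 1 to each joint count
  for (int k = 0; k < nc; ++k)
    ++jointCts[i][k];

double[] eTerms = new double[nc];  // evidence terms Z(0), Z(1)
for (int k = 0; k < nc; ++k)
{
  double v = 1.0;
  for (int j = 0; j < nx; ++j)
    v *= (double)jointCts[j][k] / (yCts[k] + nx);  // smoothed count / (class count + nx)
  v *= (double)yCts[k] / N;        // multiply by P(class k)
  eTerms[k] = v;
}

double evidence = eTerms[0] + eTerms[1];
Console.WriteLine("P(class 0) = " + (eTerms[0] / evidence).ToString("F4"));
Console.WriteLine("P(class 1) = " + (eTerms[1] / evidence).ToString("F4"));

For the demo data, this should display 0.3855 and 0.6145, matching the worked example above.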