
kernel function, so if you want to experiment with a different function, you'll have to implement it yourself.
A common alternative to the polynomial kernel is the radial basis function (RBF) kernel. In words, you sum the squared differences between elements, multiply by negative gamma, then raise e (Euler's number, approximately 2.718) to that power. In code the RBF kernel could look like:
public double RbfKernel(double[] v1, double[] v2)
{
  double sum = 0.0;
  for (int i = 0; i < v1.Length; ++i)
    sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
  return Math.Exp(-this.gamma * sum);
}
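As a quick sanity check (the svm object name and gamma = 1.0 are illustrative, not from the demo), the RBF kernel returns 1.0 for identical vectors and decays toward 0.0 as the vectors move apart:

double[] a = new double[] { 1.0, 2.0, 3.0 };
double[] b = new double[] { 1.0, 2.0, 3.0 };
double k1 = svm.RbfKernel(a, b);  // exp(-1.0 * 0.0) = 1.0
b[2] = 9.0;
double k2 = svm.RbfKernel(a, b);  // exp(-1.0 * 36.0), nearly 0.0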
Notice that different kernel functions have different parameters. The linear and RBF kernels only require gamma, but the polynomial kernel requires gamma, degree and constant. The choice of kernel function to use, and the values of the parameters that apply to the kernel function being used, are free parameters and must be determined by trial and error. All machine learning classification techniques have hyperparameters, but SVMs tend to be particularly sensitive to their hyperparameter values.
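A simple grid search is one common way to carry out that trial and error. Here's a minimal sketch, assuming the gamma and complexity fields are settable, the data has already been split into trainX/trainY and testX/testY, and a hypothetical Accuracy helper scores a trained model on held-out data (none of these names are part of the demo program):

double[] gammas = new double[] { 0.5, 1.0, 2.0, 4.0 };
double[] complexities = new double[] { 0.1, 1.0, 10.0 };
foreach (double g in gammas) {
  foreach (double c in complexities) {
    svm.gamma = g;       // kernel free parameter
    svm.complexity = c;  // SMO penalty free parameter
    svm.Train(trainX, trainY, 1000);  // maxIter = 1000
    double acc = Accuracy(svm, testX, testY);  // hypothetical helper
    Console.WriteLine("gamma = " + g + " complexity = " + c +
      " acc = " + acc);
  }
}

Depending on how the demo class is implemented, you might need to re-instantiate the SVM object between runs so that state left over from a previous training run doesn't carry forward.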
The Sequential Minimal Optimization Algorithm
There are many algorithms that can be used to determine the support vectors for an SVM problem. The SMO algorithm is the most common. The demo program follows the original explanation of SMO given in the 1998 research paper, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines," which can be found in many places on the Internet.
The SMO algorithm is very complex and a full explanation would require roughly 200 pages (I know because I once reviewed an entire book dedicated just to SMO). SMO has three key functions: a top-level Train function that calls a helper ExamineExample function, which calls a helper TakeStep function.
The signature of TakeStep is: private bool TakeStep(int i1, int i2, double[][] X_matrix, int[] y_vector). Parameter X_matrix holds the training data predictor values. Parameter y_vector holds the training data target values, which are either -1 or +1. The i1 and i2 parameters are a first index and a second index pointing into the training data. On each call to TakeStep, the algorithm attempts to find a better solution and returns true if an improvement is found using the two training data items, false otherwise.
The signature of ExamineExample is: private int ExamineExample(int i2, double[][] X_matrix, int[] y_vector). The function returns the number of changes that occurred so that Train can determine whether progress has been made.
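Putting the signatures together, the call structure looks like this in skeleton form (the ExamineExample and TakeStep bodies are elided here; Train is presented in full in Figure 4):

public int Train(double[][] X_matrix, int[] y_vector, int maxIter)
{
  // Top level: repeatedly sweeps over the training data, calling
  // ExamineExample until no more changes occur (see Figure 4)
}

private int ExamineExample(int i2, double[][] X_matrix, int[] y_vector)
{
  // Picks a partner index i1 using SMO's heuristics, then calls
  // TakeStep(i1, i2, X_matrix, y_vector); returns the number of
  // changes made (0 or 1)
}

private bool TakeStep(int i1, int i2, double[][] X_matrix,
  int[] y_vector)
{
  // Attempts to jointly optimize the Lagrange multipliers for
  // training items i1 and i2; returns true if an improvement
  // was found, false otherwise
}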
Both TakeStep and ExamineExample use a class-scope variable named complexity. Larger values of complexity increasingly penalize outlier training data and force the SMO algorithm to try to find a solution that deals with them, at the expense of model overfitting. Because complexity is a parameter used by SMO, it will always be present, unlike parameters associated with the kernel function used, which might be present or absent.
The TakeStep function uses a class-scope variable named epsilon, and the ExamineExample function uses a class-scope variable named tolerance. Both are small values, set to 0.001 by default in the demo. Epsilon is used internally to determine when to stop iterating, which in turn affects the number of support vectors found. Tolerance is used when computing error. The values of epsilon and tolerance are free parameters, and the effect of changing them varies quite a bit, from very small to very large, depending on the particular problem you're looking at.

The code for method Train is presented in Figure 4. The method is iterative and returns the number of iterations that were performed. The method accepts a maxIter parameter to set a hard limit on the number of iterations performed. In theory, the SMO algorithm will always converge and stop iterating, but theory doesn't always match practice with an algorithm as complex as SMO.

In addition to explicitly returning the number of iterations performed, Train finds and stores the indices of the training data items that are support vectors. After the number of support vectors is known, the array that holds the values of the weights can be allocated. The weight values are the alpha values that are non-zero.

The Train method has many possible customization points. For example, the demo code stores support vectors as a List<double[]> collection. An alternative is to store just the indices of the support vectors, in a List<int> collection or in an int[] array object; a sketch of that idea follows the Train code below. Examining the Train method carefully is the best way to start to understand SVMs and the SMO algorithm.
Figure 4 The Training Method

public int Train(double[][] X_matrix, int[] y_vector, int maxIter)
{
  int N = X_matrix.Length;
  this.alpha = new double[N];
  this.errors = new double[N];
  int numChanged = 0;
  bool examineAll = true;
  int iter = 0;

  while (iter < maxIter && numChanged > 0 || examineAll == true) {
    ++iter;
    numChanged = 0;
    if (examineAll == true) {
      for (int i = 0; i < N; ++i)
        numChanged += ExamineExample(i, X_matrix, y_vector);
    }
    else {
      for (int i = 0; i < N; ++i)
        if (this.alpha[i] != 0 && this.alpha[i] != this.complexity)
          numChanged += ExamineExample(i, X_matrix, y_vector);
    }
    if (examineAll == true)
      examineAll = false;
    else if (numChanged == 0)
      examineAll = true;
  } // While

  List<int> indices = new List<int>();  // support vectors
  for (int i = 0; i < N; ++i) {
    // Only store vectors with Lagrange multipliers > 0
    if (this.alpha[i] > 0) indices.Add(i);
  }

  int num_supp_vectors = indices.Count;
  this.weights = new double[num_supp_vectors];
  for (int i = 0; i < num_supp_vectors; ++i) {
    int j = indices[i];
    this.supportVectors.Add(X_matrix[j]);
    this.weights[i] = this.alpha[j] * y_vector[j];
  }

  this.bias = -1 * this.bias;
  return iter;
}
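As mentioned above, a sketch of the index-storage alternative, assuming a class-scope int[] field named supportIndices (a name made up for illustration, not part of the demo), could replace the copy loop at the end of Train:

// Alternative: store just the indices of the support vectors
// rather than copies of the vectors themselves
this.supportIndices = new int[num_supp_vectors];
for (int i = 0; i < num_supp_vectors; ++i)
  this.supportIndices[i] = indices[i];

Storing indices uses less memory, but the X_matrix training data must then remain available whenever the trained model computes a decision value.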

































































































