Page 47 - MSDN Magazine, March 2019
Understanding the SVM Mechanism
If you refer to Figure 1, you’ll see the trained SVM has three support vectors: (4, 5, 7), (7, 4, 2) and (9, 7, 5). The model has three weight values, (-0.000098, -0.000162, 0.000260), and a bias of -2.506. The decision value for input (3, 5, 7) is computed by calculating the value of the kernel function with each of the three support vectors, multiplying each kernel value by its corresponding weight, summing the products, and then adding the bias:
x=(3,5,7) sv1=(4,5,7) sv2=(7,4,2) sv3=(9,7,5)
K(x, sv1) * wt1 = 7396.0 * -0.000098 = -0.725
K(x, sv2) * wt2 = 3025.0 * -0.000162 = -0.490
K(x, sv3) * wt3 = 9409.0 * 0.000260 = 2.446
decision = -0.725 + -0.490 + 2.446 + -2.506 = -1.274
prediction = Sign(decision) = -1
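The calculation above can be reproduced with a few lines of code. The following is a minimal sketch in Python rather than the article's own code. It assumes the kernel is a degree-2 polynomial of the form (x · sv)^2, which is inferred from the kernel values shown (for example, (3*4 + 5*5 + 7*7)^2 = 86^2 = 7396) and is not stated in this section. The function names are illustrative, and treating a decision value of exactly zero as +1 is also an assumption.

```python
# Support vectors, weights and bias as listed in the text.
support_vectors = [(4, 5, 7), (7, 4, 2), (9, 7, 5)]
weights = [-0.000098, -0.000162, 0.000260]
bias = -2.506


def poly_kernel(a, b, degree=2):
    """Assumed kernel: (a . b) ** degree, inferred from the values in the text."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return float(dot) ** degree


def decision_value(x):
    """Weighted sum of kernel values over the support vectors, plus the bias."""
    total = sum(w * poly_kernel(x, sv) for sv, w in zip(support_vectors, weights))
    return total + bias


def predict(x):
    """Class label from the sign of the decision value (zero treated as +1)."""
    return 1 if decision_value(x) >= 0 else -1


x = (3, 5, 7)
kernels = [poly_kernel(x, sv) for sv in support_vectors]
print(kernels)              # [7396.0, 3025.0, 9409.0]
print(decision_value(x))    # about -1.27
print(predict(x))           # -1
```

With the rounded weights shown in the text, the unrounded sum comes out near -1.2745, which agrees with the printed decision value to within rounding.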
Notice that if the predictor values are not normalized, as in the demo, the values of the kernels can become very large, forcing the values of the weights to become very small, which could possibly lead to arithmetic errors.
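To see how strongly feature scale affects kernel magnitude, the sketch below compares the kernel value for the raw inputs with the value after dividing every predictor by 10. The divisor is an arbitrary illustrative choice, not a normalization scheme taken from the article, and the degree-2 polynomial kernel (x · sv)^2 is again an assumption inferred from the kernel values shown earlier.

```python
def poly_kernel(a, b, degree=2):
    """Assumed kernel: (a . b) ** degree."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return float(dot) ** degree


def scale(v, divisor):
    """Divide every component by the same constant (illustrative scaling only)."""
    return tuple(c / divisor for c in v)


x = (3, 5, 7)
sv1 = (4, 5, 7)

raw = poly_kernel(x, sv1)                                # 7396.0
scaled = poly_kernel(scale(x, 10.0), scale(sv1, 10.0))   # roughly 0.7396

print(raw, scaled, raw / scaled)
```

Under this assumed kernel, scaling both vectors by a factor s scales the kernel value by s^4, so a tenfold reduction in the inputs shrinks the kernel by a factor of about 10,000. The learned weights would then be correspondingly larger in magnitude.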
The SVM mechanism points out strengths and weaknesses of the technique. An SVM focuses only on the key support vectors, and therefore tends to be resilient to bad training data. When the number of support vectors is small, an SVM is somewhat interpretable, an advantage compared to many other techniques. Compared to many other classification techniques, notably neural networks, SVMs can often work well with limited training data, but SVMs can have trouble dealing with very large training datasets. The major disadvantages of SVMs are that they are very complex and that they require you to specify the values of many hyperparameters.
Wrapping Up
As this article shows, implementing a support vector machine is quite complex and difficult. Because of this, there are very few SVM library implementations available. Most SVM libraries are based on a C++ implementation called LibSVM, which was created by a group of researchers. Because calling C++ is often difficult, there are several libraries that provide wrappers over LibSVM, written in Python, Java, C# and other languages.
By experimenting with the code presented in this article, you’ll gain a good understanding of exactly how SVMs work and be able to use a library implementation more effectively. Because the code in this article is self-contained and simplified, you’ll be able to explore alternative kernel functions and their parameters, and the SMO training algorithm parameters epsilon, tolerance, and complexity.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several key Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Yihe Dong, Chris Lee