Page 71 - MSDN Magazine, November 2019

}

static void UpdateMeans(double[][] u, double[][] w,
  double[][] x, double[] Nk)
{
  // Each new mean is the weighted average of all data items,
  // where the weights are the cluster membership values
  double[][] result = MatrixCreate(K, d);
  for (int k = 0; k < K; ++k) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < d; ++j)
        result[k][j] += w[i][k] * x[i][j];
    for (int j = 0; j < d; ++j)
      result[k][j] = result[k][j] / Nk[k];
  }
  for (int k = 0; k < K; ++k)
    for (int j = 0; j < d; ++j)
      u[k][j] = result[k][j];
}

static void UpdateVariances(double[][] V, double[][] u,
  double[][] w, double[][] x, double[] Nk)
{
  // Each new variance is the weighted average of the
  // squared deviations from the updated mean
  double[][] result = MatrixCreate(K, d);
  for (int k = 0; k < K; ++k) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < d; ++j)
        result[k][j] += w[i][k] * (x[i][j] - u[k][j]) *
          (x[i][j] - u[k][j]);
    for (int j = 0; j < d; ++j)
      result[k][j] = result[k][j] / Nk[k];
  }
  for (int k = 0; k < K; ++k)
    for (int j = 0; j < d; ++j)
      V[k][j] = result[k][j];
}

static double ProbDenFunc(double x, double u, double v)
{
  // Univariate Gaussian probability density function
  if (v == 0.0)
    throw new Exception("0 in ProbDenFunc");
  double left = 1.0 / Math.Sqrt(2.0 * Math.PI * v);
  double pwr = -1 * ((x - u) * (x - u)) / (2 * v);
  return left * Math.Exp(pwr);
}

static double NaiveProb(double[] x, double[] u, double[] v)
{
  // Poor man's multivariate Gaussian PDF:
  // the average of the d univariate PDF values
  double sum = 0.0;
  for (int j = 0; j < d; ++j)
    sum += ProbDenFunc(x[j], u[j], v[j]);
  return sum / d;
}

static double[][] MatrixCreate(int rows, int cols,
  double v = 0.0)
{
  double[][] result = new double[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new double[cols];
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      result[i][j] = v;
  return result;
}

static void MatrixShow(double[][] m, bool nl = false)
{
  for (int i = 0; i < m.Length; ++i) {
    for (int j = 0; j < m[0].Length; ++j)
      Console.Write(m[i][j].ToString("F4") + " ");
    Console.WriteLine("");
  }
  if (nl) Console.WriteLine("");
}

static void VectorShow(double[] v, bool nl = false)
{
  for (int i = 0; i < v.Length; ++i)
    Console.Write(v[i].ToString("F4") + " ");
  Console.WriteLine("");
  if (nl) Console.WriteLine("");
}
} // Program class
} // ns
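To see the pieces fit together, here is a minimal, self-contained sketch of one EM iteration for a naive Gaussian mixture with K = 2 clusters and d = 1. This is my own illustration, not part of the article's full listing: the tiny data set, the initial means and variances, and the mixture coefficients are made-up values, and the class name GmmSketch and method OneIteration are hypothetical.

```csharp
using System;

// Sketch of one EM iteration for a naive Gaussian mixture,
// K = 2 clusters, d = 1. All data values are made up.
class GmmSketch
{
  public static double[] OneIteration()
  {
    double[] x = { 1.0, 1.2, 5.0, 5.3 };   // N = 4 data items
    double[] u = { 1.0, 5.0 };             // initial means
    double[] v = { 1.0, 1.0 };             // initial variances
    double[] a = { 0.5, 0.5 };             // mixture coefficients
    int N = x.Length, K = 2;
    double[,] w = new double[N, K];        // membership weights

    // E-step: w[i,k] proportional to a[k] * pdf(x[i]; u[k], v[k]),
    // then normalized so each row sums to 1
    for (int i = 0; i < N; ++i) {
      double rowSum = 0.0;
      for (int k = 0; k < K; ++k) {
        w[i, k] = a[k] * ProbDenFunc(x[i], u[k], v[k]);
        rowSum += w[i, k];
      }
      for (int k = 0; k < K; ++k) w[i, k] /= rowSum;
    }

    // M-step (means only, mirroring UpdateMeans):
    // each mean becomes a membership-weighted average
    for (int k = 0; k < K; ++k) {
      double Nk = 0.0, sum = 0.0;
      for (int i = 0; i < N; ++i) {
        Nk += w[i, k];
        sum += w[i, k] * x[i];
      }
      u[k] = sum / Nk;
    }
    return u;
  }

  static double ProbDenFunc(double x, double u, double v)
  {
    // Univariate Gaussian PDF, as in the listing above
    double left = 1.0 / Math.Sqrt(2.0 * Math.PI * v);
    return left * Math.Exp(-((x - u) * (x - u)) / (2.0 * v));
  }

  static void Main()
  {
    double[] means = OneIteration();
    Console.WriteLine("updated means: " +
      means[0].ToString("F4") + " " + means[1].ToString("F4"));
  }
}
```

Because the two made-up clusters are well separated, the updated means move only slightly toward the overall data center, which is the behavior you'd expect from a single EM step.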
Aloha, Servus and Ciao

It’s been a tremendous pleasure writing articles about software development, testing and machine learning for MSDN Magazine over the past 17 years. Rather than look back, I prefer to look forward. The essence of MSDN Magazine has always been filling the gap between low-level software documentation (too much detail) and high-level Hello World-style examples (not enough detail). The need to fill that gap won’t go away, so I speculate that the soul of MSDN Magazine will quickly reemerge in an online format of some sort. So goodbye/hello for now, and keep your eyes open for MSDN Magazine authors and editors providing useful information in a new format to the software developer community.
– James McCaffrey

Wrapping Up

The code and most of the math notation used in this article are based on a short paper titled “The EM Algorithm for Gaussian Mixtures.” The paper isn’t on a stable Web site and has no author given, but the subtitle and URL indicate the paper is part of lecture notes from a UC Irvine computer science class. You can easily find the paper as a .pdf file by doing an Internet search for the title. I used this somewhat obscure paper because it’s clear and accurate, and almost all of the other Gaussian mixture model resources I found had significant technical errors.
Data clustering is usually an exploratory process. Clustering results are typically examined visually to see if any interesting patterns appear. Most of my colleagues start a clustering investigation by looking at a graph of the source data. If the data appears evenly distributed, then using the k-means algorithm (or one of its many variations) usually works well. But if the data appears skewed, a mixture model clustering approach such as the one presented in this article often gives better results. Most clustering algorithms, including the naive Gaussian mixture model, are extremely sensitive to the initial values of the means and variances, so these values are often supplied via a preliminary k-means analysis.
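That preliminary k-means seeding could look something like the following minimal sketch. This is my own illustration, not code from the article: the data values and initial centroids are made up, and the class name KMeansSeed and method Seed are hypothetical. It runs a few assignment/update passes on one-dimensional data with two clusters, then computes per-cluster variances to seed the mixture model.

```csharp
using System;

// Minimal k-means pass used only to seed mixture-model
// means and variances. Made-up data; fixed initial centroids.
class KMeansSeed
{
  public static (double[] means, double[] vars) Seed(
    double[] data, double m0, double m1, int iters)
  {
    double[] means = { m0, m1 };
    int[] assign = new int[data.Length];
    for (int it = 0; it < iters; ++it) {
      // assignment step: each item goes to its nearest centroid
      for (int i = 0; i < data.Length; ++i)
        assign[i] = Math.Abs(data[i] - means[0]) <=
          Math.Abs(data[i] - means[1]) ? 0 : 1;
      // update step: each centroid becomes its cluster's mean
      for (int k = 0; k < 2; ++k) {
        double sum = 0.0; int cnt = 0;
        for (int i = 0; i < data.Length; ++i)
          if (assign[i] == k) { sum += data[i]; ++cnt; }
        if (cnt > 0) means[k] = sum / cnt;
      }
    }
    // per-cluster variances seed the mixture-model variances
    double[] vars = new double[2];
    int[] cnts = new int[2];
    for (int i = 0; i < data.Length; ++i) {
      int k = assign[i];
      vars[k] += (data[i] - means[k]) * (data[i] - means[k]);
      ++cnts[k];
    }
    for (int k = 0; k < 2; ++k)
      vars[k] = cnts[k] > 0 ? vars[k] / cnts[k] : 1.0;
    return (means, vars);
  }

  static void Main()
  {
    double[] data = { 1.0, 1.2, 0.8, 5.0, 5.3, 4.9 };
    var (m, v) = Seed(data, 0.0, 6.0, 10);
    Console.WriteLine("seed means: " +
      m[0].ToString("F4") + " " + m[1].ToString("F4"));
  }
}
```

The resulting means and variances would then be passed to the mixture-model routines as starting values in place of arbitrary guesses.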
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several key Microsoft products including Azure and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Ricky Loynd, Kirk Olynyk

































































































