Page 64 - MSDN Magazine, February 2018
P. 64

p-values between 0.0 and 1.0, but the average of the p-values will be 3 / (3+1) = 0.75. This means that p-values between 0.90 and 1.00 will be the most common; p-values between 0.80 and 0.90 will be a bit less common; and so on, down to very few p-values between 0.00 and 0.10. The graph in Figure 2 shows the results of 10,000 samples from the beta distribution with a = 3 and b = 1.
The demo uses three machines, but Thompson sampling can work with any number of machines.
The connection between the beta distribution and a Bernoulli bandit problem is quite subtle. Briefly, the beta distribution is the conjugate prior for the Bernoulli likelihood function. Although the underlying math is very deep, implementation of the Thompson algorithm is, fortunately, (relatively) simple.
The Demo Program
To create the demo program I launched Visual Studio and created a new console application named Thompson. The demo has no significant .NET Framework dependencies, so any version of Visual Studio is fine. After the template code loaded into the editor window, I right-clicked on file Program.cs and renamed it to the slightly more descriptive ThompsonProgram.cs, and allowed Visual
Figure 3 Thompson Sampling Demo Program Structure
Studio to automatically rename class Program for me. At the top of the template code, I deleted all references to unneeded namespaces, leaving just the reference to the top-level System namespace.
The overall program structure, with a few minor edits to save space, is presented in Figure 3. All the control logic is in the Main method. Sampling from the beta distribution is implemented using the program-defined BetaSampler class. All normal error checking is removed to save space.
The demo begins by setting up four arrays:
Console.WriteLine("Begin Thompson sampling demo"); int N = 3;
double[] means = new double[] { 0.3, 0.5, 0.7 }; double[] probs = new double[N];
int[] S = new int[N]; int[] F = new int[N];
The demo uses three machines, but Thompson sampling can work with any number of machines. The mean probabilities of payouts are hardcoded. The closer the mean probabilities are to each other, the more difficult the problem. The array named probs holds the probabilities from a sampling of the beta distribution, which determine which machine to play. The arrays named S (“success”) and F (“failure”) hold the cumulative number of times each machine paid out and didn’t pay out when played.
The demo uses two random number-generating objects:
Random rnd = new Random(4); BetaSampler bs = new BetaSampler(2);
The rnd object is used to determine whether a selected machine wins or loses. Note that I use the terms win and lose, pay out and not pay out, and success and failure, interchangeably. The bs object is
using System; namespace Thompson {
class ThompsonProgram {
static void Main(string[] args) {
Console.WriteLine("Begin Thompson sampling demo");
int N = 3;
double[] means = new double[] { 0.3, 0.5, 0.7 }; double[] probs = new double[N];
int[] S = new int[N];
int[] F = new int[N];
Random rnd = new Random(4); BetaSampler bs = new BetaSampler(2);
for (int trial = 0; trial < 10; ++trial) {
Console.WriteLine("Trial " + trial); for (int i = 0; i < N; ++i)
probs[i] = bs.Sample(S[i] + 1.0, F[i] + 1.0);
Console.Write("sampling probs = "); for (int i= 0; i < N; ++i)
Console.Write(probs[i].ToString("F4") + " "); Console.WriteLine("");
int machine = 0;
double highProb = 0.0;
for (int i = 0; i < N; ++i) {
if (probs[i] > highProb) { highProb = probs[i]; machine = i;
} }
Console.Write("Playing machine " + machine); double p = rnd.NextDouble();
if (p < means[machine]) {
Console.WriteLine(" -- win");
++S[machine]; }
else {
Console.WriteLine(" -- lose"); ++F[machine];
} }
Console.WriteLine("Final estimates of means: "); for (int i = 0; i < N; ++i) {
double u = (S[i] * 1.0) / (S[i] + F[i]);
Console.WriteLine(u.ToString("F4") + " "); }
Console.WriteLine("Number times machine played:"); for (int i = 0; i < N; ++i) {
int ct = S[i] + F[i];
Console.WriteLine(ct + " "); }
Console.WriteLine("End demo ");
Console.ReadLine(); }
}
public class BetaSampler {
// ... }
}
60 msdn magazine
Test Run


































































































   62   63   64   65   66