In the body of the function, the first five statements are a one-to-one mapping to the five math equations in Figure 3. Notice that the ft, it and ot gates all use the MatSig function, so all three are vectors with values between 0.0 and 1.0. You can think of these as filters applied to the input, output or state, where the value in the gate is the percentage retained. For example, if one of the values in ft is 0.75, then 75 percent of the corresponding value in the combined input and previous-output vector is retained. Or, equivalently, 25 percent of the information is forgotten.
The computation of the new cell state, ct, is simple to implement but conceptually quite deep. At a high level, the new cell state depends on a gated combination of the input vector xt, the previous output vector h_prev and the previous cell state c_prev. The new output, ht, depends on the new cell state and the output gate. Quite remarkable.
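MatSig isn't defined in this excerpt. A minimal sketch, assuming it simply applies the logistic sigmoid element-wise to a matrix, might look like this:

static float[][] MatSig(float[][] m)
{
  // Apply the logistic sigmoid, 1.0 / (1.0 + exp(-x)),
  // to each cell of matrix m
  float[][] result = new float[m.Length][];
  for (int i = 0; i < m.Length; ++i) {
    result[i] = new float[m[i].Length];
    for (int j = 0; j < m[i].Length; ++j)
      result[i][j] = 1.0f / (1.0f + (float)Math.Exp(-m[i][j]));
  }
  return result;
}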
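In symbols, the cell state and output statements in Figure 5 compute ct = (ft ⊙ c_prev) + (it ⊙ tanh(Wc * xt + Uc * h_prev + bc)) and ht = ot ⊙ tanh(ct), where ⊙ denotes the Hadamard (element-wise) product.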
The function returns the new output and new cell state in an array. This leads to a return type of float[][][], where result[0] is an array-of-arrays matrix holding the output and result[1] holds the new cell state.
Calling ComputeOutputs is mostly a matter of setting up the parameter values. The demo begins the preparation with:
float[][] xt = MatFromArray(new float[] { 1.0f, 2.0f }, 2, 1);
float[][] h_prev = MatFromArray(new float[] { 0.0f, 0.0f, 0.0f }, 3, 1);
float[][] c_prev = MatFromArray(new float[] { 0.0f, 0.0f, 0.0f }, 3, 1);
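Helper function MatFromArray isn't shown in this excerpt. A minimal sketch, assuming the flat array of values is copied in row-major order, is:

static float[][] MatFromArray(float[] vals, int rows, int cols)
{
  // Copy a flat array into a rows x cols array-of-arrays
  // matrix, filling row by row (row-major order)
  float[][] result = new float[rows][];
  int k = 0;
  for (int i = 0; i < rows; ++i) {
    result[i] = new float[cols];
    for (int j = 0; j < cols; ++j)
      result[i][j] = vals[k++];
  }
  return result;
}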
Both the previous output and the cell state are explicitly initialized to zero. Next, two sets of arbitrary weight values are created:
float[][] W = MatFromArray(new float[] {
  0.01f, 0.02f,
  0.03f, 0.04f,
  0.05f, 0.06f }, 3, 2);
float[][] U = MatFromArray(new float[] {
  0.07f, 0.08f, 0.09f,
  0.10f, 0.11f, 0.12f,
  0.13f, 0.14f, 0.15f }, 3, 3);
Notice that the two matrices have different shapes: W is 3x2 because it multiplies the two-component input vector, while U is 3x3 because it multiplies the three-component previous-output vector. The weight values are copied to the input parameters:
float[][] Wf = MatCopy(W);
float[][] Wi = MatCopy(W);
float[][] Wo = MatCopy(W);
float[][] Wc = MatCopy(W);
float[][] Uf = MatCopy(U);
float[][] Ui = MatCopy(U);
float[][] Uo = MatCopy(U);
float[][] Uc = MatCopy(U);
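MatCopy isn't defined in this excerpt either. A minimal sketch that returns an independent copy might be:

static float[][] MatCopy(float[][] m)
{
  // Return a new matrix with the same values, so that
  // modifying the copy doesn't affect the original
  float[][] result = new float[m.Length][];
  for (int i = 0; i < m.Length; ++i) {
    result[i] = new float[m[i].Length];
    for (int j = 0; j < m[i].Length; ++j)
      result[i][j] = m[i][j];
  }
  return result;
}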
Because the weights don’t change, the demo could have assigned by reference instead of using MatCopy. The biases are set up using the same pattern:
float[][] b = MatFromArray(new float[] {
  0.16f, 0.17f, 0.18f }, 3, 1);
float[][] bf = MatCopy(b);
float[][] bi = MatCopy(b);
float[][] bo = MatCopy(b);
float[][] bc = MatCopy(b);
Function ComputeOutputs is called like this:
float[][] ht, ct;
float[][][] result;
result = ComputeOutputs(xt, h_prev, c_prev,
Wf, Wi, Wo, Wc, Uf, Ui, Uo, Uc,
bf, bi, bo, bc);
ht = result[0]; // Output
ct = result[1]; // New cell state
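At this point you could display the new output with an ordinary loop, for example (each row of ht holds a single value because ht is a 3x1 column vector):

for (int i = 0; i < ht.Length; ++i)
  Console.WriteLine(ht[i][0].ToString("F4"));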
The whole point of an LSTM cell is to process a sequence of input vectors, so the demo sets up and sends a second input vector:
h_prev = MatCopy(ht);
c_prev = MatCopy(ct);
xt = MatFromArray(new float[] {
  3.0f, 4.0f }, 2, 1);
result = ComputeOutputs(xt, h_prev, c_prev,
  Wf, Wi, Wo, Wc, Uf, Ui, Uo, Uc,
  bf, bi, bo, bc);
ht = result[0];
ct = result[1];
Note that the demo explicitly sends the previous output and state vectors to ComputeOutputs. An alternative is to feed just the new input vector, because the previous output and cell state are still stored in ht and ct.
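For a longer sequence, you could wrap the call in a loop, carrying the output and cell state forward at each time step. Here's a minimal sketch, where inputs is a hypothetical array holding the sequence of input vectors:

foreach (float[][] x in inputs) {  // inputs is hypothetical
  result = ComputeOutputs(x, ht, ct,
    Wf, Wi, Wo, Wc, Uf, Ui, Uo, Uc,
    bf, bi, bo, bc);
  ht = result[0];  // New output becomes previous output
  ct = result[1];  // Cell state carries memory forward
}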
Connecting the Dots
So, what's the point? An LSTM cell can be used to construct an LSTM recurrent neural network, which is essentially an LSTM cell with some additional plumbing. These networks have been responsible for major advances in prediction systems that work with sequence data. For example, suppose you were asked to predict the next word in the sentence, "In 2017, the championship was won by __." With just that information, you'd be hard pressed to make a prediction. But suppose your system had state, and remembered that part of a previous sentence was, "The NBA has held a championship game since 1947." You'd now be in a position to predict one of the 30 NBA teams.
There are dozens of variations of LSTM architectures. Additionally, because LSTM cells are complex, there are dozens of implementation variations for every architecture. But if you understand the basic LSTM cell mechanism, you can easily understand the variations.
The demo program sets the LSTM weights and biases to arbitrary values. The weights and biases for a real-world LSTM network would be determined by training the network. You'd obtain a set of training data with known input values and known, correct output values. Then you'd use an algorithm such as back-propagation to find values for the weights and biases that minimize the error between computed outputs and correct outputs.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.
Thanks to the following Microsoft technical experts who reviewed this article: Ricky Loynd and Adith Swaminathan
Figure 5 Function ComputeOutputs

static float[][][] ComputeOutputs(float[][] xt,
  float[][] h_prev, float[][] c_prev,
  float[][] Wf, float[][] Wi, float[][] Wo, float[][] Wc,
  float[][] Uf, float[][] Ui, float[][] Uo, float[][] Uc,
  float[][] bf, float[][] bi, float[][] bo, float[][] bc)
{
  float[][] ft = MatSig(MatSum(MatProd(Wf, xt),
    MatProd(Uf, h_prev), bf));
  float[][] it = MatSig(MatSum(MatProd(Wi, xt),
    MatProd(Ui, h_prev), bi));
  float[][] ot = MatSig(MatSum(MatProd(Wo, xt),
    MatProd(Uo, h_prev), bo));
  float[][] ct = MatSum(MatHada(ft, c_prev),
    MatHada(it, MatTanh(MatSum(MatProd(Wc, xt),
      MatProd(Uc, h_prev), bc))));
  float[][] ht = MatHada(ot, MatTanh(ct));

  float[][][] result = new float[2][][];
  result[0] = MatCopy(ht);
  result[1] = MatCopy(ct);
  return result;
}
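The remaining matrix helper functions called by ComputeOutputs aren't shown in this excerpt. Minimal sketches consistent with how they're used above might look like the following, assuming the three-argument MatSum adds its arguments element-wise and all matrices have conforming shapes:

static float[][] MatProd(float[][] a, float[][] b)
{
  // Standard matrix multiplication
  int aRows = a.Length; int aCols = a[0].Length;
  int bCols = b[0].Length;
  float[][] result = new float[aRows][];
  for (int i = 0; i < aRows; ++i) {
    result[i] = new float[bCols];  // Cells start at 0.0f
    for (int j = 0; j < bCols; ++j)
      for (int k = 0; k < aCols; ++k)
        result[i][j] += a[i][k] * b[k][j];
  }
  return result;
}

static float[][] MatSum(float[][] a, float[][] b, float[][] c)
{
  // Element-wise sum of three same-shape matrices
  float[][] result = new float[a.Length][];
  for (int i = 0; i < a.Length; ++i) {
    result[i] = new float[a[i].Length];
    for (int j = 0; j < a[i].Length; ++j)
      result[i][j] = a[i][j] + b[i][j] + c[i][j];
  }
  return result;
}

static float[][] MatHada(float[][] a, float[][] b)
{
  // Hadamard (element-wise) product of two same-shape matrices
  float[][] result = new float[a.Length][];
  for (int i = 0; i < a.Length; ++i) {
    result[i] = new float[a[i].Length];
    for (int j = 0; j < a[i].Length; ++j)
      result[i][j] = a[i][j] * b[i][j];
  }
  return result;
}

static float[][] MatTanh(float[][] m)
{
  // Apply hyperbolic tangent to each cell
  float[][] result = new float[m.Length][];
  for (int i = 0; i < m.Length; ++i) {
    result[i] = new float[m[i].Length];
    for (int j = 0; j < m[i].Length; ++j)
      result[i][j] = (float)Math.Tanh(m[i][j]);
  }
  return result;
}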