Page 65 - MSDN Magazine, April 2018
[Figure 2 diagram. Legend: neural layer; element-wise op (×); duplicate; combine. Labels: x(t) current input, h(t-1) previous output, c(t-1) previous state, h(t) new output, c(t) new state, f(t) forget gate, i(t) input gate, o(t) output gate.]
Figure 2 LSTM Cell Architecture
The math equations in Figure 3 define the behavior of the demo program LSTM cell. If you don’t regularly work with math definitions, your reaction to Figure 3 is likely, again, "What the heck?" Equations (1), (2) and (3) define three gates: a forget gate, an input gate and an output gate. Each gate is a vector of values between 0.0 and 1.0, which are used to determine how much information to forget (or, equivalently, remember) in each input-output cycle. Equation (4) computes the new cell state, and equation (5) computes the new output.
These equations are simpler than they appear. For example, equation (1) is implemented by the demo code as:
float[][] ft = MatSig(MatSum(MatProd(Wf, xt), MatProd(Uf, h_prev), bf));
Here, MatSig is a program-defined function that applies logistic sigmoid to each value in a matrix. MatSum adds three matrices. MatProd multiplies two matrices. Once you understand basic matrix and vector operations, implementing an LSTM cell is quite easy.
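To make equation (1) concrete, here is a minimal sketch of what the three helpers might look like. The function names MatProd, MatSum and MatSig come from the statement above, but the bodies shown here, the wrapper ForgetGate and the class name GateSketch are assumptions for illustration, not the demo's actual code. The numbers in Main are made up so that the weighted sum is 0 and the gate value is sigmoid(0) = 0.5.

```csharp
using System;

static class GateSketch
{
    // Matrix product: (rows x inner) times (inner x cols) gives (rows x cols).
    public static float[][] MatProd(float[][] a, float[][] b)
    {
        int rows = a.Length, inner = b.Length, cols = b[0].Length;
        if (a[0].Length != inner)
            throw new ArgumentException("Inner dimensions do not match.");
        float[][] result = new float[rows][];
        for (int i = 0; i < rows; ++i)
        {
            result[i] = new float[cols];
            for (int j = 0; j < cols; ++j)
                for (int k = 0; k < inner; ++k)
                    result[i][j] += a[i][k] * b[k][j];
        }
        return result;
    }

    // Element-wise sum of three matrices that all have the same shape.
    public static float[][] MatSum(float[][] a, float[][] b, float[][] c)
    {
        float[][] result = new float[a.Length][];
        for (int i = 0; i < a.Length; ++i)
        {
            result[i] = new float[a[i].Length];
            for (int j = 0; j < a[i].Length; ++j)
                result[i][j] = a[i][j] + b[i][j] + c[i][j];
        }
        return result;
    }

    // Logistic sigmoid applied to every value in the matrix.
    public static float[][] MatSig(float[][] m)
    {
        float[][] result = new float[m.Length][];
        for (int i = 0; i < m.Length; ++i)
        {
            result[i] = new float[m[i].Length];
            for (int j = 0; j < m[i].Length; ++j)
                result[i][j] = 1.0f / (1.0f + (float)Math.Exp(-m[i][j]));
        }
        return result;
    }

    // Hypothetical wrapper around equation (1).
    public static float[][] ForgetGate(float[][] Wf, float[][] xt,
        float[][] Uf, float[][] h_prev, float[][] bf)
    {
        return MatSig(MatSum(MatProd(Wf, xt), MatProd(Uf, h_prev), bf));
    }

    public static void Main()
    {
        // n = 2 inputs, m = 1 state value; all values are illustrative.
        float[][] Wf = { new[] { 0.5f, -0.5f } };              // 1x2
        float[][] xt = { new[] { 1.0f }, new[] { 1.0f } };     // 2x1
        float[][] Uf = { new[] { 2.0f } };                     // 1x1
        float[][] h_prev = { new[] { 0.0f } };                 // 1x1
        float[][] bf = { new[] { 0.0f } };                     // 1x1
        float[][] ft = ForgetGate(Wf, xt, Uf, h_prev, bf);
        Console.WriteLine(ft[0][0]);  // sigmoid(0), the gate's midpoint
    }
}
```

A gate value of 0.5 means the cell would keep half of the corresponding previous state value.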
Overall Demo Program Structure
The structure of the demo program, with a few minor edits to save space, is presented in Figure 4. The demo uses a static method approach rather than an OOP approach to keep the main ideas as clear as possible.
To create the demo, I launched Visual Studio and created a new C# console application named LSTM_IO. I used Visual Studio 2015, but the demo has no significant .NET Framework dependencies, so any version of Visual Studio will work fine.
After the template code loaded, in the Solution Explorer window I renamed file Program.cs to LSTM_IO_Program.cs and allowed Visual Studio to automatically rename class Program for me. At the top of the editor window, I deleted all unneeded references to namespaces, leaving just the one to the top-level System namespace. All the work is performed by function ComputeOutputs.
Matrices Using C#
In order to implement an LSTM cell, you must have a solid grasp of working with C# matrices. In C#, a matrix is an array-of-arrays.
In machine learning, it’s common to use 32-bit type float rather than 64-bit type double.
The demo defines a helper to create a matrix as:
static float[][] MatCreate(int rows, int cols)
{
  float[][] result = new float[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new float[cols];
  return result;
}
The first statement creates an array with the specified number of rows, where each row is an array of type float. The for-loop statement allocates each row as an array with the specified number of columns. Note that unlike most programming languages, C# supports a true matrix type, but using an array-of-array approach is much more common.
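The difference between the two representations can be sketched as follows. The class name MatrixKinds and both RowSum overloads are hypothetical names chosen for this illustration. The sketch shows one practical consequence of the array-of-arrays approach: each row is itself a float[] that can be handed to another function directly, whereas a row of a float[,] matrix must be addressed through a row index.

```csharp
using System;

static class MatrixKinds
{
    // With an array-of-arrays, a row is an ordinary float[].
    public static float RowSum(float[] row)
    {
        float sum = 0.0f;
        for (int j = 0; j < row.Length; ++j)
            sum += row[j];
        return sum;
    }

    // With a rectangular float[,], the row is selected by index.
    public static float RowSum(float[,] m, int row)
    {
        float sum = 0.0f;
        for (int j = 0; j < m.GetLength(1); ++j)
            sum += m[row, j];
        return sum;
    }

    public static void Main()
    {
        // Array-of-arrays: each row is allocated separately.
        float[][] jagged = new float[2][];
        jagged[0] = new float[] { 1.0f, 2.0f, 3.0f };
        jagged[1] = new float[] { 4.0f, 5.0f, 6.0f };

        // Rectangular matrix: one block, indexed with [row, col].
        float[,] rect = { { 1.0f, 2.0f, 3.0f }, { 4.0f, 5.0f, 6.0f } };

        Console.WriteLine(RowSum(jagged[1]));  // 4 + 5 + 6
        Console.WriteLine(RowSum(rect, 1));    // same values, same sum
    }
}
```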
In machine learning, the term column vector, or just vector for short, refers to a matrix with one column. Most machine learning code works with vectors rather than one-dimensional arrays. The demo defines a function to generate a matrix/vector from a one-dimensional array:
static float[][] MatFromArray(float[] arr, int rows, int cols)
{
  float[][] result = MatCreate(rows, cols);
  int k = 0;
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      result[i][j] = arr[k++];
  return result;
}
The function can be called to create a 3x1 (3 rows, 1 column) vector like this:
float[][] v = MatFromArray(new float[] {1.0f, 9.0f, 5.0f}, 3,1);
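The following short usage sketch shows the shape of the result and the fill order. MatCreate and MatFromArray are copied from the listings above; the wrapper class MatFromArrayUsage and the 2x3 example are assumptions added for illustration. Because the loops run over rows first and columns second, values are consumed in row-major order.

```csharp
using System;

static class MatFromArrayUsage
{
    public static float[][] MatCreate(int rows, int cols)
    {
        float[][] result = new float[rows][];
        for (int i = 0; i < rows; ++i)
            result[i] = new float[cols];
        return result;
    }

    public static float[][] MatFromArray(float[] arr, int rows, int cols)
    {
        float[][] result = MatCreate(rows, cols);
        int k = 0;
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j)
                result[i][j] = arr[k++];
        return result;
    }

    public static void Main()
    {
        // 3x1 column vector: three rows, each holding one value.
        float[][] v = MatFromArray(new float[] { 1.0f, 9.0f, 5.0f }, 3, 1);
        Console.WriteLine(v.Length + " rows, " + v[0].Length + " column");

        // 2x3 matrix: the first three values fill row 0, the next three row 1.
        float[][] m = MatFromArray(
            new float[] { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f }, 2, 3);
        Console.WriteLine(m[1][0]);  // row 1 starts at the fourth value, 4
    }
}
```

As written, MatFromArray assumes the source array holds at least rows * cols values; a shorter array causes an IndexOutOfRangeException.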
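(1) $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
(2) $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
(3) $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
(4) $c_t = f_t \circ c_{t-1} + i_t \circ \tau(W_c x_t + U_c h_{t-1} + b_c)$
(5) $h_t = o_t \circ \tau(c_t)$
$\circ$ : element-wise multiplication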
n : size of input
m : size of cell state and output
xt : input vector, time t, size n×1
ft : forget gate vector, size m×1
it : input gate vector, size m×1
ot : output gate vector, size m×1
ht : output vector, size m×1
ct : cell state vector, size m×1
Wf, Wi, Wo, Wc : input gate weight matrices, size m×n
Uf, Ui, Uo, Uc : output gate weight matrices, size m×m
bf, bi, bo, bc : bias vectors, size m×1
σ : logistic sigmoid activation function
τ : tanh activation function
Figure 3 LSTM Cell Math Equations
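The five equations in Figure 3 can be sketched as a single forward step. This is an illustrative sketch under stated assumptions, not the demo's ComputeOutputs function: the names LstmCellSketch, Gate, Sigmoid and Step are hypothetical, and the matrix helpers are fused into one loop specialized for column vectors to keep the listing short. The multiplications in equations (4) and (5) are taken to be element-wise.

```csharp
using System;

static class LstmCellSketch
{
    static double Sigmoid(double z) { return 1.0 / (1.0 + Math.Exp(-z)); }

    // Computes act(W*x + U*h + b) for column vectors x (n x 1) and h (m x 1).
    static float[][] Gate(float[][] W, float[][] x, float[][] U,
        float[][] h, float[][] b, Func<double, double> act)
    {
        float[][] result = new float[W.Length][];
        for (int i = 0; i < W.Length; ++i)
        {
            double sum = b[i][0];
            for (int j = 0; j < x.Length; ++j) sum += W[i][j] * x[j][0];
            for (int j = 0; j < h.Length; ++j) sum += U[i][j] * h[j][0];
            result[i] = new float[] { (float)act(sum) };
        }
        return result;
    }

    // One input-output cycle: returns h(t) and passes c(t) back through ct.
    public static float[][] Step(float[][] xt, float[][] hPrev, float[][] cPrev,
        float[][] Wf, float[][] Wi, float[][] Wo, float[][] Wc,
        float[][] Uf, float[][] Ui, float[][] Uo, float[][] Uc,
        float[][] bf, float[][] bi, float[][] bo, float[][] bc,
        out float[][] ct)
    {
        float[][] ft = Gate(Wf, xt, Uf, hPrev, bf, Sigmoid);    // equation (1)
        float[][] it = Gate(Wi, xt, Ui, hPrev, bi, Sigmoid);    // equation (2)
        float[][] ot = Gate(Wo, xt, Uo, hPrev, bo, Sigmoid);    // equation (3)
        float[][] gt = Gate(Wc, xt, Uc, hPrev, bc, Math.Tanh);  // tanh term of (4)

        int m = cPrev.Length;
        ct = new float[m][];
        float[][] ht = new float[m][];
        for (int i = 0; i < m; ++i)
        {
            // Equation (4): forget part of the old state, add gated new content.
            ct[i] = new float[] { ft[i][0] * cPrev[i][0] + it[i][0] * gt[i][0] };
            // Equation (5): the output is the gated, squashed new state.
            ht[i] = new float[] { ot[i][0] * (float)Math.Tanh(ct[i][0]) };
        }
        return ht;
    }

    public static void Main()
    {
        // n = 1, m = 1, all weights and biases zero (illustrative values only),
        // so every gate is sigmoid(0) = 0.5 and the tanh term of (4) is 0.
        float[][] zero = { new[] { 0.0f } };
        float[][] xt = { new[] { 1.0f } };
        float[][] hPrev = { new[] { 0.0f } };
        float[][] cPrev = { new[] { 2.0f } };
        float[][] ct;
        float[][] ht = Step(xt, hPrev, cPrev, zero, zero, zero, zero,
            zero, zero, zero, zero, zero, zero, zero, zero, out ct);
        Console.WriteLine("c(t) = " + ct[0][0]);  // 0.5 * 2.0 + 0.5 * 0.0
        Console.WriteLine("h(t) = " + ht[0][0]);  // 0.5 * tanh(1.0)
    }
}
```

With every gate at 0.5, the cell keeps half of the previous state (2.0 becomes 1.0) and emits half of tanh(1.0), which shows how the gate values scale what is remembered and what is output.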