Page 64 - MSDN Magazine, April 2018
P. 64

TesT Run JAMES MCCAFFREY Understanding LSTM Cells Using C#
A long short-term memory (LSTM) cell is a small software com- ponent that can be used to create a recurrent neural network that can make predictions relating to sequences of data. LSTM networks have been responsible for major breakthroughs in several areas of machine learning. In this article, I demonstrate how to implement an LSTM cell using C#. Although it’s unlikely you’ll ever need to create a recurrent neural network from scratch, understanding exactly how LSTM cells work will help you if you ever need to create an LSTM network using a code library such as Microsoft CNTK or Google TensorFlow.
The screenshot in Figure 1 shows where this article is headed. The demo program creates an LSTM cell that accepts an input vector of size n = 2, and generates an explicit output vector of size m = 3 and a cell state vector of size m = 3. A key characteristic of LSTM cells is that they maintain a state.
AnLSTMcellhas(4*n*m)+(4*m*m)weightsand (4 * m) biases. Weights and biases are just constants, with values like 0.1234, that define the behavior of the LSTM cell. The demo has 60 weights and 12 biases that are set to arbitrary values.
The demo sends input (1.0, 2.0) to the LSTM cell. The computed output is (0.0629, 0.0878, 0.1143) and the new cell state is (0.1143, 0.1554, 0.1973). I’ll explain what these numeric values might represent shortly, but for now the point is that an LSTM cell accepts input, produces output and updates its state. The values of the weights and biases haven’t changed.
Next, the demo feeds (3.0, 4.0) to the LSTM cell. The new computed output is (0.1282, 0.2066, 0.2883). The new cell state is (0.2278, 0.3523, 0.4789). Although it’s not apparent, the new cell state value contains information about all previous input and output values. This is the “long” part of long short-term memory.
This article assumes you have intermediate or better pro- gramming skills, but doesn’t assume you know anything about LSTM cells. The demo is coded using C#, but you should be able to refactor the code to another language, such as Python or Visual Basic, if you wish. The demo code is just a bit too long to present in its entirety, but the com- plete code is available in the accompanying file download.
Understanding LSTM Cells
There are three ways to describe LSTM cells: using an architecture diagram as shown in Figure 2; using math equations as shown in Figure 3; and using code as presented in this article. Your initial impression of the architecture diagram in Figure 2 is probably something along the lines of, "What the heck?" The key ideas are that an LSTM cell accepts an input vector x(t), where the t stands for a time. The explicit output is h(t). The unusual use of h (rather than o) to represent output is historical and comes from the fact that neural systems were often described as functions g and h.
The LSTM cell also uses h(t-1) and c(t-1) as inputs. Here, the t-1 means from the previous time step. The c represents the cell state, so h(t-1) and c(t-1) are the previous output values and the previ- ous state values. The interior of the LSTM cell architecture looks complicated. And it is, but not as complicated as you might guess.
Code download available at msdn.com/magazine/0418magcode.
58 msdn magazine
Figure 1 LSTM Cell Input-Output Demo


































































































   62   63   64   65   66