# TensorFlow RNN API Source Code Notes

## A Quick Review of RNN/LSTM/GRU

Basic RNN:

$$h_t = \sigma(W_{hh}h_{t-1}+W_{hx}x_t)$$
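The update above can be sketched in NumPy (a minimal illustration; the dimensions and variable names are my own, not TensorFlow's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_hh, W_hx):
    """One basic RNN step: h_t = sigma(W_hh @ h_prev + W_hx @ x_t)."""
    return sigmoid(W_hh @ h_prev + W_hx @ x_t)

# toy dimensions (my choice): input size 4, hidden size 3
rng = np.random.default_rng(0)
x_t = rng.standard_normal(4)
h_prev = np.zeros(3)          # initial hidden state
W_hh = rng.standard_normal((3, 3))
W_hx = rng.standard_normal((3, 4))
h_t = rnn_step(x_t, h_prev, W_hh, W_hx)
print(h_t.shape)  # (3,)
```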

## The Core RNN Cells

• tf.contrib.rnn.BasicRNNCell

• tf.contrib.rnn.BasicLSTMCell

• tf.contrib.rnn.GRUCell

• tf.contrib.rnn.LSTMCell

• tf.contrib.rnn.LayerNormBasicLSTMCell

### tf.contrib.rnn.BasicRNNCell

```
True
128 128
(32, 128) (32, 128)
```


### tf.contrib.rnn.BasicLSTMCell

$$f_t=\sigma(W^{f}x_t + U^{f}h_{t-1})$$

$$i_t=\sigma(W^{i}x_t + U^{i}h_{t-1})$$

$$o_t=\sigma(W^{o}x_t + U^{o}h_{t-1})$$

new memory cell:

$$\hat c=\tanh(W^cx_t + U^ch_{t-1})$$

$$c_t=f_t\circ c_{t-1} + i_t\circ\hat c$$

$$h_t = o_t\circ \tanh(c_t)$$

• The cell first concatenates `[input, h]`, then computes `gate_input = matmul(concat([input, h]), self._kernel) + self._bias` — note the extra bias term relative to the formulas. The kernel has shape `[input_depth + h_depth, 4*num_units]`. It then does `i, j, f, o = split(gate_input, 4, axis=1)`, where `j` is the new memory cell candidate. When computing `new_c`, the activations for `i`, `f`, `o` are fixed to sigmoid, since gate values must lie in (0, 1); the activation for `j`, `self._activation`, is configurable and defaults to tanh.

• The second difference from the formulas is `self._forget_bias`: a bias is added to the forget gate before the activation $\sigma$, to avoid forgetting too much information early in training.

• Note the form of `state`, which depends on `self._state_is_tuple`. In the tuple case, `c, h = state`, i.e. $c_{t-1}, h_{t-1}$.
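The steps described above — one fused matmul over `concat([x, h])`, a four-way split, and the forget bias — can be mirrored in a NumPy sketch (my own toy sizes, not the TF implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def basic_lstm_step(x, h, c, kernel, bias, forget_bias=1.0):
    """One step mirroring BasicLSTMCell: a single matmul over
    concat([x, h]), then split into i, j, f, o."""
    gate_inputs = np.concatenate([x, h]) @ kernel + bias  # [4 * num_units]
    i, j, f, o = np.split(gate_inputs, 4)
    # i, f, o use sigmoid (gates must stay in (0, 1)); j uses tanh by default
    new_c = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(j)
    new_h = sigmoid(o) * np.tanh(new_c)
    return new_h, new_c

num_units, input_depth = 5, 3
rng = np.random.default_rng(1)
# kernel shape matches the source: [input_depth + h_depth, 4*num_units]
kernel = rng.standard_normal((input_depth + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
h, c = np.zeros(num_units), np.zeros(num_units)
new_h, new_c = basic_lstm_step(rng.standard_normal(input_depth), h, c, kernel, bias)
print(new_h.shape, new_c.shape)  # (5,) (5,)
```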

```
WARNING:tensorflow:From <ipython-input-9-3f4ca183c5d7>:1: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
```

```
(128, LSTMStateTuple(c=128, h=128))
```


```
(TensorShape([Dimension(30), Dimension(128)]),
 TensorShape([Dimension(30), Dimension(128)]),
 TensorShape([Dimension(30), Dimension(128)]))
```

```
(<tf.Tensor: id=108, shape=(30, 128), dtype=float32, numpy=array([...])>,
 <tf.Tensor: id=111, shape=(30, 128), dtype=float32, numpy=array([...])>)
```


### tf.nn.rnn_cell.GRUCell

$$r_t=\sigma(W^rx_t + U^rh_{t-1})$$

$$z_t=\sigma(W^zx_t + U^zh_{t-1})$$

$$\tilde h_t = \tanh(Wx_t + U(r_t\circ h_{t-1}))$$

$$h_t=(1-z_t)\circ\tilde h_t + z_t\circ h_{t-1}$$
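These four equations translate directly into a NumPy sketch (toy dimensions and weight names are mine, not TensorFlow's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wr, Ur, Wz, Uz, W, U):
    """One GRU step following the equations above."""
    r = sigmoid(Wr @ x + Ur @ h)           # reset gate
    z = sigmoid(Wz @ x + Uz @ h)           # update gate
    h_cand = np.tanh(W @ x + U @ (r * h))  # candidate state
    return (1.0 - z) * h_cand + z * h      # interpolate old and new state

n_in, n_h = 3, 4
rng = np.random.default_rng(2)
# one (n_h, n_in) matrix per W* and one (n_h, n_h) matrix per U*
mats = [rng.standard_normal((n_h, d)) for d in (n_in, n_h, n_in, n_h, n_in, n_h)]
h_t = gru_step(rng.standard_normal(n_in), np.zeros(n_h), *mats)
print(h_t.shape)  # (4,)
```

Unlike the LSTM, the GRU carries no separate cell state, so its output and state are the same tensor — consistent with the `(128, 128)` size pair below.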

```
(128, 128)
```

```
(TensorShape([Dimension(30), Dimension(128)]),
 TensorShape([Dimension(30), Dimension(128)]))
```


### tf.nn.rnn_cell.LSTMCell, tf.contrib.rnn.LSTMCell

```
(LSTMStateTuple(c=64, h=128), 128)
```

```
(TensorShape([Dimension(30), Dimension(64)]),
 TensorShape([Dimension(30), Dimension(128)]))
```

```
(TensorShape([Dimension(30), Dimension(128)]),
 TensorShape([Dimension(30), Dimension(64)]),
 TensorShape([Dimension(30), Dimension(128)]))
```


## Other Components That Wrap RNN Cells

Core RNN Cell wrappers (RNNCells that wrap other RNNCells)

• tf.contrib.rnn.MultiRNNCell

• tf.contrib.rnn.LSTMBlockWrapper

• tf.contrib.rnn.DropoutWrapper

• tf.contrib.rnn.EmbeddingWrapper

• tf.contrib.rnn.InputProjectionWrapper

• tf.contrib.rnn.OutputProjectionWrapper

• tf.contrib.rnn.DeviceWrapper

• tf.contrib.rnn.ResidualWrapper

### tf.contrib.rnn.MultiRNNCell

```
TensorShape([Dimension(32), Dimension(128)])
```

```
(TensorShape([Dimension(32), Dimension(64)]),
 TensorShape([Dimension(32), Dimension(64)]))
```

```
(TensorShape([Dimension(32), Dimension(128)]),
 TensorShape([Dimension(32), Dimension(128)]))
```
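`MultiRNNCell` simply chains cells: each layer's output becomes the next layer's input, and the combined state is a tuple of per-layer states. A rough sketch of that wiring (plain Python, not the TF implementation; the toy cells here are just for illustration):

```python
def multi_cell_step(cells, x, states):
    """cells: list of step functions (x, state) -> (output, new_state)."""
    new_states = []
    cur = x
    for cell, state in zip(cells, states):
        cur, new_state = cell(cur, state)  # this layer's output feeds the next
        new_states.append(new_state)
    return cur, tuple(new_states)  # top-layer output, tuple of per-layer states

# toy cells that add their scalar state to the input and bump the state
cell = lambda x, s: (x + s, s + 1)
out, states = multi_cell_step([cell, cell], 1, (10, 20))
print(out, states)  # 31 (11, 21)
```

This matches the shapes above: the stack's output has the top layer's size, while the state is a tuple holding one entry per layer.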


### tf.contrib.rnn.DropoutWrapper

```
(LSTMStateTuple(c=128, h=128), 128)
```

```
(128,
 (LSTMStateTuple(c=32, h=32),
  LSTMStateTuple(c=64, h=64),
  LSTMStateTuple(c=128, h=128)))
```
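Conceptually, `DropoutWrapper` just intercepts a cell's input and output and applies (inverted) dropout to each, controlled by keep probabilities. A minimal sketch of that idea in plain Python/NumPy (not the TF implementation — TF additionally supports `state_keep_prob` and variational recurrent dropout):

```python
import numpy as np

def dropout_wrapper(cell, input_keep_prob=1.0, output_keep_prob=1.0, rng=None):
    """Wrap a step function, applying inverted dropout to its input/output."""
    rng = rng or np.random.default_rng()

    def drop(v, keep):
        if keep >= 1.0:
            return v
        mask = rng.random(v.shape) < keep
        return v * mask / keep  # inverted dropout: rescale the kept units

    def step(x, state):
        out, new_state = cell(drop(x, input_keep_prob), state)
        return drop(out, output_keep_prob), new_state  # state left untouched here
    return step

identity_cell = lambda x, s: (x, s)
wrapped = dropout_wrapper(identity_cell, output_keep_prob=0.5,
                          rng=np.random.default_rng(0))
out, _ = wrapped(np.ones(8), None)
print(out)  # each unit is either dropped (0.0) or rescaled to 2.0
```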


## tf.nn.dynamic_rnn

```
TensorShape([Dimension(30), Dimension(10), Dimension(64)])
```

```
(LSTMStateTuple(c=32, h=32), LSTMStateTuple(c=64, h=64))
```
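At its core, `tf.nn.dynamic_rnn` loops a cell over the time axis (via `tf.while_loop` rather than static unrolling), collecting the per-step outputs and returning the final state. A stripped-down sketch of that control flow (plain Python, toy cell of my own):

```python
import numpy as np

def dynamic_rnn_like(cell, inputs, initial_state):
    """Loop a cell over time: inputs has the time steps along axis 0."""
    state = initial_state
    outputs = []
    for x_t in inputs:                 # dynamic_rnn does this with tf.while_loop
        out, state = cell(x_t, state)
        outputs.append(out)
    return np.stack(outputs), state    # all per-step outputs + final state

# toy cell: running sum of the inputs
cell = lambda x, s: (x + s, x + s)
outs, final = dynamic_rnn_like(cell, np.array([1.0, 2.0, 3.0]), 0.0)
print(outs, final)  # [1. 3. 6.] 6.0
```

This mirrors the shapes above: the stacked outputs gain a time dimension, while only the last state is returned.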


Xie Pan

2018-09-01

2021-06-29