Attention wrappers are RNNCell objects that wrap other RNNCell objects and implement attention. The form of attention is determined by a subclass of tf.contrib.seq2seq.AttentionMechanism. These subclasses describe the form of attention (e.g. additive vs. multiplicative) to use when creating the wrapper. An instance of an AttentionMechanism is constructed with a memory tensor, from which lookup keys and values tensors are created.
"""A base AttentionMechanism class providing common functionality. Common functionality includes: 1. Storing the query and memory layers. 2. Preprocessing and storing the memory. """
"""Construct base AttentionMechanism class. Args: query_layer: Callable. Instance of `tf.layers.Layer`. The layer's depth must match the depth of `memory_layer`. If `query_layer` is not provided, the shape of `query` must match that of `memory_layer`. memory: The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`. probability_fn: A `callable`. Converts the score and previous alignments to probabilities. Its signature should be: `probabilities = probability_fn(score, state)`. memory_sequence_length (optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. memory_layer: Instance of `tf.layers.Layer` (may be None). The layer's depth must match the depth of `query_layer`. If `memory_layer` is not provided, the shape of `memory` must match that of `query_layer`. check_inner_dims_defined: Python boolean. If `True`, the `memory` argument's shape is checked to ensure all but the two outermost dimensions are fully defined. score_mask_value: (optional): The mask value for score before passing into `probability_fn`. The default is -inf. Only used if `memory_sequence_length` is not None. name: Name to use when creating ops. """
self.memory_layer(self._values) if self.memory_layer # pylint: disable=not-callable
else self._values)
self._batch_size = (
self._keys.shape[0].value or array_ops.shape(self._keys)[0])
self._alignments_size = (self._keys.shape[1].value or
array_ops.shape(self._keys)[1])
@property
defmemory_layer(self):
return self._memory_layer
@property
defquery_layer(self):
return self._query_layer
@property
defvalues(self):
return self._values
@property
defkeys(self):
return self._keys
@property
defbatch_size(self):
return self._batch_size
@property
defalignments_size(self):
return self._alignments_size
@property
defstate_size(self):
return self._alignments_size
definitial_alignments(self, batch_size, dtype):
"""Creates the initial alignment values for the `AttentionWrapper` class. This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention). The default behavior is to return a tensor of all zeros. Args: batch_size: `int32` scalar, the batch_size. dtype: The `dtype`. Returns: A `dtype` tensor shaped `[batch_size, alignments_size]` (`alignments_size` is the values' `max_time`). """
"""Creates the initial state values for the `AttentionWrapper` class. This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention). The default behavior is to return the same output as initial_alignments. Args: batch_size: `int32` scalar, the batch_size. dtype: The `dtype`. Returns: A structure of all-zero tensors with shapes as described by `state_size`. """
"""Convert to tensor and possibly mask `memory`. Args: memory: `Tensor`, shaped `[batch_size, max_time, ...]`. memory_sequence_length: `int32` `Tensor`, shaped `[batch_size]`. check_inner_dims_defined: Python boolean. If `True`, the `memory` argument's shape is checked to ensure all but the two outermost dimensions are fully defined. Returns: A (possibly masked), checked, new `memory`. Raises: ValueError: If `check_inner_dims_defined` is `True` and not `memory.shape[2:].is_fully_defined()`. """
"""Implements Bahdanau-style (additive) attention. This attention has two forms. The first is Bahdanau attention, as described in: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." ICLR 2015. https://arxiv.org/abs/1409.0473 The second is the normalized form. This form is inspired by the weight normalization article: Tim Salimans, Diederik P. Kingma. "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks." https://arxiv.org/abs/1602.07868 To enable the second form, construct the object with parameter `normalize=True`. """
"""Construct the Attention mechanism. Args: num_units: The depth of the query mechanism. memory: The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`. memory_sequence_length (optional): Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. normalize: Python boolean. Whether to normalize the energy term. probability_fn: (optional) A `callable`. Converts the score to probabilities. The default is @{tf.nn.softmax}. Other options include @{tf.contrib.seq2seq.hardmax} and @{tf.contrib.sparsemax.sparsemax}. Its signature should be: `probabilities = probability_fn(score)`. score_mask_value: (optional): The mask value for score before passing into `probability_fn`. The default is -inf. Only used if `memory_sequence_length` is not None. dtype: The data type for the query and memory layers of the attention mechanism. name: Name to use when creating ops. """
"""Score the query based on the keys and values. Args: query: Tensor of dtype matching `self.values` and shape `[batch_size, query_depth]`. state: Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). Returns: alignments: Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). """
with variable_scope.variable_scope(None, "bahdanau_attention", [query]):
processed_query = self.query_layer(query) if self.query_layer else query
"""Score the query based on the keys and values. Args: query: Tensor of dtype matching `self.values` and shape `[batch_size, query_depth]`. state: Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). Returns: alignments: Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). """
with variable_scope.variable_scope(None, "luong_attention", [query]):
"""Implements Luong-style (multiplicative) scoring function. Args: query: Tensor, shape `[batch_size, num_units]` to compare to keys. keys: Processed memory, shape `[batch_size, max_time, num_units]`. scale: Whether to apply a scale to the score function. Returns: A `[batch_size, max_time]` tensor of unnormalized score values. Raises: ValueError: If `key` and `query` depths do not match. """
depth = query.get_shape()[-1]
key_units = keys.get_shape()[-1]
if depth != key_units:
raise ValueError(
"Incompatible or unknown inner dimensions between query and keys. "
"Query (%s) has units: %s. Keys (%s) have units: %s. "
"Perhaps you need to set num_units to the keys' dimension (%s)?"
% (query, depth, keys, key_units, key_units))
dtype = query.dtype
# Reshape from [batch_size, depth] to [batch_size, 1, depth]
# for matmul.
query = array_ops.expand_dims(query, 1)
# Inner product along the query units dimension.
# matmul shapes: query is [batch_size, 1, depth] and
# keys is [batch_size, max_time, depth].
# the inner product is asked to **transpose keys' inner shape** to get a