| layer_additive_attention {keras} | R Documentation |
Additive attention layer, a.k.a. Bahdanau-style attention
Description
Additive attention layer, a.k.a. Bahdanau-style attention
Usage
layer_additive_attention(
  object,
  use_scale = TRUE,
  ...,
  causal = FALSE,
  dropout = 0
)
Arguments
object | What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as the output of another layer).
use_scale | If TRUE, will create a scalar variable to scale the attention scores.
... | Standard layer arguments.
causal | Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.
dropout | Float between 0 and 1. Fraction of the units to drop for the attention scores.
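A minimal usage sketch (the input shapes, dropout rate, and variable names below are illustrative assumptions, not values taken from this page):

library(keras)

# Assumed toy shapes: Tq = 8 query steps, Tv = 12 value steps, dim = 64 features
query_input <- layer_input(shape = c(8, 64))
value_input <- layer_input(shape = c(12, 64))

# Build the layer instance, then call it on list(query, value);
# use_scale adds a learned scalar on the scores, dropout drops 10% of them during training
attn <- layer_additive_attention(use_scale = TRUE, dropout = 0.1)
attended <- attn(list(query_input, value_input))  # shape [batch_size, 8, 64]

model <- keras_model(inputs = list(query_input, value_input), outputs = attended)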
Details
Inputs are query tensor of shape [batch_size, Tq, dim], value tensor of
shape [batch_size, Tv, dim] and key tensor of shape
[batch_size, Tv, dim]. The calculation follows the steps:
1. Reshape query and key into shapes [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim] respectively.
2. Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum: scores = tf$reduce_sum(tf$tanh(query + key), axis = -1L).
3. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf$nn$softmax(scores).
4. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf$matmul(distribution, value).
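For reference, the four steps above can be reproduced with plain TensorFlow ops from R; the shapes are small made-up values used only for illustration, and the optional learned scale from use_scale is omitted:

library(tensorflow)

# Assumed toy shapes: batch_size = 2, Tq = 3, Tv = 4, dim = 5
query <- tf$random$normal(c(2L, 3L, 5L))
key   <- tf$random$normal(c(2L, 4L, 5L))
value <- tf$random$normal(c(2L, 4L, 5L))

# Step 1: reshape query and key so they broadcast against each other
q <- tf$expand_dims(query, axis = 2L)  # [batch_size, Tq, 1, dim]
k <- tf$expand_dims(key, axis = 1L)    # [batch_size, 1, Tv, dim]

# Step 2: non-linear additive scores, shape [batch_size, Tq, Tv]
scores <- tf$reduce_sum(tf$tanh(q + k), axis = -1L)

# Step 3: softmax over the Tv axis gives the attention distribution
distribution <- tf$nn$softmax(scores)

# Step 4: linear combination of value, shape [batch_size, Tq, dim]
output <- tf$matmul(distribution, value)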
See Also
- https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention
- https://keras.io/api/layers/attention_layers/additive_attention/