layer_attention {keras} | R Documentation
Dot-product attention layer, a.k.a. Luong-style attention
Description
Dot-product attention layer, a.k.a. Luong-style attention
Usage
layer_attention(
inputs,
use_scale = FALSE,
score_mode = "dot",
...,
dropout = NULL
)
Arguments
inputs: List of the following tensors: a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and an optional key tensor of shape [batch_size, Tv, dim]. If no key tensor is given, the value tensor is used as the key.

use_scale: If TRUE, will create a scalar variable to scale the attention scores.

score_mode: Function to use to compute attention scores, one of "dot" or "concat". "dot" refers to the dot product between the query and key vectors; "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.

...: Standard layer arguments (e.g., batch_size, dtype, name, trainable, weights).

dropout: Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
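As a quick illustration of these arguments, the sketch below wires the layer into a small functional model with the R keras interface. The input shapes, layer names, and the use_scale/dropout values are illustrative assumptions, not part of this documentation.

library(keras)

# Assumed shapes: variable-length sequences of 64-dimensional vectors
query_input <- layer_input(shape = list(NULL, 64), name = "query")
value_input <- layer_input(shape = list(NULL, 64), name = "value")

# With only query and value supplied, value is also used as the key
attended <- layer_attention(
  inputs = list(query_input, value_input),
  use_scale = TRUE,   # learn a scalar that scales the attention scores
  dropout = 0.1       # drop 10% of the attention scores during training (illustrative)
)

model <- keras_model(inputs = list(query_input, value_input), outputs = attended)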
Details
inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows the steps:

1. Calculate scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf$matmul(query, key, transpose_b = TRUE).

2. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf$nn$softmax(scores).

3. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf$matmul(distribution, value).
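These steps can be reproduced directly with TensorFlow ops from R, which makes the intermediate shapes easy to check. The tensor sizes below (batch_size = 2, Tq = 3, Tv = 4, dim = 5) are arbitrary values chosen only for illustration.

library(tensorflow)

# Toy tensors: batch_size = 2, Tq = 3, Tv = 4, dim = 5 (assumed sizes)
query <- tf$random$normal(c(2L, 3L, 5L))
key   <- tf$random$normal(c(2L, 4L, 5L))
value <- tf$random$normal(c(2L, 4L, 5L))

# Step 1: query-key dot product -> scores with shape [2, 3, 4]
scores <- tf$matmul(query, key, transpose_b = TRUE)

# Step 2: softmax over the last axis -> distribution with shape [2, 3, 4]
distribution <- tf$nn$softmax(scores)

# Step 3: linear combination of value -> output with shape [2, 3, 5]
output <- tf$matmul(distribution, value)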
See Also
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense(), layer_dense_features(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()