optimizer_adamax {keras3}		R Documentation
Optimizer that implements the Adamax algorithm.
Description
Adamax, a variant of Adam based on the infinity norm, is a first-order gradient-based optimization method. Because it adapts the learning rate to the characteristics of the data, it is well suited to learning time-variant processes, e.g., speech data with dynamically changing noise conditions. Default parameters follow those provided in the paper (see the reference below).
Initialization:
m <- 0  # Initialize the 1st moment vector
u <- 0  # Initialize the exponentially weighted infinity norm
t <- 0  # Initialize the timestep
The update rule for parameter w with gradient g is described at the end of Section 7.1 of the paper (see the reference section):
t <- t + 1
m <- beta1 * m + (1 - beta1) * g
u <- max(beta2 * u, abs(g))
current_lr <- learning_rate / (1 - beta1^t)
w <- w - current_lr * m / (u + epsilon)
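For concreteness, the same update can be traced once in plain R for a single scalar parameter (a standalone sketch mirroring the pseudocode above; none of these variables are part of the keras3 API):

learning_rate <- 0.001; beta1 <- 0.9; beta2 <- 0.999; epsilon <- 1e-7
w <- 0.5  # the parameter being optimized
g <- 0.2  # gradient of the loss at w
m <- 0; u <- 0; t <- 0

t <- t + 1
m <- beta1 * m + (1 - beta1) * g         # update the biased 1st moment
u <- max(beta2 * u, abs(g))              # update the infinity-norm accumulator
current_lr <- learning_rate / (1 - beta1^t)
w <- w - current_lr * m / (u + epsilon)  # w is now ~0.499: one small step against the gradient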
Usage
optimizer_adamax(
learning_rate = 0.001,
beta_1 = 0.9,
beta_2 = 0.999,
epsilon = 1e-07,
weight_decay = NULL,
clipnorm = NULL,
clipvalue = NULL,
global_clipnorm = NULL,
use_ema = FALSE,
ema_momentum = 0.99,
ema_overwrite_frequency = NULL,
name = "adamax",
...,
loss_scale_factor = NULL,
gradient_accumulation_steps = NULL
)
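A minimal usage sketch (the model here is hypothetical; any keras3 model can be compiled with this optimizer in the same way):

library(keras3)

# Hypothetical two-layer regression model; the optimizer call is the point here.
model <- keras_model_sequential(input_shape = 8) |>
  layer_dense(units = 16, activation = "relu") |>
  layer_dense(units = 1)

model |> compile(
  optimizer = optimizer_adamax(learning_rate = 0.001),
  loss = "mse"
)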
Arguments
learning_rate
A float, a learning_rate_schedule() instance, or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.001.
beta_1
A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta_2
A float value or a constant float tensor. The exponential decay rate for the exponentially weighted infinity norm.
epsilon
A small constant for numerical stability. Defaults to 1e-07.
weight_decay
Float. If set, weight decay is applied.
clipnorm
Float. If set, the gradient of each weight is individually clipped so that its norm is no higher than this value.
clipvalue
Float. If set, the gradient of each weight is clipped to be no higher than this value.
global_clipnorm
Float. If set, the gradient of all weights is clipped so that their global norm is no higher than this value.
use_ema
Boolean, defaults to FALSE. If TRUE, an exponential moving average (EMA) of the model's weights is computed (as the weight values change after each training batch), and the weights are periodically overwritten with their moving average.
ema_momentum
Float, defaults to 0.99. Only used if use_ema = TRUE. This is the momentum used when computing the EMA of the model's weights: new_average = ema_momentum * old_average + (1 - ema_momentum) * current_variable_value.
ema_overwrite_frequency
Int or NULL, defaults to NULL. Only used if use_ema = TRUE. Every ema_overwrite_frequency steps of iterations, the model variables are overwritten with their moving average. If NULL, the optimizer does not overwrite model variables in the middle of training, and you need to explicitly overwrite them at the end of training by calling optimizer$finalize_variable_values() (which updates the model variables in place).
name
String. The name to use for momentum accumulator weights created by the optimizer. Defaults to "adamax".
...
For forward/backward compatibility.
loss_scale_factor
Float or NULL. If a float, the loss is multiplied by the scale factor before computing gradients, and the gradients are multiplied by the inverse of the scale factor before updating variables. Useful for preventing underflow during mixed-precision training. Alternately, optimizer_loss_scale() will automatically set a loss scale factor.
gradient_accumulation_steps
Int or NULL. If an int, model and optimizer variables will not be updated at every step; instead they will be updated every gradient_accumulation_steps steps, using the average value of the gradients since the last update. This is known as "gradient accumulation" and can be useful when your batch size is very small, in order to reduce gradient noise at each update step.
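As a hedged illustration of the optional arguments, the constructor below enables gradient clipping and EMA; the values are illustrative, not recommendations:

opt <- optimizer_adamax(
  learning_rate = 0.002,
  clipnorm = 1,        # clip each weight's gradient norm to at most 1
  use_ema = TRUE,      # track an exponential moving average of the weights
  ema_momentum = 0.99
)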
Value
an Optimizer instance
Reference
Kingma et al., 2014, Adam: A Method for Stochastic Optimization (https://arxiv.org/abs/1412.6980).
See Also
Other optimizers:
optimizer_adadelta()
optimizer_adafactor()
optimizer_adagrad()
optimizer_adam()
optimizer_adam_w()
optimizer_ftrl()
optimizer_lion()
optimizer_loss_scale()
optimizer_nadam()
optimizer_rmsprop()
optimizer_sgd()