optimizer_nadam {keras3}    R Documentation
Optimizer that implements the Nadam algorithm.
Description
Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.
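In the simplified presentation of Dozat (2015), omitting the per-step momentum schedule used by some implementations (a sketch of the idea, not necessarily the exact Keras update), the rule for parameters \theta with gradient g_t, learning rate \eta, and the beta_1, beta_2, and epsilon arguments documented below is:

m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\left(\beta_1 \hat{m}_t + \frac{(1 - \beta_1)\, g_t}{1 - \beta_1^t}\right)

The Nesterov look-ahead appears in the last line: the current gradient g_t is mixed into the bias-corrected momentum term instead of using \hat{m}_t alone, as plain Adam would.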
Usage
optimizer_nadam(
learning_rate = 0.001,
beta_1 = 0.9,
beta_2 = 0.999,
epsilon = 1e-07,
weight_decay = NULL,
clipnorm = NULL,
clipvalue = NULL,
global_clipnorm = NULL,
use_ema = FALSE,
ema_momentum = 0.99,
ema_overwrite_frequency = NULL,
name = "nadam",
...,
loss_scale_factor = NULL,
gradient_accumulation_steps = NULL
)
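As an illustration (not taken from this help page), a model might be compiled with this optimizer as follows; the architecture, input shape, and learning rate are hypothetical and chosen only for the sketch:

# Minimal sketch: compile a model with the Nadam optimizer,
# assuming keras3 is installed and a backend is configured.
library(keras3)

# Illustrative architecture; replace with your own model.
model <- keras_model_sequential(input_shape = c(784)) |>
  layer_dense(units = 64, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax")

model |> compile(
  optimizer = optimizer_nadam(learning_rate = 0.002),  # hypothetical value
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)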
Arguments
learning_rate
A float, a learning rate schedule (a learning_rate_schedule_* instance), or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.001.

beta_1
A float value or a constant float tensor, or a callable that takes no arguments and returns the actual value to use. The exponential decay rate for the 1st moment estimates. Defaults to 0.9.

beta_2
A float value or a constant float tensor, or a callable that takes no arguments and returns the actual value to use. The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.

epsilon
A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to 1e-07.

weight_decay
Float. If set, weight decay is applied.

clipnorm
Float. If set, the gradient of each weight is individually clipped so that its norm is no higher than this value.

clipvalue
Float. If set, the gradient of each weight is clipped to be no higher than this value.

global_clipnorm
Float. If set, the gradient of all weights is clipped so that their global norm is no higher than this value.

use_ema
Boolean, defaults to FALSE. If TRUE, an exponential moving average (EMA) of the model's weights is maintained (updated after each training batch), and the weights are periodically overwritten with their moving average. A short configuration sketch follows this argument list.

ema_momentum
Float, defaults to 0.99. Only used if use_ema = TRUE. The momentum to use when computing the EMA of the model's weights: new_average = ema_momentum * old_average + (1 - ema_momentum) * current_variable_value.

ema_overwrite_frequency
Int or NULL, defaults to NULL. Only used if use_ema = TRUE. Every ema_overwrite_frequency steps, the model variables are overwritten with their moving average. If NULL, the variables are not overwritten during training; they must be overwritten explicitly by finalizing the optimizer's variable values at the end of training (the built-in fit() training loop does this automatically after the last epoch).

name
String. The name to use for momentum accumulator weights created by the optimizer.

...
For forward/backward compatibility.

loss_scale_factor
Float or NULL. If a float, the scale factor will be multiplied by the loss before computing gradients, and the inverse of the scale factor will be multiplied by the gradients before updating variables. Useful for preventing underflow during mixed precision training. Alternatively, optimizer_loss_scale() will automatically set a loss scale factor.

gradient_accumulation_steps
Int or NULL. If an int, model and optimizer variables will not be updated at every step; instead they will be updated every gradient_accumulation_steps steps, using the average value of the gradients since the last update. This is known as "gradient accumulation". It can be useful when the batch size is very small, to reduce memory consumption at each step.
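The clipping, weight-decay, and EMA arguments above can be combined when constructing the optimizer; a minimal sketch, with values chosen purely for illustration:

opt <- optimizer_nadam(
  learning_rate   = 0.001,
  weight_decay    = 1e-4,   # apply weight decay
  global_clipnorm = 1.0,    # clip gradients by their global norm
  use_ema         = TRUE,   # maintain an EMA of the model weights
  ema_momentum    = 0.99
)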
Value
An Optimizer instance.
Reference
Dozat, T., 2015. Incorporating Nesterov Momentum into Adam.
See Also
Other optimizers:
optimizer_adadelta()
optimizer_adafactor()
optimizer_adagrad()
optimizer_adam()
optimizer_adam_w()
optimizer_adamax()
optimizer_ftrl()
optimizer_lion()
optimizer_loss_scale()
optimizer_rmsprop()
optimizer_sgd()