optimizer_adagrad {keras3}    R Documentation
Optimizer that implements the Adagrad algorithm.
Description
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.
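As a rough illustration of the update rule, here is a minimal sketch in plain R (not the optimizer's actual implementation; adagrad_step and its arguments are hypothetical names used only for this sketch):

adagrad_step <- function(param, grad, accumulator,
                         learning_rate = 0.001, epsilon = 1e-7) {
  # Accumulate squared gradients: parameters that receive many or large
  # updates build up a larger accumulator over time.
  accumulator <- accumulator + grad^2
  # Per-parameter step: the larger the accumulator, the smaller the
  # effective learning rate for that parameter.
  param <- param - learning_rate * grad / sqrt(accumulator + epsilon)
  list(param = param, accumulator = accumulator)
}

# The accumulator starts at initial_accumulator_value for every parameter:
state <- adagrad_step(param = rep(0, 3), grad = c(0.1, -0.2, 0.3),
                      accumulator = rep(0.1, 3))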
Usage
optimizer_adagrad(
learning_rate = 0.001,
initial_accumulator_value = 0.1,
epsilon = 1e-07,
weight_decay = NULL,
clipnorm = NULL,
clipvalue = NULL,
global_clipnorm = NULL,
use_ema = FALSE,
ema_momentum = 0.99,
ema_overwrite_frequency = NULL,
name = "adagrad",
...,
loss_scale_factor = NULL,
gradient_accumulation_steps = NULL
)
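In practice the returned optimizer is passed to compile(). A minimal sketch, assuming the keras3 helpers keras_model_sequential(), layer_dense(), and compile(), and an arbitrary 10-feature regression model:

library(keras3)

model <- keras_model_sequential(input_shape = c(10)) |>
  layer_dense(units = 16, activation = "relu") |>
  layer_dense(units = 1)

model |> compile(
  optimizer = optimizer_adagrad(learning_rate = 0.01),
  loss = "mse"
)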
Arguments
learning_rate
A float, a learning rate schedule (a learning_rate_schedule_* object), or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.001.
initial_accumulator_value
Floating point value. Starting value for the accumulators (per-parameter momentum values). Must be non-negative.
epsilon
Small floating point value for maintaining numerical stability.
weight_decay
Float. If set, weight decay is applied.
clipnorm
Float. If set, the gradient of each weight is individually clipped so that its norm is no higher than this value.
clipvalue
Float. If set, the gradient of each weight is clipped to be no higher than this value.
global_clipnorm
Float. If set, the gradient of all weights is clipped so that their global norm is no higher than this value.
use_ema
Boolean, defaults to FALSE. If TRUE, exponential moving average (EMA) is applied. EMA consists of computing an exponential moving average of the weights of the model (as the weight values change after each training batch), and periodically overwriting the weights with their moving average.
ema_momentum
Float, defaults to 0.99. Only used if use_ema = TRUE. This is the momentum to use when computing the EMA of the model's weights: new_average = ema_momentum * old_average + (1 - ema_momentum) * current_variable_value.
ema_overwrite_frequency
Int or NULL, defaults to NULL. Only used if use_ema = TRUE. Every ema_overwrite_frequency steps of iterations, the model variables are overwritten by their moving average. If NULL, the optimizer does not overwrite model variables in the middle of training, and you need to explicitly overwrite them at the end of training by calling the optimizer's finalize_variable_values() method (which updates the model variables in place).
name
String. The name to use for momentum accumulator weights created by the optimizer.
...
For forward/backward compatibility.
loss_scale_factor
Float or NULL, defaults to NULL. If a float, the scale factor will be multiplied by the loss before computing gradients, and the inverse of the scale factor will be multiplied by the gradients before updating variables. Useful for preventing underflow during mixed precision training. Alternately, optimizer_loss_scale() will automatically set a loss scale factor.
gradient_accumulation_steps
Int or NULL, defaults to NULL. If an int, model and optimizer variables will not be updated at every step; instead they will be updated every gradient_accumulation_steps steps, using the average value of the gradients since the last update. This is known as "gradient accumulation" and can be useful when the batch size is very small, to reduce gradient noise at each update step. See the combined configuration sketch below.
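The clipping, EMA, and accumulation arguments above are shared across optimizers and can be combined freely. A configuration sketch (the specific values are arbitrary examples, not recommendations):

opt <- optimizer_adagrad(
  learning_rate = 0.01,
  initial_accumulator_value = 0.1,
  weight_decay = 1e-4,             # decoupled weight decay
  global_clipnorm = 1,             # clip gradients by their global norm
  use_ema = TRUE,                  # track an EMA of the model weights
  ema_momentum = 0.99,
  gradient_accumulation_steps = 4  # update variables every 4 steps
)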
Value
An Optimizer instance.
Reference
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121-2159.
See Also
Other optimizers:
optimizer_adadelta()
optimizer_adafactor()
optimizer_adam()
optimizer_adam_w()
optimizer_adamax()
optimizer_ftrl()
optimizer_lion()
optimizer_loss_scale()
optimizer_nadam()
optimizer_rmsprop()
optimizer_sgd()