optimizer_radam {tfaddons} | R Documentation
Rectified Adam (a.k.a. RAdam)
Description
Rectified Adam (a.k.a. RAdam), a variant of the Adam optimizer that rectifies the variance of the adaptive learning rate, as proposed in "On the Variance of the Adaptive Learning Rate and Beyond" (Liu et al., 2019).
Usage
optimizer_radam(
  learning_rate = 0.001,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = 1e-07,
  weight_decay = 0,
  amsgrad = FALSE,
  sma_threshold = 5,
  total_steps = 0,
  warmup_proportion = 0.1,
  min_lr = 0,
  name = "RectifiedAdam",
  clipnorm = NULL,
  clipvalue = NULL,
  decay = NULL,
  lr = NULL
)
Arguments
learning_rate
A 'Tensor', a floating point value, or a schedule that is a 'tf$keras$optimizers$schedules$LearningRateSchedule'. The learning rate.

beta_1
A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.

beta_2
A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.

epsilon
A small constant for numerical stability.

weight_decay
A floating point value. Weight decay for each parameter.

amsgrad
Boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".

sma_threshold
A float value. Threshold for the simple moving average (SMA); the variance-rectified update is applied only when the SMA exceeds this threshold.

total_steps
An integer. Total number of training steps. Enable warmup by setting a positive value (see Details below).

warmup_proportion
A floating point value. The proportion of total_steps during which the learning rate increases (the warmup phase).

min_lr
A floating point value. Minimum learning rate after warmup.

name
Optional name for the operations created when applying gradients. Defaults to "RectifiedAdam".

clipnorm
Gradients are clipped so that their L2 norm does not exceed this value.

clipvalue
Gradients are clipped element-wise so that their absolute values do not exceed this value.

decay
Included for backward compatibility to allow time-based inverse decay of the learning rate.

lr
Included for backward compatibility; use learning_rate instead.
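Details

In the underlying TensorFlow Addons implementation, setting total_steps to a positive value enables a built-in warmup schedule: the learning rate increases linearly from 0 to learning_rate over the first total_steps * warmup_proportion steps, then decays linearly to min_lr over the remaining steps. A minimal sketch (the hyperparameter values below are illustrative):

opt <- optimizer_radam(
  learning_rate = 1e-3,
  total_steps = 10000,
  warmup_proportion = 0.1,
  min_lr = 1e-5
)
# The learning rate rises linearly from 0 to 1e-3 over the first
# 10000 * 0.1 = 1000 steps, then decays linearly to 1e-5 over the
# remaining 9000 steps.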
Value
Optimizer for use with 'keras::compile()'.
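Examples

A minimal sketch of typical use with the keras R package. The model architecture, loss, and input shape below are illustrative assumptions, not part of this function's interface:

library(keras)
library(tfaddons)

# Toy regression model: 10 input features, scalar output
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1)

# Pass the RAdam optimizer to compile(); the learning_rate shown is the default
model %>% compile(
  loss = "mse",
  optimizer = optimizer_radam(learning_rate = 0.001),
  metrics = "mae"
)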