optim_qhadam {torchopt}    R Documentation
QHAdam optimization algorithm
Description
R implementation of the QHAdam optimizer proposed by Ma and Yarats (2019). We used the implementation available at https://github.com/jettify/pytorch-optimizer/blob/master/torch_optimizer/qhadam.py. Thanks to Nikolay Novik for providing the PyTorch code.
The original implementation was developed by Facebook AI and is licensed under the MIT license.
From the paper by Ma and Yarats (2019): QHAdam is a QH-augmented version of Adam, where both of Adam's moment estimators are replaced with quasi-hyperbolic terms. QHAdam decouples the momentum term from the current gradient when updating the weights, and decouples the mean squared gradients term from the current squared gradient when updating the weights.
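Concretely, one QHAdam update of a single parameter can be sketched in plain R as follows. This is an illustrative scalar sketch of the update rule in Ma and Yarats (2019); qhadam_step and its variables are hypothetical names, not part of the torchopt API.

qhadam_step <- function(theta, grad, m, v, step,
                        lr = 0.01, betas = c(0.9, 0.999),
                        nus = c(1, 1), eps = 1e-3) {
  # exponential moving averages of the gradient and its square (as in Adam)
  m <- betas[1] * m + (1 - betas[1]) * grad
  v <- betas[2] * v + (1 - betas[2]) * grad^2
  # bias-corrected moment estimates
  m_hat <- m / (1 - betas[1]^step)
  v_hat <- v / (1 - betas[2]^step)
  # quasi-hyperbolic mixing of the current gradient with the moment estimates
  numerator   <- (1 - nus[1]) * grad + nus[1] * m_hat
  denominator <- sqrt((1 - nus[2]) * grad^2 + nus[2] * v_hat) + eps
  list(theta = theta - lr * numerator / denominator, m = m, v = v)
}

With nus = c(1, 1) the update reduces to Adam; smaller values of nus shift weight towards the current gradient and its square.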
Usage
optim_qhadam(
  params,
  lr = 0.01,
  betas = c(0.9, 0.999),
  eps = 0.001,
  nus = c(1, 1),
  weight_decay = 0,
  decouple_weight_decay = FALSE
)
Arguments
params
List of parameters to optimize.
lr
Learning rate (default: 0.01).
betas
Coefficients used for computing running averages of the gradient and its square (default: c(0.9, 0.999)).
eps
Term added to the denominator to improve numerical stability (default: 1e-3).
nus
Immediate discount factors used to estimate the gradient and its square (default: c(1.0, 1.0)).
weight_decay
Weight decay (L2 penalty) (default: 0).
decouple_weight_decay
Whether to decouple the weight decay from the gradient-based optimization step (default: FALSE). The difference between the two modes is sketched below.
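The two weight-decay modes can be sketched as follows. This is illustrative only; the function and variable names are hypothetical and do not reflect torchopt internals, and qhadam_update stands for the gradient-based step described above.

weight_decay_step <- function(theta, grad, lr, weight_decay,
                              decouple_weight_decay, qhadam_update) {
  if (decouple_weight_decay) {
    # decoupled (AdamW-style): shrink the weights directly,
    # independently of the gradient-based update
    theta <- theta * (1 - lr * weight_decay)
  } else {
    # coupled (classic L2 penalty): add the penalty to the gradient,
    # so it also flows through the moment estimates
    grad <- grad + weight_decay * theta
  }
  # qhadam_update stands for the learning-rate-scaled QHAdam step
  # computed from the (possibly penalized) gradient
  theta - qhadam_update(grad)
}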
Value
A torch optimizer object implementing the step method.
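In practice the returned object is used like any other torch optimizer, for example inside a training loop. The sketch below is illustrative only (the linear model and random data are placeholders), assuming the torch package is installed.

if (torch::torch_is_installed()) {
  model <- torch::nn_linear(10, 1)
  opt <- torchopt::optim_qhadam(model$parameters, lr = 0.01)
  x <- torch::torch_randn(16, 10)
  y <- torch::torch_randn(16, 1)
  for (epoch in seq_len(5)) {
    opt$zero_grad()                            # reset accumulated gradients
    loss <- torch::nnf_mse_loss(model(x), y)   # forward pass and loss
    loss$backward()                            # backpropagate
    opt$step()                                 # QHAdam parameter update
  }
}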
Author(s)
Gilberto Camara, gilberto.camara@inpe.br
Daniel Falbel, daniel.falble@gmail.com
Rolf Simoes, rolf.simoes@inpe.br
Felipe Souza, lipecaso@gmail.com
Alber Sanchez, alber.ipia@inpe.br
References
Jerry Ma and Denis Yarats (2019), "Quasi-hyperbolic momentum and Adam for deep learning". https://arxiv.org/abs/1810.06801
Examples
if (torch::torch_is_installed()) {
  # test function to demonstrate optimization (log of the Beale function)
  beale <- function(x, y) {
    log((1.5 - x + x * y)^2 + (2.25 - x + x * y^2)^2 + (2.625 - x + x * y^3)^2)
  }
  # define optimizer
  optim <- torchopt::optim_qhadam
  # define hyperparameters
  opt_hparams <- list(lr = 0.01)
  # starting point
  x0 <- 3
  y0 <- 3
  # create tensors with gradient tracking
  x <- torch::torch_tensor(x0, requires_grad = TRUE)
  y <- torch::torch_tensor(y0, requires_grad = TRUE)
  # instantiate optimizer
  optim <- do.call(optim, c(list(params = list(x, y)), opt_hparams))
  # run optimizer
  steps <- 400
  x_steps <- numeric(steps)
  y_steps <- numeric(steps)
  for (i in seq_len(steps)) {
    x_steps[i] <- as.numeric(x)
    y_steps[i] <- as.numeric(y)
    optim$zero_grad()
    z <- beale(x, y)
    z$backward()
    optim$step()
  }
  print(paste0("starting value = ", beale(x0, y0)))
  print(paste0("final value = ", beale(x_steps[steps], y_steps[steps])))
}