madgrad {sjSDM}    R Documentation
madgrad
Description
Stochastic gradient descent optimizer implementing MADGRAD, a momentumized, adaptive, dual-averaged gradient method for stochastic optimization (Defazio & Jelassi, 2021).
Usage
madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)
Arguments
momentum
strength of momentum

weight_decay
L2 penalty on the weights

eps
small constant added to the denominator for numerical stability
Value
An anonymous function that returns the optimizer when called.
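Examples
A minimal sketch of using madgrad() when fitting a model. The sjSDMControl() control object, its optimizer argument, and the simulate_SDM() helper are assumed from the package interface here; check ?sjSDM and ?sjSDMControl for the exact signatures.

library(sjSDM)

## madgrad() returns a function; sjSDM calls it internally to build the optimizer
opt <- madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)

## simulated community data (simulate_SDM() ships with sjSDM)
com <- simulate_SDM(env = 3L, species = 7L, sites = 100L)

## pass the optimizer via the control object (assumed interface)
model <- sjSDM(
  Y = com$response,
  env = com$env_weights,
  control = sjSDMControl(optimizer = opt),
  iter = 50L
)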
References
Defazio, A., & Jelassi, S. (2021). Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization. arXiv preprint arXiv:2101.11075.