madgrad {sjSDM}    R Documentation
madgrad
Description
Stochastic gradient descent optimizer implementing MADGRAD, a momentumized, adaptive, dual-averaged gradient method for stochastic optimization (Defazio & Jelassi, 2021).
Usage
madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)
Arguments
momentum
strength of momentum

weight_decay
L2 penalty on the weights

eps
small constant added to the denominator for numerical stability
Value
An anonymous function that returns the optimizer when called.
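Examples
A minimal sketch of using madgrad() when fitting a model. The sjSDMControl() control object, its optimizer argument, and the simulate_SDM() helper are assumed from the package interface here; check ?sjSDM and ?sjSDMControl for the exact signatures.

library(sjSDM)

## madgrad() returns a function; sjSDM calls it internally to build the optimizer
opt <- madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)

## simulated community data (simulate_SDM() ships with sjSDM)
com <- simulate_SDM(env = 3L, species = 7L, sites = 100L)

## pass the optimizer via the control object (assumed interface)
model <- sjSDM(
  Y = com$response,
  env = com$env_weights,
  control = sjSDMControl(optimizer = opt),
  iter = 50L
)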
References
Defazio, A., & Jelassi, S. (2021). Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization. arXiv preprint arXiv:2101.11075.