madgrad {sjSDM}  R Documentation

madgrad

Description

MADGRAD stochastic optimizer: a momentumized, adaptive, dual averaged gradient method for stochastic optimization (Defazio & Jelassi, 2021).

Usage

madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)

Arguments

momentum

strength of the momentum term

weight_decay

L2 penalty (weight decay) on the weights

eps

small constant added for numerical stability

Value

Anonymous function that returns the optimizer when called.

References

Defazio, A., & Jelassi, S. (2021). Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization. arXiv preprint arXiv:2101.11075.
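
Examples

A minimal usage sketch; the simulate_SDM(), sjSDM(), and sjSDMControl() calls below follow the package's typical interface and are assumptions, not taken from this help page:

library(sjSDM)

# construct the optimizer (default settings shown explicitly)
opt <- madgrad(momentum = 0.9, weight_decay = 0, eps = 1e-06)

# assumed interface: pass the optimizer to sjSDM via the control object
com <- simulate_SDM(env = 3L, species = 7L, sites = 100L)
model <- sjSDM(Y = com$response,
               env = com$env_weights,
               iter = 50L,
               control = sjSDMControl(optimizer = opt))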

