f2sSP_cv {fdaSP} | R Documentation |
Cross-validation for Overlap Group Least Absolute Shrinkage and Selection Operator on scalar-on-function regression model
Description
Overlap Group-LASSO for scalar-on-function regression model solves the following optimization problem
\textrm{min}_{\psi,\gamma} ~ \frac{1}{2} \sum_{i=1}^n \left( y_i - \int x_i(t) \psi(t) dt-z_i^\intercal\gamma \right)^2 + \lambda \sum_{g=1}^{G} \Vert S_{g}T\psi \Vert_2
to obtain a sparse coefficient vector \psi\in\mathbb{R}^{M}
for the functional penalized predictor x(t)
and a coefficient vector \gamma\in\mathbb{R}^q
for the unpenalized scalar predictors z_1,\dots,z_q
. The regression function is \psi(t)=\varphi(t)^\intercal\psi
where \varphi(t)
is a B-spline basis of order d
and dimension M
.
For each group g
, each row of the matrix S_g\in\mathbb{R}^{d\times M}
has non-zero entries only for those bases belonging
to that group. These values are provided by the arguments groups
and group_weights
(see below).
Each basis function belongs to more than one group. The diagonal matrix T\in\mathbb{R}^{M\times M}
contains
the basis specific weights. These values are provided by the argument var_weights
(see below).
The regularization path is computed for the overlap group-LASSO penalty at a grid of values for the regularization
parameter \lambda
using the alternating direction method of multipliers (ADMM). See Boyd et al. (2011) and Lin et al. (2022)
for details on the ADMM method.
Usage
f2sSP_cv(
vY,
mX,
mZ = NULL,
M,
group_weights = NULL,
var_weights = NULL,
standardize.data = FALSE,
splOrd = 4,
lambda = NULL,
lambda.min.ratio = NULL,
nlambda = NULL,
cv.fold = 5,
intercept = FALSE,
overall.group = FALSE,
control = list()
)
Arguments
vY |
a length- |
mX |
a |
mZ |
an |
M |
number of elements of the B-spline basis vector |
group_weights |
a vector of length |
var_weights |
a vector of length |
standardize.data |
logical. Should data be standardized? |
splOrd |
the order |
lambda |
either a regularization parameter or a vector of regularization parameters. In this latter case the routine computes the whole path. If it is NULL values for lambda are provided by the routine. |
lambda.min.ratio |
smallest value for lambda, as a fraction of the maximum lambda value. If |
nlambda |
the number of lambda values - default is 30. |
cv.fold |
the number of folds - default is 5. |
intercept |
logical. If it is TRUE, a column of ones is added to the design matrix. |
overall.group |
logical. If it is TRUE, an overall group including all penalized covariates is added. |
control |
a list of control parameters for the ADMM algorithm. See ‘Details’. |
Value
A named list containing
- sp.coefficients
a length-
M
solution vector solution vector for the parameters\psi
, which corresponds to the minimum cross-validated MSE.- sp.fun
a length-
r
vector providing the estimated functional coefficient for\psi(t)
corresponding to the minimum cross-validated MSE.- coefficients
a length-
q
solution vector for the parameters\gamma
, which corresponds to the minimum cross-validated MSE. It is provided only when either the matrixZ
in input is not NULL or the intercept is set to TRUE.- lambda
sequence of lambda.
- lambda.min
value of lambda that attains the minimum cross-validated MSE.
- mse
cross-validated mean squared error.
- min.mse
minimum value of the cross-validated MSE for the sequence of lambda.
- convergence
logical. 1 denotes achieved convergence.
- elapsedTime
elapsed time in seconds.
- iternum
number of iterations.
Iteration stops when both r_norm
and s_norm
values
become smaller than eps_pri
and eps_dual
, respectively.
Details
The control argument is a list that can supply any of the following components:
- adaptation
logical. If it is TRUE, ADMM with adaptation is performed. The default value is TRUE. See Boyd et al. (2011) for details.
- rho
an augmented Lagrangian parameter. The default value is 1.
- tau.ada
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 2. See Boyd et al. (2011) and Lin et al. (2022) for details.
- mu.ada
an adaptation parameter greater than one. Only needed if adaptation = TRUE. The default value is 10. See Boyd et al. (2011) and Lin et al. (2022) for details.
- abstol
absolute tolerance stopping criterion. The default value is sqrt(sqrt(.Machine$double.eps)).
- reltol
relative tolerance stopping criterion. The default value is sqrt(.Machine$double.eps).
- maxit
maximum number of iterations. The default value is 100.
- print.out
logical. If it is TRUE, a message about the procedure is printed. The default value is TRUE.
References
Bernardi M, Canale A, Stefanucci M (2022). “Locally Sparse Function-on-Function Regression.” Journal of Computational and Graphical Statistics, 0(0), 1-15. doi:10.1080/10618600.2022.2130926, https://doi.org/10.1080/10618600.2022.2130926.
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011). “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends® in Machine Learning, 3(1), 1-122. ISSN 1935-8237, doi:10.1561/2200000016, http://dx.doi.org/10.1561/2200000016.
Jenatton R, Audibert J, Bach F (2011). “Structured variable selection with sparsity-inducing norms.” J. Mach. Learn. Res., 12, 2777–2824. ISSN 1532-4435.
Lin Z, Li H, Fang C (2022). Alternating direction method of multipliers for machine learning. Springer, Singapore. ISBN 978-981-16-9839-2; 978-981-16-9840-8, doi:10.1007/978-981-16-9840-8, With forewords by Zongben Xu and Zhi-Quan Luo.
Examples
## generate sample data and functional coefficients
set.seed(1)
n <- 40
p <- 18
r <- 100
s <- seq(0, 1, length.out = r)
beta_basis <- splines::bs(s, df = p, intercept = TRUE) # basis
coef_data <- matrix(rnorm(n*floor(p/2)), n, floor(p/2))
fun_data <- coef_data %*% t(splines::bs(s, df = floor(p/2), intercept = TRUE))
x_0 <- apply(matrix(rnorm(p, sd=1),p,1), 1, fdaSP::softhresh, 1)
x_fun <- beta_basis %*% x_0
b <- fun_data %*% x_fun + rnorm(n, sd = sqrt(crossprod(fun_data %*% x_fun ))/10)
l <- 10^seq(2, -4, length.out = 30)
maxit <- 1000
## set the hyper-parameters
maxit <- 1000
rho_adaptation <- TRUE
rho <- 1
reltol <- 1e-5
abstol <- 1e-5
## run cross-validation
mod_cv <- f2sSP_cv(vY = b, mX = fun_data, M = p,
group_weights = NULL, var_weights = NULL, standardize.data = FALSE, splOrd = 4,
lambda = NULL, lambda.min = 1e-5, nlambda = 30, cv.fold = 5, intercept = FALSE,
control = list("abstol" = abstol,
"reltol" = reltol,
"adaptation" = rho_adaptation,
"rho" = rho,
"print.out" = FALSE))
### graphical presentation
plot(log(mod_cv$lambda), mod_cv$mse, type = "l", col = "blue", lwd = 2, bty = "n",
xlab = latex2exp::TeX("$\\log(\\lambda)$"), ylab = "Prediction Error",
ylim = range(mod_cv$mse - mod_cv$mse.sd, mod_cv$mse + mod_cv$mse.sd),
main = "Cross-validated Prediction Error")
fdaSP::confband(xV = log(mod_cv$lambda), yVmin = mod_cv$mse - mod_cv$mse.sd,
yVmax = mod_cv$mse + mod_cv$mse.sd)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0)
### comparison with oracle error
mod <- f2sSP(vY = b, mX = fun_data, M = p,
group_weights = NULL, var_weights = NULL,
standardize.data = FALSE, splOrd = 4,
lambda = NULL, nlambda = 30,
lambda.min = 1e-5, intercept = FALSE,
control = list("abstol" = abstol,
"reltol" = reltol,
"adaptation" = rho_adaptation,
"rho" = rho,
"print.out" = FALSE))
err_mod <- apply(mod$sp.coef.path, 1, function(x) sum((x - x_0)^2))
plot(log(mod$lambda), err_mod, type = "l", col = "blue",
lwd = 2, xlab = latex2exp::TeX("$\\log(\\lambda)$"),
ylab = "Estimation Error", main = "True Estimation Error", bty = "n")
abline(v = log(mod$lambda[which(err_mod == min(err_mod))]), col = "red", lwd = 1.0)
abline(v = log(mod_cv$lambda[which(mod_cv$lambda == mod_cv$lambda.min)]),
col = "red", lwd = 1.0, lty = 2)