cv.ncpen {ncpen}    R Documentation
cv.ncpen: cross-validation for ncpen
Description
Performs k-fold cross-validation (CV) for nonconvex penalized regression models over a sequence of values of the regularization parameter lambda.
Usage
cv.ncpen(y.vec, x.mat, family = c("gaussian", "linear", "binomial",
"logit", "poisson", "multinomial", "cox"), penalty = c("scad", "mcp",
"tlp", "lasso", "classo", "ridge", "sridge", "mbridge", "mlog"),
x.standardize = TRUE, intercept = TRUE, lambda = NULL,
n.lambda = NULL, r.lambda = NULL, w.lambda = NULL, gamma = NULL,
tau = NULL, alpha = NULL, df.max = 50, cf.max = 100,
proj.min = 10, add.max = 10, niter.max = 30, qiter.max = 10,
aiter.max = 100, b.eps = 1e-06, k.eps = 1e-04, c.eps = 1e-06,
cut = TRUE, local = FALSE, local.initial = NULL, n.fold = 10,
fold.id = NULL)
Arguments
y.vec
    (numeric vector) response vector. Must be 0,1 for binomial and 1,2,..., for multinomial.

x.mat
    (numeric matrix) design matrix without intercept. The censoring indicator must be included in the last column of the design matrix for cox.

family
    (character) regression model. Supported models are gaussian (or linear), binomial (or logit), poisson, multinomial, and cox. Default is gaussian.

penalty
    (character) penalty function. Supported penalties are scad (smoothly clipped absolute deviation), mcp (minimax concave penalty), tlp (truncated lasso penalty), lasso (least absolute shrinkage and selection operator), classo (clipped lasso = mcp + lasso), ridge, sridge (sparse ridge = mcp + ridge), mbridge (modified bridge) and mlog (modified log). Default is scad.

x.standardize
    (logical) whether to standardize x.mat prior to fitting the model (see details). The estimated coefficients are always restored to the original scale.

intercept
    (logical) whether to include an intercept in the model.

lambda
    (numeric vector) user-specified sequence of lambda values. Default is supplied automatically from the samples.

n.lambda
    (numeric) the number of lambda values. Default is 100.

r.lambda
    (numeric) ratio of the smallest lambda value to the largest. Default is 0.001 when n > p, and 0.01 otherwise.

w.lambda
    (numeric vector) penalty weights for each coefficient (see references). If a penalty weight is set to 0, the corresponding coefficient is always nonzero.

gamma
    (numeric) additional tuning parameter controlling the shrinkage effect of classo and sridge (see references). Default is half of the smallest lambda.

tau
    (numeric) concavity parameter of the penalties (see references). Default is 3.7 for scad, 2.1 for mcp, classo and sridge, and 0.001 for tlp, mbridge and mlog.

alpha
    (numeric) ridge effect (weight between the penalty and the ridge penalty) (see details). Default value is 1. If penalty is ridge or sridge, alpha is set to 0.

df.max
    (numeric) the maximum number of nonzero coefficients.

cf.max
    (numeric) the maximum absolute value allowed for nonzero coefficients.

proj.min
    (numeric) the projection cycle inside the CD algorithm (largely internal use; see details).

add.max
    (numeric) the maximum number of variables added per CCCP iteration (largely internal use; see references).

niter.max
    (numeric) maximum number of CCCP iterations.

qiter.max
    (numeric) maximum number of quadratic approximations in each CCCP iteration.

aiter.max
    (numeric) maximum number of iterations in the CD algorithm.

b.eps
    (numeric) convergence threshold for the coefficient vector.

k.eps
    (numeric) convergence threshold for the KKT conditions.

c.eps
    (numeric) convergence threshold for the KKT conditions (largely internal use).

cut
    (logical) convergence threshold for the KKT conditions (largely internal use).

local
    (logical) whether to use a local initial estimator for path construction; this may take a long time.

local.initial
    (numeric vector) initial estimator for local=TRUE.

n.fold
    (numeric) number of folds for CV.

fold.id
    (numeric vector) fold ids from 1 to k that indicate the fold configuration (see the sketch after this list).
Details
Two kinds of CV errors are returned: root mean squared error and negative log-likelihood. The results depend on the random partition made internally, so supply fold.id for a fixed fold configuration. To choose optimal coefficients from the CV results, use coef.cv.ncpen. Note that ncpen does not search over values of gamma, tau and alpha; only lambda is tuned by CV.
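Since only lambda is cross-validated, the remaining tuning parameters can be compared by refitting cv.ncpen over a small grid. A minimal sketch (not part of the package documentation; it assumes the rmse component is a vector over the lambda sequence, as described under Value, with y.vec and x.mat as in the Examples section):

### compare several concavity parameters tau by their best CV error
tau.grid = c(2.1, 3.7, 5.0)
cv.list = lapply(tau.grid, function(t)
  cv.ncpen(y.vec = y.vec, x.mat = x.mat, family = "gaussian",
           penalty = "scad", tau = t, n.lambda = 10))
best.err = sapply(cv.list, function(f) min(f$rmse))
cv.list[[which.min(best.err)]]   # fit with the best tau on this grid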
Value
An object with S3 class cv.ncpen containing the following components:
ncpen.fit
    ncpen object fitted from the whole samples.

fold.index
    fold ids of the samples.

rmse
    root mean squared errors from CV.

like
    negative log-likelihoods from CV.

lambda
    sequence of lambda values used in the fits.
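For illustration, the components can be inspected directly (a sketch; fit denotes a fitted cv.ncpen object as in the Examples below):

fit$lambda                          # lambda sequence used in the CV fits
fit$rmse                            # CV root mean squared error per lambda
fit$like                            # CV negative log-likelihood per lambda
fit$lambda[which.min(fit$rmse)]     # lambda minimizing the CV RMSE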
Author(s)
Dongshin Kim, Sunghoon Kwon, Sangin Lee
References
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.

Zhang, C.H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894-942.

Shen, X., Pan, W., Zhu, Y. and Zhou, H. (2013). On constrained and regularized high-dimensional regression. Annals of the Institute of Statistical Mathematics, 65(5), 807-832.

Kwon, S., Lee, S. and Kim, Y. (2016). Moderately clipped LASSO. Computational Statistics and Data Analysis, 92C, 53-67.

Kwon, S., Kim, Y. and Choi, H. (2013). Sparse bridge estimation with a diverging number of parameters. Statistics and Its Interface, 6, 231-242.

Huang, J., Horowitz, J.L. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. The Annals of Statistics, 36(2), 587-613.

Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4), 1509-1533.

Lee, S., Kwon, S. and Kim, Y. (2016). A modified local quadratic approximation algorithm for penalized optimization problems. Computational Statistics and Data Analysis, 94, 275-286.
See Also
plot.cv.ncpen, coef.cv.ncpen, ncpen, predict.ncpen
Examples
### linear regression with scad penalty
sam = sam.gen.ncpen(n = 200, p = 10, q = 5, cf.min = 0.5, cf.max = 1,
                    corr = 0.5, family = "gaussian")
x.mat = sam$x.mat; y.vec = sam$y.vec
fit = cv.ncpen(y.vec = y.vec, x.mat = x.mat, n.lambda = 10,
               family = "gaussian", penalty = "scad")
coef(fit)
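A brief extension (a sketch) using the plot method listed under See Also:

### visualize the CV error curves over the lambda sequence
plot(fit)   # dispatches plot.cv.ncpen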