R: k-folds cross-validation for regnet

cv.regnet {regnet}

R Documentation

k-folds cross-validation for regnet

Description

This function does k-fold cross-validation for regnet and returns the optimal value(s) of lambda.

Usage

cv.regnet(
  X,
  Y,
  response = c("binary", "continuous", "survival"),
  penalty = c("network", "mcp", "lasso"),
  lamb.1 = NULL,
  lamb.2 = NULL,
  folds = 5,
  r = NULL,
  clv = NULL,
  initiation = NULL,
  alpha.i = 1,
  robust = FALSE,
  verbo = FALSE,
  debugging = FALSE
)

Arguments

`X`	X matrix without intercept (see `regnet`).
`Y`	the response variable Y (see `regnet`).
`response`	the response type. regnet supports three types of response: "binary", "continuous" and "survival".
`penalty`	the penalty type. regnet provides three choices for the penalty function: "network", "mcp" and "lasso".
`lamb.1`	a user-supplied sequence of `\lambda_{1}` values, which serves as a tuning parameter to impose sparsity. If it is left as NULL, regnet will compute its own sequence.
`lamb.2`	a user-supplied sequence of `\lambda_{2}` values for network method. `\lambda_{2}` controls the smoothness among coefficient profiles. If it is left as NULL, a default sequence will be used.
`folds`	the number of folds for cross-validation; the default is 5.
`r`	the regularization parameter in MCP; default is 5. For binary response, r should be larger than 4.
`clv`	a value or a vector, indexing variables that are not subject to penalty. clv only works for continuous and survival responses in the current version of regnet, and will be ignored for other types of responses.
`initiation`	the method for initiating the coefficient vector. The default method is elastic-net.
`alpha.i`	the elastic-net mixing parameter. The program can use the elastic-net for choosing initial values of the coefficient vector. alpha.i is the elastic-net mixing parameter, with 0 `\le` alpha.i `\le` 1. alpha.i=1 is the lasso penalty, and alpha.i=0 is the ridge penalty. If the user chooses a method other than elastic-net for initializing coefficients, alpha.i will be ignored.
`robust`	a logical flag. Whether or not to use robust methods. Robust methods are available for survival and continuous response.
`verbo`	output progress to the console.
`debugging`	a logical flag. If TRUE, extra information will be returned.

Details

When lamb.1 is left as NULL, regnet computes its own sequence. You can find the lamb.1 sequence used by the program in the returned CVM matrix (see the 'Value' section). If you find the default sequence does not work well, you can try (1) standardizing the response vector Y; or (2) providing a customized lamb.1 sequence for your data.

Sometimes multiple optimal values(pairs) of lambda(s) can be found (see 'Value'). This is usually normal when the response is binary. However, if the response is survival or continuous, you may want to check (1) if the sequence of lambda is too large (i.e. all coefficients are shrunken to zero under all values of lambda) ; or (2) if the sequence is too small (i.e. all coefficients are non-zero under all values of lambda). If neither, simply choose the value(pair) of lambda based on your preference.

Value

an object of class "cv.regnet" is returned, which is a list with components:

`lambda`	the optimal value(s) of `\lambda`. More than one value will be returned, if multiple lambdas have the cross-validated error = min(cross-validated errors). If the network penalty is used, lambda contains optimal pair(s) of `\lambda_{1}` and `\lambda_{2}`.
`mcvm`	the cross-validated error of the optimal `\lambda`. For binary response, the error is the misclassification rate. For continuous response, mean squared error (MSE) is used. For survival response, the MSE is used for non-robust methods, and the criterion for robust methods is the least absolute deviation (LAD).
`CVM`	a matrix of the mean cross-validated errors of all lambdas used in the fits. The row names of CVM are the values of `\lambda_{1}`. If the network penalty was used, the column names are the values of `\lambda_{2}`.

References

Ren, J., Du, Y., Li, S., Ma, S., Jiang,Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high dimensional genomics data in cancer prognosis. Genet. Epidemiol., 43:276-291 doi:10.1002/gepi.22194

Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y., and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes. BMC Genetics, 18(1):44 doi:10.1186/s12863-017-0495-5

Examples


## Binary response using network method
data(LogisticExample)
X = rgn.logi$X
Y = rgn.logi$Y
out = cv.regnet(X, Y, response="binary", penalty="network", folds=5, r = 4.5)
out$lambda
fit = regnet(X, Y, "binary", "network", out$lambda[1,1], out$lambda[1,2], r = 4.5)
index = which(rgn.logi$beta != 0)
pos = which(fit$coeff != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

## Binary response using MCP method
out = cv.regnet(X, Y, response="binary", penalty="mcp", folds=5, r = 4.5)
out$lambda
fit = regnet(X, Y, "binary", "mcp", out$lambda[1], r = 4.5)
index = which(rgn.logi$beta != 0)
pos = which(fit$coeff != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

[Package regnet version 1.0.1 Index]