sgp.cv {SGPR}

R Documentation

Cross-validation for sparse group penalties

Description

Performs k-fold cross-validation for sparse group penalized regression models over a sequence of lambda values.

Usage

sgp.cv(
  X,
  y,
  group = 1:ncol(X),
  Z = NULL,
  ...,
  nfolds = 10,
  seed,
  fold,
  type,
  returnY = FALSE,
  print.trace = FALSE
)

Arguments

`X`	The design matrix without intercept with the variables to be selected.
`y`	The response vector.
`group`	A vector indicating the group membership of each variable in X.
`Z`	The design matrix of the variables to be included in the model without penalization.
`...`	Additional arguments passed on to the underlying fitting function (for example `penalty`, as in the Examples).
`nfolds`	The number of folds for cross-validation.
`seed`	A seed provided by the user for the random number generator.
`fold`	A vector of folds specified by the user (default is a random assignment).
`type`	A string indicating the type of regression model: "linear" for linear regression or "logit" for logistic (binomial) regression, as used in the Examples.
`returnY`	A Boolean value indicating whether the fitted values should be returned.
`print.trace`	A Boolean value that specifies whether a message should be printed at the start of each fold.
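The `fold` argument expects one fold label per observation. A minimal base R sketch of building such a vector by hand is shown here; the balanced random assignment and the stratified variant for a binary response are illustrative assumptions, not a description of what sgp.cv does internally when `fold` is missing. The names `fold_strat` and `y_bin` are hypothetical.

```r
# Sketch: user-specified fold assignments (assumed layout: one integer
# label in 1..nfolds per observation).
set.seed(1)
n <- 100
nfolds <- 10

# Balanced random assignment: each fold receives n / nfolds observations.
fold <- sample(rep(seq_len(nfolds), length.out = n))

# Stratified variant for a binary response (hypothetical y_bin):
# folds are assigned within each class so class proportions stay similar.
y_bin <- rbinom(n, 1, 0.3)
fold_strat <- integer(n)
for (cls in 0:1) {
  idx <- which(y_bin == cls)
  fold_strat[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

table(fold)
```

Such a vector could then be supplied as `fold = fold` in place of `nfolds` and `seed`, which makes results comparable across penalties because every fit sees the same splits.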

Value

A list containing:

cve: The average cross-validation error for each value of lambda.
cvse: The estimated standard error for each value of cve.
lambdas: The sequence of lambda values.
fit: The sparse group penalty model fitted to the entire data.
fold: The fold assignments for each observation for the cross-validation procedure.
min: The index of lambda corresponding to the minimum cross-validation error.
lambda.min: The value of lambda with the minimum cross-validation error.
null.dev: The deviance for the empty model.
pe: The cross-validation prediction error for each value of lambda (logistic regression only).
pred: The fitted values from the cross-validation folds.
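The relationship between cve, cvse, min and lambda.min can be illustrated with a small base R sketch. It assumes the common convention that cross-validation errors are averaged over observations and that the standard error is the standard deviation of the per-observation losses divided by sqrt(n); the package may compute these differently. The loss matrix `E` is simulated and hypothetical.

```r
# Sketch: summarising per-observation CV losses (assumed convention).
set.seed(42)
n <- 50
lambdas <- c(1, 0.5, 0.25, 0.1)

# Hypothetical n x length(lambdas) matrix of out-of-fold losses,
# one column per lambda value.
E <- sapply(c(2.0, 1.2, 1.0, 1.5), function(mu) mu + abs(rnorm(n, sd = 0.1)))

cve  <- colMeans(E)               # average CV error per lambda
cvse <- apply(E, 2, sd) / sqrt(n) # standard error of each average
min_idx    <- which.min(cve)      # index of the smallest CV error
lambda_min <- lambdas[min_idx]    # lambda at that index

data.frame(lambda = lambdas, cve = cve, cvse = cvse)
```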

Examples


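# Generate data: 200 predictors in 20 groups of 10
 n <- 100
 p <- 200
 nr <- 10                    # number of variables per group
 g <- ceiling(1:p / nr)      # group labels 1, ..., 20
 X <- matrix(rnorm(n * p), n, p)
 b <- -3:3                   # coefficients of the first 7 predictors
 y_lin <- X[, seq_along(b)] %*% b + 5 * rnorm(n)
 y_log <- rbinom(n, 1, plogis(y_lin))  # plogis(x) = exp(x) / (1 + exp(x))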

# Linear regression
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgl")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgs")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgm")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sge")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")

# Logistic regression
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgl")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgs")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgm")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sge")
 plot(log_fit)
 predict(log_fit, extract = "vars")
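The following sketch shows how the `nfolds` and `seed` arguments and the returned components listed under Value might be used together. It only runs when SGPR is installed, uses the same simulated data layout as the examples above, and the object name `cv` is hypothetical.

```r
# Sketch: reproducible 5-fold cross-validation and inspection of the result.
if (requireNamespace("SGPR", quietly = TRUE)) {
  set.seed(1)
  n <- 100
  p <- 200
  g <- ceiling(1:p / 10)
  X <- matrix(rnorm(n * p), n, p)
  b <- -3:3
  y_lin <- X[, seq_along(b)] %*% b + 5 * rnorm(n)

  # Fixing the seed makes the random fold assignment repeatable.
  cv <- SGPR::sgp.cv(X, y_lin, g, type = "linear", penalty = "sgl",
                     nfolds = 5, seed = 1)

  cv$lambda.min       # lambda with the smallest cross-validation error
  cv$cve[cv$min]      # the corresponding cross-validation error
  cv$cvse[cv$min]     # and its estimated standard error
  table(cv$fold)      # number of observations per fold
}
```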

[Package SGPR version 0.1.2 Index]