sgp.cv {SGPR}    R Documentation

Cross-validation for sparse group penalties

Description

Performs k-fold cross-validation for sparse group penalized regression models over a sequence of lambda values.

Usage

sgp.cv(
  X,
  y,
  group = 1:ncol(X),
  Z = NULL,
  ...,
  nfolds = 10,
  seed,
  fold,
  type,
  returnY = FALSE,
  print.trace = FALSE
)

Arguments

X

The design matrix, without an intercept column, containing the variables subject to selection.

y

The response vector.

group

A vector indicating the group membership of each variable in X.

Z

The design matrix of the variables to be included in the model without penalization.

...

Additional arguments passed on to the underlying fitting functions (for example, penalty, as used in the Examples).

nfolds

The number of folds for cross-validation.

seed

A seed provided by the user for the random number generator.

fold

A vector assigning each observation to a fold. If omitted, observations are assigned to folds at random.

type

A string indicating the type of regression model: "linear" for a continuous response, or "logit" for a binomial response (the values used in the Examples).

returnY

A Boolean value indicating whether the fitted values should be returned.

print.trace

A Boolean value indicating whether a message should be printed at the beginning of each fold.
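
The fold argument takes one fold label per observation. The following is a minimal base R sketch of how such a vector could be built, assuming folds are labeled with the integers 1 to nfolds (the labeling that sgp.cv expects is not stated on this page, so treat it as an assumption). make_folds is a hypothetical helper and is not part of SGPR.

```r
# Hypothetical helper (not part of SGPR): balanced random fold labels.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # make the assignment reproducible
  # Repeat the labels 1..nfolds until there are n of them, then shuffle.
  sample(rep(seq_len(nfolds), length.out = n))
}

fold <- make_folds(n = 100, nfolds = 10, seed = 1)
table(fold)  # each of the 10 folds holds exactly 10 observations
```

A vector built this way could be supplied as fold = fold so that fits with different penalties are compared on identical splits.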

Value

A list containing:

cve

The average cross-validation error for each value of lambda.

cvse

The estimated standard error for each value of cve.

lambdas

The sequence of lambda values.

fit

The sparse group penalty model fitted to the entire data.

fold

The fold assignments for each observation for the cross-validation procedure.

min

The index of lambda corresponding to the minimum cross-validation error.

lambda.min

The value of lambda with the minimum cross-validation error.

null.dev

The deviance for the empty model.

pe

The cross-validation prediction error for each value of lambda (for binomial only).

pred

The fitted values from the cross-validation folds.
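
To illustrate how cve, cvse, min, and lambda.min relate to one another, the sketch below computes analogous quantities from a matrix of per-observation held-out losses. It assumes cve is a column mean and cvse is the column standard deviation divided by sqrt(n), a common convention in cross-validation code; this page does not state the exact formulas sgp.cv uses. The loss matrix E and the lambda values are made up for illustration.

```r
# Made-up held-out losses: rows = observations, columns = lambda values.
set.seed(1)
lambdas <- c(1, 0.5, 0.25, 0.125)   # illustrative lambda sequence
n <- 50
E <- sapply(c(4, 2, 1, 3), function(m) rchisq(n, df = 1) * m)

cve  <- colMeans(E)                 # average CV error per lambda
cvse <- apply(E, 2, sd) / sqrt(n)   # assumed standard-error formula
min_idx    <- which.min(cve)        # index of the smallest cve
lambda_min <- lambdas[min_idx]      # lambda value at that index
```

Under these assumptions, min_idx plays the role of min and lambda_min the role of lambda.min in the returned list.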

Examples


# Generate data
 n <- 100
 p <- 200
 nr <- 10
 g <- ceiling(1:p / nr)
 X <- matrix(rnorm(n * p), n, p)
 b <- c(-3:3)
 y_lin <- X[, 1:length(b)] %*% b + 5 * rnorm(n)
 y_log <- rbinom(n, 1, exp(y_lin) / (1 + exp(y_lin)))

# Linear regression
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgl")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgs")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sgm")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")
 lin_fit <- sgp.cv(X, y_lin, g, type = "linear", penalty = "sge")
 plot(lin_fit)
 predict(lin_fit, extract = "vars")

# Logistic regression
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgl")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgs")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sgm")
 plot(log_fit)
 predict(log_fit, extract = "vars")
 log_fit <- sgp.cv(X, y_log, g, type = "logit", penalty = "sge")
 plot(log_fit)
 predict(log_fit, extract = "vars")



[Package SGPR version 0.1.2 Index]