cv.grpsel {grpsel} | R Documentation |
Cross-validated group subset selection
Description
Fits the regularisation surface for a regression model with a group subset selection penalty and then cross-validates this surface.
Usage
cv.grpsel(
x,
y,
group = seq_len(ncol(x)),
penalty = c("grSubset", "grSubset+grLasso", "grSubset+Ridge"),
loss = c("square", "logistic"),
lambda = NULL,
gamma = NULL,
nfold = 10,
folds = NULL,
cv.loss = NULL,
cluster = NULL,
interpolate = TRUE,
...
)
Arguments
x |
a predictor matrix |
y |
a response vector |
group |
a vector of length |
penalty |
the type of penalty to apply; one of 'grSubset', 'grSubset+grLasso', or 'grSubset+Ridge' |
loss |
the type of loss function to use; 'square' for linear regression or 'logistic' for logistic regression |
lambda |
an optional list of decreasing sequences of group subset selection parameters; the
list should contain a vector for each value of |
gamma |
an optional decreasing sequence of group lasso or ridge parameters |
nfold |
the number of cross-validation folds |
folds |
an optional vector of length |
cv.loss |
an optional cross-validation loss-function to use; should accept a vector of predicted values and a vector of actual values |
cluster |
an optional cluster for running cross-validation in parallel; must be set up using
|
interpolate |
a logical indicating whether to interpolate the |
... |
any other arguments for |
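The folds and cv.loss arguments can be prepared ahead of time. The following is a minimal base-R sketch, assuming that folds holds one fold label per observation (the length is cut off in the description above, so this layout is an assumption) and that cv.loss takes predicted values first and actual values second. The names n, nfold, and mae are illustrative only.

```r
# Illustrative sizes only; not taken from the package
set.seed(42)
n <- 100
nfold <- 10

# One fold label per observation, shuffled so that each fold has n / nfold rows
# (assumed layout for the `folds` argument)
folds <- sample(rep(seq_len(nfold), length.out = n))

# A custom cross-validation loss: mean absolute error,
# taking predicted values first and actual values second
mae <- function(predicted, actual) mean(abs(predicted - actual))

table(folds)
mae(c(1, 2), c(2, 4))
```

Under these assumptions, the objects could be passed as cv.grpsel(x, y, group, folds = folds, cv.loss = mae) for a predictor matrix x with n rows.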
Details
When loss='logistic', stratified cross-validation is used to balance the folds. When fitting to the cross-validation folds, interpolate=TRUE cross-validates the midpoints between consecutive lambda values rather than the original lambda sequence. This new sequence retains the same set of solutions on the full data, but often leads to superior cross-validation performance.
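The two ideas above can be sketched in base R. This is an illustration of the concepts, not the package's internal code: the lambda values, class sizes, and variable names are made up, and how the package treats the endpoints of the lambda sequence is not shown.

```r
# Midpoints between consecutive values of a decreasing lambda sequence
lambda <- c(8, 4, 2, 1)                      # made-up values
lambda_mid <- (head(lambda, -1) + tail(lambda, -1)) / 2
lambda_mid

# Stratified fold assignment for a binary response:
# fold labels are shuffled separately within each class,
# so every fold receives a similar share of each class
set.seed(1)
y_bin <- rep(c(0, 1), times = c(70, 30))     # made-up class sizes
nfold <- 10
strat_folds <- integer(length(y_bin))
for (cls in unique(y_bin)) {
  idx <- which(y_bin == cls)
  strat_folds[idx] <- sample(rep(seq_len(nfold), length.out = length(idx)))
}
fold_table <- table(strat_folds, y_bin)
fold_table
```

With these sizes, each of the ten folds ends up with the same class proportions as the full response, which is the balance that stratification is meant to provide.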
Value
An object of class cv.grpsel; a list with the following components:
cv.mean |
a list of vectors containing cross-validation means per value of |
cv.sd |
a list of vectors containing cross-validation standard errors per value of
|
lambda |
a list of vectors containing the values of |
gamma |
a vector containing the values of |
lambda.min |
the value of |
gamma.min |
the value of |
fit |
the fit from running |
Author(s)
Ryan Thompson <ryan.thompson@monash.edu>
Examples
# Grouped data
set.seed(123)
n <- 100
p <- 10
g <- 5
group <- rep(1:g, each = p / g)
beta <- numeric(p)
beta[which(group %in% 1:2)] <- 1
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n, x %*% beta)
newx <- matrix(rnorm(p), ncol = p)
# Group subset selection
fit <- cv.grpsel(x, y, group)
plot(fit)
coef(fit)
predict(fit, newx)
# Parallel cross-validation
cl <- parallel::makeCluster(2)
fit <- cv.grpsel(x, y, group, cluster = cl)
parallel::stopCluster(cl)