grpsel {grpsel}    R Documentation

Group subset selection

Description

Fits the regularisation surface for a regression model with a group subset selection penalty. The group subset penalty can be combined with either a group lasso or ridge penalty for shrinkage. The group subset parameter is lambda and the group lasso/ridge parameter is gamma.

Usage

grpsel(
  x,
  y,
  group = seq_len(ncol(x)),
  penalty = c("grSubset", "grSubset+grLasso", "grSubset+Ridge"),
  loss = c("square", "logistic"),
  local.search = FALSE,
  orthogonalise = FALSE,
  nlambda = 100,
  lambda.step = 0.99,
  lambda = NULL,
  lambda.factor = NULL,
  ngamma = 10,
  gamma.max = 100,
  gamma.min = 1e-04,
  gamma = NULL,
  gamma.factor = NULL,
  pmax = ncol(x),
  gmax = length(unique(group)),
  eps = 1e-04,
  max.cd.iter = 10000,
  max.ls.iter = 100,
  active.set = TRUE,
  active.set.count = 3,
  sort = TRUE,
  screen = 500,
  warn = TRUE
)

Arguments

x

a predictor matrix

y

a response vector

group

a vector of length ncol(x) with the jth element identifying the group that the jth predictor belongs to; alternatively, a list of vectors with the kth vector identifying the predictors that belong to the kth group (useful for overlapping groups)

penalty

the type of penalty to apply; one of 'grSubset', 'grSubset+grLasso', or 'grSubset+Ridge'

loss

the type of loss function to use; 'square' for linear regression or 'logistic' for logistic regression

local.search

a logical indicating whether to perform local search after coordinate descent; typically leads to higher quality solutions

orthogonalise

a logical indicating whether to orthogonalise within groups

nlambda

the number of group subset selection parameters to evaluate when lambda is computed automatically; may evaluate fewer parameters if pmax or gmax is reached first

lambda.step

the step size taken when computing lambda from the data; should be a value strictly between 0 and 1; larger values typically lead to a finer grid of subset sizes

lambda

an optional list of decreasing sequences of group subset selection parameters; the list should contain a vector for each value of gamma

lambda.factor

a vector of penalty factors applied to the group subset selection penalty; equal to the group sizes by default

ngamma

the number of group lasso or ridge parameters to evaluate when gamma is computed automatically

gamma.max

the maximum value for gamma when penalty='grSubset+Ridge'; when penalty='grSubset+grLasso' gamma.max is computed automatically from the data

gamma.min

the minimum value for gamma when penalty='grSubset+Ridge'; when penalty='grSubset+grLasso', the minimum value for gamma expressed as a fraction of gamma.max

gamma

an optional decreasing sequence of group lasso or ridge parameters

gamma.factor

a vector of penalty factors applied to the shrinkage penalty; by default, equal to the square root of the group sizes when penalty='grSubset+grLasso' or a vector of ones when penalty='grSubset+Ridge'

pmax

the maximum number of predictors ever allowed to be active; ignored if lambda is supplied

gmax

the maximum number of groups ever allowed to be active; ignored if lambda is supplied

eps

the convergence tolerance; convergence is declared when the relative maximum difference in consecutive coefficients is less than eps

max.cd.iter

the maximum number of coordinate descent iterations allowed per value of lambda and gamma

max.ls.iter

the maximum number of local search iterations allowed per value of lambda and gamma

active.set

a logical indicating whether to use active set updates; typically lowers the run time

active.set.count

the number of consecutive coordinate descent iterations in which a subset should appear before running active set updates

sort

a logical indicating whether to sort the coordinates before running coordinate descent; required for gradient screening; typically leads to higher quality solutions

screen

the number of groups to keep after gradient screening; smaller values typically lower the run time

warn

a logical indicating whether to print a warning if the algorithms fail to converge
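
The two accepted formats for group, the documented default penalty factors, and the shape expected for a user-supplied lambda can be sketched in base R. This is a minimal illustration only: all object names and the numeric grid values below are hypothetical, and nothing here calls the package itself.

```r
# Hypothetical design with six predictors
p <- 6

# Format 1: a vector of length p; element j is the group label of predictor j
group_vec <- c(1, 1, 2, 2, 3, 3)

# Format 2: a list; element k holds the column indices of group k.
# Predictor 3 sits in both group 1 and group 2, so these groups overlap.
group_list <- list(1:3, 3:4, 5:6)

# The vector format expressed as an equivalent list
vec_as_list <- unname(split(seq_len(p), group_vec))

# Group sizes and the defaults described for the penalty factors
group_size <- lengths(vec_as_list)
lambda_factor <- group_size                     # lambda.factor default
gamma_factor_grlasso <- sqrt(group_size)        # gamma.factor, grSubset+grLasso
gamma_factor_ridge <- rep(1, length(group_size)) # gamma.factor, grSubset+Ridge

# A user-supplied grid: gamma is one decreasing sequence, and lambda is a
# list holding one decreasing sequence per value of gamma
gamma_grid <- c(1, 0.1, 0.01)
lambda_grid <- rep(list(c(0.5, 0.1, 0.05)), length(gamma_grid))
```

Objects shaped like gamma_grid and lambda_grid are what the gamma and lambda arguments describe; suitable numeric values depend on the data.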

Details

For linear regression (loss='square') the response and predictors are centred about zero and scaled to unit l2-norm. For logistic regression (loss='logistic') only the predictors are centred and scaled and an intercept is fit during the course of the algorithm.
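
The standardisation described above can be reproduced in base R as follows. This is a sketch of the transformation only, not the package's internal code; the helper unit_l2 and the simulated data are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(20 * 3, mean = 5, sd = 2), 20, 3)
y <- rnorm(20)

# Centre each column about zero, then divide it by its l2-norm
unit_l2 <- function(m) {
  m <- sweep(m, 2, colMeans(m))           # subtract column means
  sweep(m, 2, sqrt(colSums(m^2)), "/")    # divide by column l2-norms
}

xs <- unit_l2(x)
ys <- unit_l2(matrix(y))  # treat the response as a one-column matrix

colMeans(xs)   # numerically zero
colSums(xs^2)  # each column has unit l2-norm
```

For loss='logistic' only the step applied to x would be relevant, since the response is left untouched.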

Value

An object of class grpsel; a list with the following components:

beta

a list of matrices, one for each value of gamma; each column of a matrix contains the fitted coefficients for one value of lambda

gamma

a vector containing the values of gamma used in the fit

lambda

a list of vectors containing the values of lambda used in the fit; an individual vector in the list for each value of gamma

np

a list of vectors containing the number of active predictors per value of lambda; an individual vector in the list for each value of gamma

ng

a list of vectors containing the number of active groups per value of lambda; an individual vector in the list for each value of gamma

iter.cd

a list of vectors containing the number of coordinate descent iterations per value of lambda; an individual vector in the list for each value of gamma

iter.ls

a list of vectors containing the number of local search iterations per value of lambda; an individual vector in the list for each value of gamma

loss

a list of vectors containing the value of the loss function per value of lambda; an individual vector in the list for each value of gamma
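
Because most components are lists indexed by gamma, it can help to collect the path for a single gamma into one table. The sketch below assumes the grpsel package is installed and that the components have the shapes documented above; the names k and path are hypothetical.

```r
if (requireNamespace("grpsel", quietly = TRUE)) {
  set.seed(123)
  n <- 100
  p <- 10
  group <- rep(1:5, each = 2)
  beta <- as.numeric(group %in% 1:2)
  x <- matrix(rnorm(n * p), n, p)
  y <- rnorm(n, drop(x %*% beta))

  fit <- grpsel::grpsel(x, y, group, penalty = "grSubset+Ridge")

  # Pick the first value of gamma and tabulate its lambda path
  k <- 1
  path <- data.frame(
    lambda = fit$lambda[[k]],
    groups = fit$ng[[k]],
    predictors = fit$np[[k]],
    loss = fit$loss[[k]]
  )

  print(fit$gamma[k])        # the gamma value this path belongs to
  print(head(path))          # one row per value of lambda
  print(dim(fit$beta[[k]]))  # coefficient matrix: one column per lambda
}
```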

Author(s)

Ryan Thompson <ryan.thompson@monash.edu>

References

Thompson, R. and Vahid, F. (2021). 'Group selection and shrinkage with application to sparse semiparametric modeling'. arXiv: 2105.12081.

Examples

# Grouped data
set.seed(123)
n <- 100
p <- 10
g <- 5
group <- rep(1:g, each = p / g)
beta <- numeric(p)
beta[which(group %in% 1:2)] <- 1
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n, x %*% beta)
newx <- matrix(rnorm(p), ncol = p)

# Group subset selection
fit <- grpsel(x, y, group)
plot(fit)
coef(fit, lambda = 0.05)
predict(fit, newx, lambda = 0.05)

# Group subset selection with group lasso shrinkage
fit <- grpsel(x, y, group, penalty = 'grSubset+grLasso')
plot(fit, gamma = 0.05)
coef(fit, lambda = 0.05, gamma = 0.1)
predict(fit, newx, lambda = 0.05, gamma = 0.1)

# Group subset selection with ridge shrinkage
fit <- grpsel(x, y, group, penalty = 'grSubset+Ridge')
plot(fit, gamma = 0.05)
coef(fit, lambda = 0.05, gamma = 0.1)
predict(fit, newx, lambda = 0.05, gamma = 0.1)
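
The examples above all use the default square loss. A logistic fit might look like the following sketch, which assumes the grpsel package is installed and that a 0/1 response is accepted for loss = 'logistic'; the simulated data and the names y_bin and fit_logit are hypothetical.

```r
if (requireNamespace("grpsel", quietly = TRUE)) {
  set.seed(123)
  n <- 100
  p <- 10
  group <- rep(1:5, each = 2)
  beta <- as.numeric(group %in% 1:2)
  x <- matrix(rnorm(n * p), n, p)

  # Binary response drawn from a logistic model (assumed 0/1 coding)
  y_bin <- rbinom(n, 1, plogis(drop(x %*% beta)))

  # Group subset selection with logistic loss
  fit_logit <- grpsel::grpsel(x, y_bin, group, loss = "logistic")
  print(coef(fit_logit, lambda = 0.05))
}
```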

[Package grpsel version 1.3.1 Index]