R: Fit a gSLOPE model using k-fold cross-validation.

fit_gslope_cv {sgs}

R Documentation

Fit a gSLOPE model using k-fold cross-validation.

Description

Fits a pathwise solution of group SLOPE (gSLOPE) models using k-fold cross-validation. Supports linear and logistic regression, each with dense and sparse matrix implementations.

Usage

fit_gslope_cv(
  X,
  y,
  groups,
  type = "linear",
  lambda = "path",
  path_length = 20,
  min_frac = 0.05,
  nfolds = 10,
  gFDR = 0.1,
  pen_method = 1,
  backtracking = 0.7,
  max_iter = 5000,
  max_iter_backtracking = 100,
  tol = 1e-05,
  standardise = "l2",
  intercept = TRUE,
  error_criteria = "mse",
  screen = TRUE,
  verbose = FALSE,
  w_weights = NULL
)

Arguments

`X`	Input matrix of dimensions `n \times p`. Can be a sparse matrix (using class `"sparseMatrix"` from the `Matrix` package).
`y`	Output vector of dimension `n`. For `type="linear"` should be continuous and for `type="logistic"` should be a binary variable.
`groups`	A grouping structure for the input data. Should take the form of a vector of group indices.
`type`	The type of regression to perform. Supported values are: `"linear"` and `"logistic"`.
`lambda`	The regularisation parameter. Defines the level of sparsity in the model; a higher value leads to sparser models. Supported values are: `"path"`, which computes a path of regularisation parameters of length `path_length`, beginning just above the value at which the first predictor enters the model and terminating at the value determined by `min_frac`; or a user-specified single value or sequence. Internal scaling is applied based on the type of standardisation. The returned `lambda` value will be the original unscaled value(s).
`path_length`	The number of `\lambda` values to fit the model for. If `"lambda"` is user-specified, this is ignored.
`min_frac`	Defines the termination point of the pathwise solution, so that `\lambda_\text{min} = \text{min\_frac} \cdot \lambda_\text{max}`.
`nfolds`	The number of folds to use in cross-validation.
`gFDR`	Defines the desired group false discovery rate (FDR) level, which determines the shape of the penalties. Must be between 0 and 1.
`pen_method`	The type of penalty sequences to use (see Brzyski et al. (2019)): `"1"` uses the gMean gSLOPE sequence; `"2"` uses the gMax gSLOPE sequence.
`backtracking`	The backtracking parameter, `\tau`, as defined in Pedregosa and Gidel (2018).
`max_iter`	Maximum number of ATOS iterations to perform.
`max_iter_backtracking`	Maximum number of backtracking line search iterations to perform per global iteration.
`tol`	Convergence tolerance for the stopping criteria.
`standardise`	Type of standardisation to perform on `X`: `"l2"` standardises the input data to have `\ell_2` norms of one; `"l1"` standardises the input data to have `\ell_1` norms of one; `"sd"` standardises the input data to have standard deviation of one; `"none"` applies no standardisation.
`intercept`	Logical flag for whether to fit an intercept.
`error_criteria`	The criteria used to discriminate between models along the path. Supported values are: `"mse"` (mean squared error) and `"mae"` (mean absolute error).
`screen`	Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed.
`verbose`	Logical flag for whether to print fitting information.
`w_weights`	Optional vector for the group penalty weights. Overrides the penalties from `pen_method` if specified. Custom weights are multiplied internally by `\lambda`. To avoid this behaviour, set `\lambda = 1` or scale it accordingly.
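
The relationship between `path_length`, `min_frac` and the `\lambda` path can be sketched in base R as follows. This is an illustration only: `lambda_max` is a hypothetical value (the package derives it from the data), and the log-spaced grid is an assumption based on common practice, so the spacing used internally may differ.

```r
# Hypothetical starting value; in practice this is computed from the data
lambda_max <- 2
min_frac <- 0.05
path_length <- 20

# Termination point of the path: lambda_min = min_frac * lambda_max
lambda_min <- min_frac * lambda_max

# Assumed log-spaced grid running from lambda_max down to lambda_min
lambda_path <- exp(seq(log(lambda_max), log(lambda_min), length.out = path_length))

print(lambda_path)
```

A vector built this way could also be passed as a user-specified `lambda` sequence, in which case `path_length` is ignored.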

Details

Fits gSLOPE models along a path of `\lambda` values using adaptive three operator splitting (ATOS) and selects the model chosen by the one-standard-error (1se) rule as the optimum. Warm starts are implemented.
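
A minimal base R sketch of 1se selection is given below. It assumes the usual definition of the rule (take the largest `\lambda` whose mean cross-validation error is within one standard error of the minimum); the exact computation inside the package may differ. The error matrix `fold_errors` and the `lambdas` vector are made-up values for illustration, not output from `fit_gslope_cv`.

```r
# Hypothetical path of lambda values, largest (sparsest model) first
lambdas <- c(1.0, 0.5, 0.25, 0.125, 0.0625)

# Hypothetical CV errors: one row per lambda, one column per fold
fold_errors <- rbind(
  c(2.00, 2.20, 1.80, 2.10, 1.90),
  c(1.20, 1.40, 1.00, 1.30, 1.10),
  c(0.95, 1.15, 0.75, 1.05, 0.85),
  c(0.90, 1.10, 0.70, 1.00, 0.80),
  c(1.00, 1.30, 0.80, 1.10, 0.90)
)
nfolds <- ncol(fold_errors)

# Mean error and its standard error across folds, for each lambda
mean_err <- rowMeans(fold_errors)
se_err <- apply(fold_errors, 1, sd) / sqrt(nfolds)

# Model with the lowest mean error, and the 1se threshold around it
min_id <- which.min(mean_err)
threshold <- mean_err[min_id] + se_err[min_id]

# 1se rule: the first (largest) lambda whose mean error is within the threshold
one_se_id <- min(which(mean_err <= threshold))
best_lambda <- lambdas[one_se_id]

print(c(min_id = min_id, one_se_id = one_se_id, best_lambda = best_lambda))
```

In this toy example the lowest mean error occurs at the fourth `\lambda`, but the rule selects the third, a sparser model whose error is statistically indistinguishable from the minimum.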

Value

A list containing:

`errors`	A table containing fitting information about the models on the path.
`all_models`	Fitting information for all models fit on the path, which is a `"gslope"` object type.
`fit`	The 1se chosen model, which is a `"gslope"` object type.
`best_lambda`	The value of `\lambda` which generated the chosen model.
`best_lambda_id`	The path index for the chosen model.

References

Brzyski, D., Gossmann, A., Su, W., Bogdan, M. (2019). Group SLOPE – Adaptive Selection of Groups of Predictors, https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1411269

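Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357

Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html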

Examples

# specify a grouping structure
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# generate data
data = gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# run gSLOPE with cross-validation (the proximal functions can be found in utils.R)
cv_model = fit_gslope_cv(X = data$X, y = data$y, groups = groups, type = "linear",
                         path_length = 5, nfolds = 5, gFDR = 0.1, min_frac = 0.05,
                         standardise = "l2", intercept = TRUE, verbose = TRUE)

[Package sgs version 0.2.0 Index]