R: Fit an SGS model.

fit_sgs {sgs}

R Documentation

Fit an SGS model.

Description

Sparse-group SLOPE (SGS) main fitting function. Supports both linear and logistic regression, both with dense and sparse matrix implementations.

Usage

fit_sgs(
  X,
  y,
  groups,
  type = "linear",
  lambda = "path",
  path_length = 20,
  min_frac = 0.05,
  alpha = 0.95,
  vFDR = 0.1,
  gFDR = 0.1,
  pen_method = 1,
  max_iter = 5000,
  backtracking = 0.7,
  max_iter_backtracking = 100,
  tol = 1e-05,
  standardise = "l2",
  intercept = TRUE,
  screen = TRUE,
  verbose = FALSE,
  w_weights = NULL,
  v_weights = NULL
)

Arguments

`X`	Input matrix of dimensions `n \times p`. Can be a sparse matrix (using class `"sparseMatrix"` from the `Matrix` package).
`y`	Output vector of dimension `n`. For `type="linear"` should be continuous and for `type="logistic"` should be a binary variable.
`groups`	A grouping structure for the input data. Should take the form of a vector of group indices.
`type`	The type of regression to perform. Supported values are: `"linear"` and `"logistic"`.
`lambda`	The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models: `"path"` computes a path of regularisation parameters of length `"path_length"`. The path will begin just above the value at which the first predictor enters the model and will terminate at the value determined by `"min_frac"`. User-specified single value or sequence. Internal scaling is applied based on the type of standardisation. The returned `"lambda"` value will be the original unscaled value(s).
`path_length`	The number of `\lambda` values to fit the model for. If `"lambda"` is user-specified, this is ignored.
`min_frac`	Smallest value of `\lambda` as a fraction of the maximum value. That is, the final `\lambda` will be `"min_frac"` of the first `\lambda` value.
`alpha`	The value of `\alpha`, which defines the convex balance between SLOPE and gSLOPE. Must be between 0 and 1. Recommended value is 0.95.
`vFDR`	Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1.
`gFDR`	Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1.
`pen_method`	The type of penalty sequences to use (see Feser and Evangelou (2023)): `"1"` uses the vMean SGS and gMean gSLOPE sequences. `"2"` uses the vMax SGS and gMean gSLOPE sequences. `"3"` uses the BH SLOPE and gMean gSLOPE sequences, also known as SGS Original.
`max_iter`	Maximum number of ATOS iterations to perform.
`backtracking`	The backtracking parameter, `\tau`, as defined in Pedregosa et. al. (2018).
`max_iter_backtracking`	Maximum number of backtracking line search iterations to perform per global iteration.
`tol`	Convergence tolerance for the stopping criteria.
`standardise`	Type of standardisation to perform on `X`: `"l2"` standardises the input data to have `\ell_2` norms of one. When using this `"lambda"` is scaled internally by `1/\sqrt{n}`. `"l1"` standardises the input data to have `\ell_1` norms of one. When using this `"lambda"` is scaled internally by `1/n`. `"sd"` standardises the input data to have standard deviation of one. `"none"` no standardisation applied.
`intercept`	Logical flag for whether to fit an intercept.
`screen`	Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed.
`verbose`	Logical flag for whether to print fitting information.
`w_weights`	Optional vector for the group penalty weights. Overrides the penalties from `pen_method` if specified. When entering custom weights, these are multiplied internally by `\lambda` and `1-\alpha`. To void this behaviour, set `\lambda = 2` and `\alpha = 0.5`.
`v_weights`	Optional vector for the variable penalty weights. Overrides the penalties from `pen_method` if specified. When entering custom weights, these are multiplied internally by `\lambda` and `\alpha`. To void this behaviour, set `\lambda = 2` and `\alpha = 0.5`.

Details

fit_sgs() fits an SGS model using adaptive three operator splitting (ATOS). SGS is a sparse-group method, so that it selects both variables and groups. Unlike group selection approaches, not every variable within a group is set as active. It solves the convex optimisation problem given by

\frac{1}{2n} f(b ; y, \mathbf{X}) + \lambda \alpha \sum_{i=1}^{p}v_i |b|_{(i)} + \lambda (1-\alpha)\sum_{g=1}^{m}w_g \sqrt{p_g} \|b^{(g)}\|_2,

where the penalty sequences are sorted and f(\cdot) is the loss function. In the case of the linear model, the loss function is given by the mean-squared error loss:

f(b; y, \mathbf{X}) = \left\|y-\mathbf{X}b \right\|_2^2.

In the logistic model, the loss function is given by

f(b;y,\mathbf{X})=-1/n \log(\mathcal{L}(b; y, \mathbf{X})).

where the log-likelihood is given by

\mathcal{L}(b; y, \mathbf{X}) = \sum_{i=1}^{n}\left\{y_i b^\intercal x_i - \log(1+\exp(b^\intercal x_i)) \right\}.

SGS can be seen to be a convex combination of SLOPE and gSLOPE, balanced through alpha, such that it reduces to SLOPE for alpha = 0 and to gSLOPE for alpha = 1. The penalty parameters in SGS are sorted so that the largest coefficients are matched with the largest penalties, to reduce the FDR.

Value

A list containing:

`beta`	The fitted values from the regression. Taken to be the more stable fit between `x` and `z`, which is usually the former. A filter is applied to remove very small values, where ATOS has not been able to shrink exactly to zero. Check this against `x` and `z`.
`x`	The solution to the original problem (see Pedregosa et. al. (2018)).
`u`	The solution to the dual problem (see Pedregosa et. al. (2018)).
`z`	The updated values from applying the first proximal operator (see Pedregosa et. al. (2018)).
`type`	Indicates which type of regression was performed.
`pen_slope`	Vector of the variable penalty sequence.
`pen_gslope`	Vector of the group penalty sequence.
`lambda`	Value(s) of `\lambda` used to fit the model.
`success`	Logical flag indicating whether ATOS converged, according to `tol`.
`num_it`	Number of iterations performed. If convergence is not reached, this will be `max_iter`.
`certificate`	Final value of convergence criteria.
`intercept`	Logical flag indicating whether an intercept was fit.

References

Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467

Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357

Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html

Examples

# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data =  gen_toy_data(p=10, n=5, groups = groups, seed_id=3,group_sparsity=1)
# run SGS 
model = fit_sgs(X = data$X, y = data$y, groups = groups, type="linear", path_length = 5, alpha=0.95, 
vFDR=0.1, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)

[Package sgs version 0.2.0 Index]