SparseGroupPLS {sharp}    R Documentation

Sparse group Partial Least Squares

Description

Runs a sparse group Partial Least Squares (sgPLS) model using the implementation from the sgPLS package. This function does not use stability selection.

Usage

SparseGroupPLS(
  xdata,
  ydata,
  family = "gaussian",
  group_x,
  group_y = NULL,
  Lambda,
  alpha.x,
  alpha.y = NULL,
  keepX_previous = NULL,
  keepY = NULL,
  ncomp = 1,
  scale = TRUE,
  ...
)

Arguments

xdata

matrix of predictors with observations as rows and variables as columns.

ydata

optional vector or matrix of outcome(s). If family is set to "binomial" or "multinomial", ydata can be a vector with character/numeric values or a factor.

family

type of PLS model. If family="gaussian", a sparse group PLS model as defined in sgPLS is run (for continuous outcomes). If family="binomial", a PLS-DA model as defined in sgPLSda is run (for categorical outcomes).

group_x

vector encoding the grouping structure among predictors. This argument indicates the number of variables in each group.

group_y

optional vector encoding the grouping structure among outcomes. This argument indicates the number of variables in each group.

Lambda

matrix of parameters controlling the number of selected groups at the current component, as defined by ncomp.

alpha.x

vector of parameters controlling the level of sparsity within groups of predictors.

alpha.y

optional vector of parameters controlling the level of sparsity within groups of outcomes. Only used if family="gaussian".

keepX_previous

number of selected groups in previous components. Only used if ncomp > 1. The argument keepX in sgPLS is obtained by concatenating keepX_previous and Lambda.

keepY

number of selected groups of outcome variables. This argument is defined as in sgPLS. Only used if family="gaussian".

ncomp

number of components.

scale

logical indicating if the data should be scaled (i.e. transformed so that all variables have a standard deviation of one). Only used if family="gaussian".

...

additional arguments to be passed to sgPLS or sgPLSda.
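
The group-size encoding used by group_x and group_y, and the concatenation described for keepX_previous and Lambda, can be illustrated in base R. This is a minimal sketch and is not taken from the package source: the helper name group_sizes_to_index is hypothetical, and the sketch assumes that the columns of xdata are ordered so that variables of the same group are contiguous.

```r
# Hypothetical helper (not part of sharp or sgPLS): expand a vector of
# group sizes into one group label per column, assuming contiguous groups.
group_sizes_to_index <- function(group_sizes) {
  rep(seq_along(group_sizes), times = group_sizes)
}

# Three groups covering 10 + 15 + 5 = 30 predictors
group_x <- c(10, 15, 5)
group_id <- group_sizes_to_index(group_x)

# The group sizes should add up to the number of columns in xdata
stopifnot(length(group_id) == sum(group_x))

# Column indices belonging to each group
split(seq_along(group_id), group_id)

# Concatenation described for keepX_previous and Lambda, here with
# ncomp = 2: one group kept on component 1, two groups on component 2
keepX_previous <- 1
Lambda <- 2
keepX <- c(keepX_previous, Lambda)
```

Under this assumption, a group_x whose entries do not sum to the number of predictors would not describe a valid grouping, which is why the check on the total is included.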

Value

A list with:

selected

matrix of binary selection status. Rows correspond to different model parameters. Columns correspond to predictors.

beta_full

array of model coefficients. Rows correspond to different model parameters. Columns correspond to predictor variables (names starting with "X") or outcome variables (names starting with "Y") for the different components (denoted by "PC").
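
The two elements of the returned list can be inspected as follows. This sketch reuses the first call from the Examples section and only relies on the element names documented above. It assumes that both sharp and sgPLS are installed and that SimulateRegression is available once sharp is attached, as in the Examples. No output is shown because it depends on the installed versions.

```r
if (requireNamespace("sharp", quietly = TRUE) &&
  requireNamespace("sgPLS", quietly = TRUE)) {
  library(sharp)

  # Same simulated data and call as in the Examples section
  set.seed(1)
  simul <- SimulateRegression(n = 100, pk = 30, q = 3, family = "gaussian")
  mypls <- SparseGroupPLS(
    xdata = simul$xdata, ydata = simul$ydata, Lambda = 1, family = "gaussian",
    group_x = c(10, 15, 5), alpha.x = 0.5
  )

  # Binary selection status: rows are model parameters, columns are predictors
  print(dim(mypls$selected))

  # Positions of the predictors with selection status 1 for the first parameter
  print(which(mypls$selected[1, ] == 1))

  # Dimensions of the coefficient array
  print(dim(mypls$beta_full))
}
```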

References

Liquet B, de Micheaux PL, Hejblum BP, Thiébaut R (2016). “Group and sparse group partial least square approaches applied in genomics context.” Bioinformatics, 32(1), 35-42. ISSN 1367-4803, doi:10.1093/bioinformatics/btv535.

See Also

VariableSelection, BiSelection

Other penalised dimensionality reduction functions: GroupPLS(), SparsePCA(), SparsePLS()

Examples

if (requireNamespace("sgPLS", quietly = TRUE)) {
  ## Sparse group PLS
  # Data simulation
  set.seed(1)
  simul <- SimulateRegression(n = 100, pk = 30, q = 3, family = "gaussian")
  x <- simul$xdata
  y <- simul$ydata

  # Running sgPLS with 1 group and sparsity of 0.5
  mypls <- SparseGroupPLS(
    xdata = x, ydata = y, Lambda = 1, family = "gaussian",
    group_x = c(10, 15, 5), alpha.x = 0.5
  )

  # Running sgPLS with groups on outcomes
  mypls <- SparseGroupPLS(
    xdata = x, ydata = y, Lambda = 1, family = "gaussian",
    group_x = c(10, 15, 5), alpha.x = 0.5,
    group_y = c(2, 1), keepY = 1, alpha.y = 0.9
  )

  ## Sparse group PLS-DA
  # Data simulation
  set.seed(1)
  simul <- SimulateRegression(n = 100, pk = 50, family = "binomial")

  # Running sgPLS-DA with 1 group and sparsity of 0.9
  mypls <- SparseGroupPLS(
    xdata = simul$xdata, ydata = simul$ydata, Lambda = 1, family = "binomial",
    group_x = c(10, 15, 25), alpha.x = 0.9
  )
}

[Package sharp version 1.4.6 Index]