PLS {sharp}    R Documentation

Partial Least Squares 'a la carte'

Description

Runs a Partial Least Squares (PLS) model in regression mode using the algorithm implemented in pls (from the mixOmics package). This function allows components to be constructed from different sets of predictor and/or outcome variables. It does not use stability selection.

Usage

PLS(
  xdata,
  ydata,
  selectedX = NULL,
  selectedY = NULL,
  family = "gaussian",
  ncomp = NULL,
  scale = TRUE
)

Arguments

xdata

matrix of predictors with observations as rows and variables as columns.

ydata

optional vector or matrix of outcome(s). If family is set to "binomial" or "multinomial", ydata can be a vector with character/numeric values or a factor. Note that PLS supports only family="gaussian" (see the family argument).

selectedX

binary matrix with ncol(xdata) rows and ncomp columns. The binary entries indicate which predictors (in rows) contribute to the definition of each component (in columns). If selectedX=NULL, all predictors are selected for all components.

selectedY

binary matrix with ncol(ydata) rows and ncomp columns. The binary entries indicate which outcomes (in rows) contribute to the definition of each component (in columns). If selectedY=NULL, all outcomes are selected for all components.

family

type of PLS model. Only family="gaussian" is supported. This corresponds to a PLS model as defined in pls (for continuous outcomes).

ncomp

number of components.

scale

logical indicating if the data should be scaled (i.e. transformed so that all variables have a standard deviation of one).
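
As an illustration of the layout expected for selectedX and selectedY, the sketch below builds both matrices by hand in base R. The dimensions and the choice of selected variables are hypothetical; whether row or column names are required is not stated in this page and is not assumed here.

```r
# Hypothetical dimensions, chosen for illustration only
p <- 5      # number of predictors, i.e. ncol(xdata)
q <- 2      # number of outcomes, i.e. ncol(ydata)
ncomp <- 2  # number of components

# Predictors in rows, components in columns
selectedX <- matrix(0, nrow = p, ncol = ncomp)
selectedX[c(1, 2, 3), 1] <- 1  # component 1 is built from predictors 1 to 3
selectedX[c(4, 5), 2] <- 1     # component 2 is built from predictors 4 and 5

# Outcomes in rows, components in columns
# Here all outcomes contribute to both components
selectedY <- matrix(1, nrow = q, ncol = ncomp)
```

With matching xdata and ydata, these matrices could then be supplied through the selectedX and selectedY arguments of PLS.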

Details

All matrices are defined as in Wold et al. (2001). The weight matrix Wmat is the equivalent of loadings$X in pls. The loadings matrix Pmat is the equivalent of mat.c in pls. The score matrices Tmat and Umat (the X-scores and Y-scores listed under Value) are the equivalent of variates$X and variates$Y in pls.
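
For orientation, the relations between these matrices in Wold et al. (2001) can be summarised as below, where X and Y denote the centered (and optionally scaled) data and E, F, G are residual matrices. This is a sketch in the notation of that paper; it is an assumption that T, U, P, C, W and W* map one-to-one onto Tmat, Umat, Pmat, Cmat, Wmat and Wstar as returned by this function.

```latex
X = T P^\top + E, \qquad
Y = U C^\top + G, \qquad
Y = T C^\top + F, \qquad
T = X W^{*}, \qquad
W^{*} = W \left( P^\top W \right)^{-1}
```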

Value

A list with:

Wmat

matrix of X-weights.

Wstar

matrix of transformed X-weights.

Pmat

matrix of X-loadings.

Cmat

matrix of Y-weights.

Tmat

matrix of X-scores.

Umat

matrix of Y-scores.

Qmat

matrix needed for predictions.

Rmat

matrix needed for predictions.

meansX

vector used for centering of predictors, needed for predictions.

sigmaX

vector used for scaling of predictors, needed for predictions.

meansY

vector used for centering of outcomes, needed for predictions.

sigmaY

vector used for scaling of outcomes, needed for predictions.

methods

a list with family and scale values used for the run.

params

a list with selectedX and selectedY values used for the run.
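
As a rough illustration of why centering and scaling vectors are needed for predictions, the base R sketch below puts new observations on the same scale as the training data. The objects xtrain and xnew are hypothetical, and meansX and sigmaX are computed here as stand-ins (column means and standard deviations); how PLS computes and uses them internally is an assumption, not something this page specifies.

```r
# Hypothetical training data standing in for xdata
set.seed(1)
xtrain <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)

# Stand-ins for the meansX and sigmaX elements of the output
# (assumed: column means and column standard deviations of the training data)
meansX <- colMeans(xtrain)
sigmaX <- apply(xtrain, 2, sd)

# New observations to be put on the same scale as the training data
xnew <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)

# Subtract the training means, then divide by the training standard deviations
xnew_std <- sweep(sweep(xnew, 2, meansX, "-"), 2, sigmaX, "/")
```

The same two steps would apply to outcomes with meansY and sigmaY.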

References

Wold S, Sjöström M, Eriksson L (2001). “PLS-regression: a basic tool of chemometrics.” Chemometrics and Intelligent Laboratory Systems, 58(2), 109-130. ISSN 0169-7439, doi:10.1016/S0169-7439(01)00155-1, PLS Methods.

See Also

VariableSelection, BiSelection

Examples


if (requireNamespace("mixOmics", quietly = TRUE)) {
  oldpar <- par(no.readonly = TRUE)

  # Data simulation
  set.seed(1)
  simul <- SimulateRegression(n = 200, pk = 15, q = 3, family = "gaussian")
  x <- simul$xdata
  y <- simul$ydata

  # PLS
  mypls <- PLS(xdata = x, ydata = y, ncomp = 3)

  if (requireNamespace("sgPLS", quietly = TRUE)) {
    # Sparse PLS to identify relevant variables
    stab <- BiSelection(
      xdata = x, ydata = y,
      family = "gaussian", ncomp = 3,
      LambdaX = seq_len(ncol(x) - 1),
      LambdaY = seq_len(ncol(y) - 1),
      implementation = SparsePLS,
      n_cat = 2
    )
    plot(stab)

    # Refitting of PLS model
    mypls <- PLS(
      xdata = x, ydata = y,
      selectedX = stab$selectedX,
      selectedY = stab$selectedY
    )

    # Nonzero entries in weights are the same as in selectedX
    par(mfrow = c(2, 2))
    Heatmap(stab$selectedX,
      legend = FALSE
    )
    title("Selected in X")
    Heatmap(ifelse(mypls$Wmat != 0, yes = 1, no = 0),
      legend = FALSE
    )
    title("Nonzero entries in Wmat")
    Heatmap(stab$selectedY,
      legend = FALSE
    )
    title("Selected in Y")
    Heatmap(ifelse(mypls$Cmat != 0, yes = 1, no = 0),
      legend = FALSE
    )
    title("Nonzero entries in Cmat")
  }

  # Multilevel PLS
  # Generating the design: 50 groups of 4 consecutive observations each
  z <- rep(seq_len(50), each = 4)

  # Extracting the within-variability
  x_within <- mixOmics::withinVariation(X = x, design = cbind(z))

  # Running PLS on within-variability
  mypls <- PLS(xdata = x_within, ydata = y, ncomp = 3)

  par(oldpar)
}


[Package sharp version 1.4.6 Index]