R: Partial Least Squares Regression

plsFit {mvdalab}

R Documentation

Partial Least Squares Regression

Description

Functions to perform partial least squares regression with a formula interface. Bootstrapping can be used. Prediction, residuals, model extraction, plot, print and summary methods are also implemented.

Usage

plsFit(formula, data, subset, ncomp = NULL, na.action, 
method = c("bidiagpls", "wrtpls"), scale = TRUE, n_cores = 2, 
alpha = 0.05, perms = 2000, validation = c("none", "oob", "loo"), 
boots = 1000, model = TRUE, parallel = FALSE,
x = FALSE, y = FALSE, ...) 
## S3 method for class 'mvdareg'
summary(object, ncomp = object$ncomp, digits = 3, ...)

Arguments

`formula`	a model formula (see below).
`data`	an optional data frame containing the variables in the model.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`ncomp`	the number of components to include in the model (see below).
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The 'factory-fresh' default is `na.omit`. Another possible value is `NULL`, meaning no action. The value `na.exclude` can also be useful.
`method`	the multivariate regression algorithm to be used.
`scale`	should scaling to unit variance be used.
`n_cores`	the number of cores to use for parallel processing. The default is 2 and the maximum is 4.
`alpha`	the significance level for `wrtpls`
`perms`	the number of permutations to run for `wrtpls`
`validation`	character. What kind of (internal) validation to use. See below.
`boots`	Number of bootstrap samples when `validation = 'oob'`
`model`	a logical. If TRUE, the model frame is returned.
`parallel`	should parallelization be used.
`x`	a logical. If TRUE, the model matrix is returned.
`y`	a logical. If TRUE, the response is returned.
`object`	an object of class `"mvdareg"`, i.e., a fitted model.
`digits`	the number of decimal places to output with `summary.mvdareg`
`...`	additional arguments, passed to the underlying fit functions, and `mvdareg`. Currently not in use.
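
The `na.action` choices described above follow the general R modelling convention. As a minimal sketch of that convention, the code below uses base R's `lm` as a stand-in (it does not call `plsFit`, and the toy data frame `d` is made up) to show how `na.omit`, `na.exclude` and `na.fail` differ when a response value is missing.

```r
# Toy data with one missing response value (illustrative only).
d <- data.frame(y = c(1.2, 2.1, NA, 3.9, 5.1), x = c(1, 2, 3, 4, 5))

# na.omit drops the incomplete row; residuals cover complete cases only.
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude also drops the row for fitting, but pads residuals and
# fitted values with NA so they line up with the original rows.
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

n_omit <- length(residuals(fit_omit))   # 4
n_excl <- length(residuals(fit_excl))   # 5
padded <- unname(is.na(residuals(fit_excl)[3]))  # TRUE

# na.fail stops with an error as soon as any NA is present.
failed <- inherits(try(lm(y ~ x, data = d, na.action = na.fail),
                       silent = TRUE), "try-error")
```

Whether `plsFit` pads its residuals in the same way under `na.exclude` is not stated here, so treat this only as an illustration of the options themselves.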

Details

The function fits a partial least squares (PLS) model with 1, ..., ncomp latent variables. Multi-response models are not supported.

The type of model to fit is specified with the method argument. Currently two options are available, both based on the bidiag2 algorithm: "bidiagpls" and "wrtpls".

The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector and terms is the name of one or more predictor matrices, usually separated by +, e.g., y ~ X + Z. See lm for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. The chapter Statistical models in R of the manual An Introduction to R distributed with R is a good reference on formulas in R.
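
As a minimal sketch of how base R resolves a formula such as the one described above, the code below builds a model frame and extracts the response and predictor matrix from it. The data frame `d` and its column names are made up, and whether `plsFit` follows exactly these steps internally is an assumption.

```r
# Toy data; the column names y, x1 and x2 are hypothetical.
set.seed(1)
d <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

# "y ~ ." means: y is the response, every other column is a predictor.
mf <- model.frame(y ~ ., data = d)

# Response vector and predictor matrix derived from the model frame.
resp <- model.response(mf)
X <- model.matrix(attr(mf, "terms"), mf)

length(resp)   # 10
colnames(X)    # "(Intercept)" "x1" "x2"
```

The `y ~ .` shorthand is the same form used in the examples at the end of this page (`log.RAI ~ .`).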

The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used.

Note that if the number of samples is <= 15, oob validation may fail. In that case it is recommended that you fit the PLS model with validation = "loo".

If method = "bidiagpls" and validation = "oob", bootstrap cross-validation is performed. Bootstrap confidence intervals are provided for coefficients, weights, loadings, and y.loadings. The number of bootstrap samples is specified with the argument boots. See mvdaboot for details.
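
The idea behind out-of-bag bootstrap validation can be sketched in base R as follows. This is an illustration of the general technique only: it uses `lm` as a stand-in model rather than PLS, simulated data, and a percentile interval, none of which is confirmed to match what `plsFit` or `mvdaboot` do internally.

```r
set.seed(42)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n, sd = 0.5)

boots <- 200                       # number of bootstrap samples
coef_x1 <- numeric(boots)          # bootstrap replicates of one coefficient
oob_rmse <- rep(NA_real_, boots)   # prediction error on out-of-bag rows

for (b in seq_len(boots)) {
  in_bag <- sample.int(n, n, replace = TRUE)  # rows drawn with replacement
  oob <- setdiff(seq_len(n), in_bag)          # rows never drawn in this sample
  fit <- lm(y ~ x1 + x2, data = d[in_bag, ])
  coef_x1[b] <- coef(fit)[["x1"]]
  if (length(oob) > 0) {
    pred <- predict(fit, newdata = d[oob, ])
    oob_rmse[b] <- sqrt(mean((d$y[oob] - pred)^2))
  }
}

# Percentile bootstrap interval for the x1 coefficient (one common choice).
ci_x1 <- quantile(coef_x1, c(0.025, 0.975))
mean_oob_rmse <- mean(oob_rmse, na.rm = TRUE)
```

Each bootstrap sample leaves out some rows by chance; those out-of-bag rows act as a test set for that replicate.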

If method = "bidiagpls" and validation = "loo", leave-one-out cross-validation is performed.
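
Leave-one-out cross-validation can be sketched in base R like this. The example is a hedged illustration of the general procedure with `lm` as a stand-in model and simulated data; it does not reproduce the internals of `plsFit`.

```r
set.seed(7)
n <- 12   # a small sample, where leave-one-out is a natural choice
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 0.5 + d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.3)

loo_pred <- numeric(n)
for (i in seq_len(n)) {
  # Fit on all rows except i, then predict the held-out row.
  fit <- lm(y ~ x1 + x2, data = d[-i, ])
  loo_pred[i] <- predict(fit, newdata = d[i, , drop = FALSE])
}

press <- sum((d$y - loo_pred)^2)   # predicted residual sum of squares
rmsecv <- sqrt(press / n)          # root mean squared error of cross-validation
```

Every observation is predicted exactly once by a model that never saw it, so the procedure needs n model fits.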

If method = "bidiagpls" and validation = "none", no cross-validation is performed. Note that the number of components, ncomp, is set to min(nobj - 1, npred).
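
The upper bound min(nobj - 1, npred) can be computed directly from a predictor matrix. In this small sketch the matrix `X` is simulated, and `nobj` and `npred` are taken to mean the number of observations (rows) and predictors (columns), which is an assumption about the notation.

```r
set.seed(1)
X <- matrix(rnorm(8 * 12), nrow = 8, ncol = 12)  # 8 observations, 12 predictors

nobj <- nrow(X)    # number of observations
npred <- ncol(X)   # number of predictors

max_ncomp <- min(nobj - 1, npred)
max_ncomp   # 7
```

With more predictors than observations, as here, the number of observations is the limiting factor.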

If method = "wrtpls" and validation = "none", the Weight Randomization Test for the selection of the number of components is performed. Note that the number of components, ncomp, is set to min(nobj - 1, npred).

Value

An object of class mvdareg is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:

`loadings`	X loadings
`weights`	weights
`D2.values`	bidiag2 matrix
`iD2`	inverse of bidiag2 matrix
`Ymean`	mean of response variable
`Xmeans`	mean of predictor variables
`coefficients`	PLS regression coefficients
`y.loadings`	y-loadings
`scores`	X scores
`R`	orthogonal weights
`Y.values`	scaled response values
`Yactual`	actual response values
`fitted`	fitted values
`residuals`	residuals
`Xdata`	X matrix
`iPreds`	predicted values
`y.loadings2`	scaled y-loadings
`ncomp`	number of latent variables
`method`	PLS algorithm used
`scale`	scaling used
`validation`	validation method
`call`	model call
`terms`	model terms
`model`	fitted model

Author(s)

Nelson Lee Afanador (nelson.afanador@mvdalab.com), Thanh Tran (thanh.tran@mvdalab.com)

References

NOTE: This function is adapted from mvr in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.

Examples

###  PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'oob', i.e. bootstrapping ###
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "oob", boots = 300)
summary(mod1) #Model summary

###  PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'loo', i.e. leave-one-out CV ###
## Not run: 
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "loo")
summary(mod2) #Model summary

## End(Not run)

###  PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'none', i.e. no CV is performed ###
## Not run: 
mod3 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "none")
summary(mod3) #Model summary

## End(Not run)

###  PLS MODEL FIT WITH method = 'wrtpls' and validation = 'none', i.e. WRT-PLS is performed ###
## Not run: 
mod4 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               method = "wrtpls", validation = "none")
summary(mod4) #Model summary
plot.wrtpls(mod4)

## End(Not run)

[Package mvdalab version 1.7 Index]