lspls-package {lspls}    R Documentation

LS-PLS Models

Description

Implements the LS-PLS (least squares - partial least squares) method described in, for instance, Jørgensen, K., Segtnan, V. H., Thyholt, K. and Næs, T. (2004) "A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables", Journal of Chemometrics, 18(10), 451–464, <doi:10.1002/cem.890>.

Details

The DESCRIPTION file:

Package: lspls
Title: LS-PLS Models
Version: 0.2-2
Date: 2018-07-26
Authors@R: c(person("Bjørn-Helge", "Mevik", role = c("aut", "cre"), email = "b-h@mevik.net"))
Author: Bjørn-Helge Mevik [aut, cre]
Maintainer: Bjørn-Helge Mevik <b-h@mevik.net>
Encoding: UTF-8
Depends: pls (>= 2.2.0)
Imports: grDevices, graphics, methods, stats
Description: Implements the LS-PLS (least squares - partial least squares) method described in for instance Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) "A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables" Journal of Chemometrics, 18(10), 451--464, <doi:10.1002/cem.890>.
License: GPL-2
URL: http://mevik.net/work/software/lspls.html, https://github.com/bhmevik/lspls
BugReports: https://github.com/bhmevik/lspls/issues

Index of help topics:

MSEP.lsplsCv            MSEP, RMSEP and R^2 for LS-PLS
lspls                   Fit LS-PLS Models
lspls-package           LS-PLS Models
lsplsCv                 Cross-Validate LS-PLS Models
orthlspls.fit           Underlying LS-PLS Fit Function
orthlsplsCv             Low Level Cross-Validation Function
plot.lspls              Plots of LS-PLS Models
plot.lsplsCv            Plot Method for Cross-Validations
predict.lspls           Predict Method for LS-PLS Models
project                 Projection and Orthogonalisation

LS-PLS (least squares–partial least squares) models are written in the form

Y = X\beta + T_1\gamma_1 + \cdots + T_k\gamma_k + E,

where each term T_i consists of one or more matrices Z_{i,1}, ..., Z_{i,l_i}, written in the model formula separated by colons (:), i.e., Z_{i,1} \colon Z_{i,2} \colon \cdots \colon Z_{i,l_i}. Multi-response models are possible, in which case Y should be a matrix.

The model is fitted from left to right. First, Y is regressed on X using least squares (LS) and the residuals are calculated. For each i, the matrices Z_{i,1}, ..., Z_{i,l_i} are orthogonalised against the variables used in the regression so far (for i = 1, this means X). The residuals from the LS regression are used as the response in PLS regressions with the orthogonalised matrices as predictors (one PLS regression for each matrix), and the desired number of PLS components from each matrix is included among the LS prediction variables. The LS regression is then refitted with the new variables, and new residuals are calculated.
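One step of this scheme can be sketched in plain R using lm.fit, qr.resid and plsr from the pls package. This is only an illustration of the idea; the package's own orthlspls.fit performs the actual fitting, and all object names here are made up:

```r
## Sketch of one LS-PLS step: LS fit, orthogonalisation, then PLS on the
## residuals.  Illustration only -- orthlspls.fit does this properly.
library(pls)                              # for plsr() and scores()

set.seed(1)
n <- 50
X <- cbind(1, matrix(rnorm(n * 2), n))    # design matrix (with intercept)
Z <- matrix(rnorm(n * 10), n)             # e.g. a spectral block
y <- rnorm(n)

## Step 1: LS regression of y on X; keep the residuals
lsfit <- lm.fit(X, y)
r <- lsfit$residuals

## Step 2: orthogonalise Z against X (project Z onto the orthogonal
## complement of the column space of X)
Zorth <- qr.resid(qr(X), Z)

## Step 3: PLS regression of the LS residuals on the orthogonalised Z;
## the chosen score vectors would then join X in a refitted LS model
plsfit <- plsr(r ~ Zorth, ncomp = 2)
newvars <- scores(plsfit)[, 1:2]          # PLS scores to add to the LS fit
```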

The function to fit LS-PLS models is lspls. A typical usage to fit the model

y = X\beta + Z \gamma + V_1 \colon V_2 \eta + W \theta + E

would be

  mod <- lspls(y ~ X + Z + V1:V2 + W, ncomp = list(3, c(2,1), 2),
               data = mydata)

The first argument is the formula describing the model. X is fitted first, using LS. Then PLS scores from Z (orthogonalised against X) are added. Then PLS scores from V1 and V2 are added (simultaneously), and finally PLS scores from W. The next argument, ncomp, specifies the number of components to use from each PLS regression: 3 Z score vectors, 2 V1 score vectors, 1 V2 score vector and 2 W score vectors. Finally, mydata should be a data frame containing the matrices y, X, Z, V1, V2 and W (for single-response models, y can be a vector).
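Such a data frame can, for instance, be built with matrix columns via I(), the same convention as in the pls package. The dimensions below are made up purely for illustration:

```r
## Hypothetical construction of `mydata`: each block is stored as a
## matrix column of the data frame, protected by I()
set.seed(1)
n  <- 40
y  <- rnorm(n)
X  <- matrix(rnorm(n * 3),  n)    # design block (LS)
Z  <- matrix(rnorm(n * 15), n)    # first PLS block
V1 <- matrix(rnorm(n * 10), n)    # two blocks fitted simultaneously
V2 <- matrix(rnorm(n * 10), n)
W  <- matrix(rnorm(n * 5),  n)    # last PLS block
mydata <- data.frame(y = y, X = I(X), Z = I(Z),
                     V1 = I(V1), V2 = I(V2), W = I(W))
```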

Currently, score plots and loading plots of fitted models are implemented. plot(mod, "scores") gives score plots for each PLS regression, and plot(mod, "loadings") gives loading plots.

There is a predict method to predict response or score values from new data

  predict(mod, newdata = mynewdata)

(This predicts response values. Use type = "scores" to get scores.) Also, the standard functions resid and fitted can be used to extract the residuals and fitted values.
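Put together, extraction from a fitted model might look as follows (a sketch; mod and mynewdata are the hypothetical objects from above):

```r
## Extracting results from a fitted LS-PLS model `mod` (hypothetical)
yhat <- fitted(mod)                         # fitted values
res  <- resid(mod)                          # residuals
pred <- predict(mod, newdata = mynewdata)   # predicted responses
scr  <- predict(mod, newdata = mynewdata,
                type = "scores")            # predicted PLS scores
```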

In order to determine the number of components to use from each matrix, one can use cross-validation:

  cvmod <- lsplsCv(y ~ X + Z + V1:V2 + W, ncomp = list(4, c(3,4), 3),
                   segments = 12, data = mydata)

In lsplsCv, ncomp gives the maximal number of components to test. The argument segments specifies the number of cross-validation segments to use. One can select the type of segments (random (default), consecutive or interleaved) with the argument segment.type, or supply the segments explicitly through segments. See lsplsCv for details.

One can plot cross-validated RMSEP values with plot(cvmod). (Similarly, plot(cvmod, "MSEP") plots MSEP values.) This makes it easier to determine the optimal number of components for each PLS. See plot.lsplsCv for details. To calculate the RMSEP or MSEP values explicitly, one can use the function RMSEP or MSEP.

Author(s)

Bjørn-Helge Mevik [aut, cre]

Maintainer: Bjørn-Helge Mevik <b-h@mevik.net>

References

Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.

Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)

Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)

See Also

lspls, lsplsCv, plot.lspls, plot.lsplsCv

Examples

## FIXME
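A made-up example on simulated data (the package ships no data set, so every object below is invented; the numbers of components are arbitrary):

```r
## Simulated-data sketch: one LS block X and one PLS block Z
library(lspls)
set.seed(42)
n <- 60
X <- matrix(rnorm(n * 3),  n)    # design block, fitted by LS
Z <- matrix(rnorm(n * 20), n)    # spectral block, fitted by PLS
y <- rnorm(n)
mydata <- data.frame(y = y, X = I(X), Z = I(Z))

## Fit with 2 PLS components from Z, and inspect the scores
mod <- lspls(y ~ X + Z, ncomp = list(2), data = mydata)
plot(mod, "scores")

## Cross-validate up to 4 components from Z
cvmod <- lsplsCv(y ~ X + Z, ncomp = list(4), segments = 10,
                 data = mydata)
plot(cvmod)                      # RMSEP curves
```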

[Package lspls version 0.2-2 Index]