pcvpls {pcv}    R Documentation

Procrustes cross-validation for PLS models

Description

Procrustes cross-validation for PLS models

Usage

pcvpls(
  X,
  Y,
  ncomp = min(nrow(X) - 1, ncol(X), 30),
  center = TRUE,
  scale = FALSE,
  cv = list("ven", 4),
  cv.scope = "global"
)

Arguments

X

matrix with predictors from the training set.

Y

vector or matrix with response values from the training set.

ncomp

number of components to use (should be larger than the expected optimal number).

center

logical, whether to center the data sets.

scale

logical, whether to scale the data sets.

cv

which split method to use for cross-validation (see the Details section).

cv.scope

scope for center/scale operations inside the cross-validation loop: 'global' uses the globally computed mean and standard deviation, 'local' recomputes them for each local calibration set.

Details

The method computes a pseudo-validation matrix, Xpv, based on a PLS decomposition of the calibration set (X, Y) and cross-validation resampling.

Parameter 'cv' defines how to split the rows of the training set. The split is similar to cross-validation splits, as PCV is based on cross-validation resampling. This parameter can take the following values (a short illustration follows the list):

1. A number (e.g. 'cv = 4'). In this case the number specifies the number of segments for random splits; 'cv = 1' is a special case for leave-one-out (full cross-validation).

2. A list with two values: 'list("name", nseg)'. Here '"name"' defines how to make the split and can be one of '"loo"' for leave-one-out, '"rand"' for random splits or '"ven"' for Venetian blinds (systematic) splits. The second value, 'nseg', is the number of segments to split the rows into. For example, 'cv = list("ven", 4)', used in the Examples section below, tells PCV to use Venetian blinds splits with 4 segments.

3. A vector with integer numbers, e.g. 'cv = c(1, 2, 3, 1, 2, 3, 1, 2, 3)'. In this case the number of values in the vector must be the same as the number of rows in the training set, and each value specifies which segment the corresponding row belongs to. In the example shown here, it is assumed that the calibration set has 9 rows, which will be split into 3 segments; the first segment will consist of measurements from rows 1, 4 and 7.
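
As a minimal sketch, the three ways of specifying the splits are shown below; it is assumed that 'X' and 'Y' are the predictor matrix and response values from the Examples section.

# four random segments
Xpv <- pcvpls(X, Y, ncomp = 20, cv = 4)

# four Venetian blinds (systematic) segments
Xpv <- pcvpls(X, Y, ncomp = 20, cv = list("ven", 4))

# manually assigned segment indices (here equivalent to Venetian blinds with 4 segments)
Xpv <- pcvpls(X, Y, ncomp = 20, cv = rep(1:4, length.out = nrow(X)))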

Parameter 'cv.scope' influences how the Procrustean rule is met. With the "global" scope the rule is met strictly: the prediction error for the PV-set and the global model is identical to the error from conventional cross-validation. With the "local" scope every local model has its own center, hence the rule is met only approximately (the errors will be close but not identical).
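
The two scopes can be compared directly by generating two PV-sets from the same data (a minimal sketch, again assuming 'X' and 'Y' from the Examples section):

# strict Procrustean rule: globally computed mean/std inside the CV loop
Xpv.global <- pcvpls(X, Y, ncomp = 20, cv = list("ven", 4), cv.scope = "global")

# approximate rule: mean/std recomputed for each local calibration set
Xpv.local <- pcvpls(X, Y, ncomp = 20, cv = list("ven", 4), cv.scope = "local")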

Value

Pseudo-validation matrix (same size as X) with an additional attribute, 'D', which contains the scaling coefficients (ck/c).
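
The scaling coefficients can be accessed with base R 'attr()' (a minimal sketch, assuming 'Xpv' was created as in the Examples section):

# extract the scaling coefficients (ck/c) stored as the 'D' attribute
D <- attr(Xpv, "D")
dim(D)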

References

1. S. Kucheryavskiy, O. Rodionova, A. Pomerantsev. Procrustes cross-validation of multivariate regression models. Analytica Chimica Acta, 1255 (2023) [https://doi.org/10.1016/j.aca.2023.341096]

Examples


# load NIR spectra of Corn samples
data(corn)
X <- corn$spectra
Y <- corn$moisture

# generate Xpv set based on PLS decomposition with 20 components and Venetian blinds split with 4 segments
Xpv <- pcvpls(X, Y, ncomp = 20, center = TRUE, scale = FALSE, cv = list("ven", 4))

# show the original spectra and the PV-set (as is and mean centered)
oldpar <- par(mfrow = c(2, 2))
matplot(t(X), type = "l", lty = 1, main = "Original data")
matplot(t(Xpv), type = "l", lty = 1, main = "PV-set")
matplot(t(scale(X, scale = FALSE)), type = "l", lty = 1, main = "Original data (mean centered)")
matplot(t(scale(Xpv, scale = FALSE)), type = "l", lty = 1, main = "PV-set (mean centered)")
par(oldpar)

# show the heatmap with the scaling coefficients
plotD(Xpv)


[Package pcv version 1.1.0]