pcvpcr {pcv} | R Documentation |
Procrustes cross-validation for PCR models
Description
Procrustes cross-validation for PCR models
Usage
pcvpcr(
X,
Y,
ncomp = min(nrow(X) - 1, ncol(X), 30),
cv = list("ven", 4),
center = TRUE,
scale = FALSE,
cv.scope = "global"
)
Arguments
X |
matrix with predictors from the training set. |
Y |
vector with response values from the training set. |
ncomp |
number of components to use (more than the expected optimal number). |
cv |
which split method to use for cross-validation (see description of method 'pcvpls()' for details). |
center |
logical, to center or not the data sets |
scale |
logical, to scale or not the data sets |
cv.scope |
scope for center/scale operations inside CV loop: 'global' — using globally computed mean and std or 'local' — recompute new for each local calibration set. |
Details
The method computes pseudo-validation matrix Xpv, based on PCR decomposition of calibration set X, y and cross-validation.
Parameter 'cv' defines how to split the rows of the training set. The split is similar to cross-validation splits, as PCV is based on cross-validation. This parameter can have the following values:
1. A number (e.g. 'cv = 4'). In this case this number specifies number of segments for random splits, except 'cv = 1' which is a special case for leave-one-out (full cross-validation).
2. A list with 2 values: 'list("name", nseg)'. In this case '"name"' defines the way to make the split, you can select one of the following: '"loo"' for leave-one-out, '"rand"' for random splits or '"ven"' for Venetian blinds (systematic) splits. The second parameter, 'nseg', is a number of segments for splitting the rows into. For example, 'cv = list("ven", 4)', shown in the code examples above, tells PCV to use Venetian blinds splits with 4 segments.
3. A vector with integer numbers, e.g. 'cv = c(1, 2, 3, 1, 2, 3, 1, 2, 3)'. In this case number of values in this vector must be the same as number of rows in the training set. The values specify which segment a particular row will belong to. In case of the example shown here, it is assumed that you have 9 rows in the calibration set, which will be split into 3 segments. The first segment will consist of measurements from rows 1, 4 and 7.
Parameter 'cv.scope' influences how the Procrustean rule is met. In case of "global" scope, the rule will be met strictly - error of predictions for PV-set and the global model will be identical to the error from conventional cross-validation. In case of "local" scope, every local model will have its own center and hence the rule will be almost met (the errors will be close but not identical).
Value
Pseudo-validation matrix (same size as X) with an additional attribute, 'D' which contains the scaling coefficients (ck/c)
References
1. S. Kucheryavskiy, O. Rodionova, A. Pomerantsev. Procrustes cross-validation of multivariate regression models. Analytica Chimica Acta, 1255 (2022) [https://doi.org/10.1016/j.aca.2023.341096]
Examples
# load NIR spectra of Corn samples
data(corn)
X <- corn$spectra
Y <- corn$moisture
# generate Xpv set based on PCA decomposition with A = 20 and venetian blinds split with 4 segments
Xpv <- pcvpcr(X, Y, ncomp = 20, center = TRUE, scale = FALSE, cv = list("ven", 4))
# show the original spectra and the PV-set (as is and mean centered)
oldpar <- par(mfrow = c(2, 2))
matplot(t(X), type = "l", lty = 1, main = "Original data")
matplot(t(Xpv), type = "l", lty = 1, main = "PV-set")
matplot(t(scale(X, scale = FALSE)), type = "l", lty = 1, main = "Original data (mean centered)")
matplot(t(scale(Xpv, scale = FALSE)), type = "l", lty = 1, main = "PV-set (mean centered)")
par(oldpar)