selwold {rchemo}    R Documentation

Heuristic selection of the dimension of a latent variable model with Wold's criterion

Description

The function helps select the dimensionality of latent variable (LV) models (e.g. PLSR) using the "Wold criterion".

The criterion is the "precision gain ratio" $R = 1 - r(a+1) / r(a)$, where $r$ is an observed error rate quantifying the model performance (MSEP, classification error rate, etc.) and $a$ is the model dimensionality (= number of LVs). $r$ can also represent other indicators, such as the eigenvalues of a PCA.

$R$ is the relative gain in efficiency when a new LV is added to the model. The iterations continue until $R$ falls below a threshold value $\alpha$. The default $\alpha = 0.05$ is given only as an indication; users should set a value suited to their data and parsimony objective.
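The threshold rule can be illustrated with a minimal R sketch. The helper wold_sketch() is hypothetical (it is not the rchemo source code), the error rates are made-up numbers, and the selection rule is an assumption: keep the first dimension a whose gain R(a) falls below alpha, falling back to the last index if the gain never drops below the threshold. The index of the minimum of r is returned alongside for comparison.

```r
# Hypothetical helper illustrating the Wold criterion (not rchemo's code).
# r: error rates for successive model dimensions; indx: matching nb. of LVs.
wold_sketch <- function(r, indx = seq_along(r), alpha = 0.05) {
  n <- length(r)
  # Precision gain ratio R(a) = 1 - r(a+1) / r(a); undefined for the last index
  R <- c(1 - r[-1] / r[-n], NA)
  # Assumption: keep the first dimension after which the gain drops below alpha
  u <- which(R < alpha)
  sel <- if (length(u) > 0) indx[u[1]] else indx[n]
  # opt: index of the minimum of r, for comparison with sel
  list(R = R, opt = indx[which.min(r)], sel = sel)
}

# Made-up error rates for models with 0 to 5 LVs
r <- c(10, 6, 4.5, 4.4, 4.35, 4.5)
res <- wold_sketch(r, indx = 0:5, alpha = 0.05)
res$sel   # 2
res$opt   # 4
```

Here the minimum of r is reached with 4 LVs, but adding the third LV already improves r by only about 2%, so the more parsimonious 2-LV model is retained.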

In the original article, Wold (1978; see also Bro et al. 2008) used the ratio of cross-validated over training residual sums of squares, i.e. PRESS over SSR. Instead, selwold compares values of the same nature (the successive values of the input vector $r$), e.g. PRESS only. For instance, $r$ was set to PRESS values in Li et al. (2002) and Andries et al. (2011), which is equivalent to the "punish factor" described in Westad & Martens (2000).

The ratio $R$ is often erratic, which makes the selection of the dimensionality difficult. Function selwold can therefore compute a smoothed version of $R$ (argument smooth).
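A minimal sketch of such a smoothing, using base R's lowess() on made-up, slightly erratic error rates. Smoothing R against the number of LVs, and applying the threshold to the smoothed values in the same way as to the raw ones, are assumptions made for illustration; the exact scheme used inside selwold may differ.

```r
# Made-up, slightly erratic error rates for models with 0 to 7 LVs
r <- c(10, 6, 4.5, 4.4, 4.35, 4.5, 4.3, 4.4)
n <- length(r)
indx <- 0:(n - 1)

# Raw ratio R(a) = 1 - r(a+1) / r(a), defined for all indexes but the last
R <- 1 - r[-1] / r[-n]

# Smoothed ratio; f is the lowess span (proportion of points used locally).
# indx[-n] is already sorted, so lowess()$y is aligned with R.
Rs <- lowess(indx[-n], R, f = 2/3)$y

# Assumption: the threshold is applied to Rs exactly as it would be to R
alpha <- 0.05
sel <- indx[which(Rs < alpha)[1]]
```

A larger f gives a smoother curve and usually a more stable, but less local, selection.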

Usage

selwold(
    r, indx = seq(length(r)), 
    smooth = TRUE, f = 1/3,
    alpha = .05, digits = 3,
    plot = TRUE,
    xlab = "Index", ylab = "Value", main = "r",
    ...
    )
  

Arguments

r

Vector (of length $n$) of an error rate or of any other indicator.

indx

Vector (of length $n$) of indexes, typically the numbers of LVs.

smooth

Logical. If TRUE (default), the selection is done on the smoothed $R$.

f

Span of the smoother (argument f of function lowess) used to smooth $R$.

alpha

Proportion $\alpha$ used as the threshold for $R$.

digits

Number of digits for $R$.

plot

Logical. If TRUE (default), results are plotted.

xlab

x-axis label of the plot of $r$ (left side of the graphic window).

ylab

y-axis label of the plot of $r$ (left side of the graphic window).

main

Title of the plot of $r$ (left side of the graphic window).

...

Other arguments passed to function lowess.

Value

res

A matrix giving, for each number of LVs: $r$, the observed error rate quantifying the model performance; diff, the difference between $r(a+1)$ and $r(a)$; $R$, the relative gain in efficiency when a new LV is added to the model; Rs, the smoothed $R$.
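The columns r, diff and R are linked by the identity $R(a) = -\mathrm{diff}(a) / r(a)$. The sketch below builds a table with this layout from made-up error rates; it is illustrative only (the column names follow the description above, the nlv column and the rounding with digits = 3 are assumptions, and the smoothed column Rs is omitted).

```r
# Made-up error rates for models with 0 to 5 LVs
r <- c(10, 6, 4.5, 4.4, 4.35, 4.5)
n <- length(r)

d <- c(r[-1] - r[-n], NA)   # diff = r(a+1) - r(a); undefined for the last index
R <- -d / r                 # R = 1 - r(a+1) / r(a) = -diff / r(a)

# One row per number of LVs; R rounded to 3 digits
tab <- cbind(nlv = 0:(n - 1), r = r, diff = d, R = round(R, digits = 3))
tab
```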

opt

The index of the minimum of $r$.

sel

The index selected by applying the threshold $\alpha$ to $R$ (or to the smoothed $R$).

References

Andries, J.P.M., Vander Heyden, Y., Buydens, L.M.C., 2011. Improved variable reduction in partial least squares modelling based on Predictive-Property-Ranked Variables and adaptation of partial least squares complexity. Analytica Chimica Acta 705, 292-305. https://doi.org/10.1016/j.aca.2011.06.037

Bro, R., Kjeldahl, K., Smilde, A.K., Kiers, H.A.L., 2008. Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry 390, 1241-1251. https://doi.org/10.1007/s00216-007-1790-1

Li, B., Morris, J., Martin, E.B., 2002. Model selection for partial least squares regression. Chemometrics and Intelligent Laboratory Systems 64, 79-89. https://doi.org/10.1016/S0169-7439(02)00051-5

Westad, F., Martens, H., 2000. Variable Selection in near Infrared Spectroscopy Based on Significance Testing in Partial Least Squares Regression. Journal of Near Infrared Spectroscopy 8, 117-124.

Wold, S., 1978. Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models. Technometrics 20(4), 397-405.

Examples


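library(rchemo)

data(cassav)

## Training and test sets
Xtrain <- cassav$Xtrain
ytrain <- cassav$ytrain
X <- cassav$Xtest
y <- cassav$ytest

## Test-set MSEP of PLSR models (plskern) with 0 to 20 LVs
nlv <- 20
res <- gridscorelv(
    Xtrain, ytrain, X, y,
    score = msep, fun = plskern,
    nlv = 0:nlv
    )

## Wold criterion on the MSEP values, with a wider lowess span
selwold(res$y1, res$nlv, f = 2/3)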


[Package rchemo version 0.1-2 Index]