plsr_agg {rchemo}    R Documentation

PLSR with aggregation of latent variables

Description

Ensemble approach in which the predictions are computed by averaging the predictions of PLSR models (plskern) built with different numbers of latent variables (LVs).

For instance, if argument nlv is set to nlv = "5:10", the prediction for a new observation is the unweighted average of the predictions returned by the models with 5 LVs, 6 LVs, ..., 10 LVs.
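The aggregation rule can be sketched in base R. This is a minimal illustration under stated assumptions, not the rchemo implementation: it uses a textbook NIPALS PLS1 (single response, no observation weights) in place of plskern, and the helper names pls1_fit and pls1_agg_predict are hypothetical. A single fit with the largest number of LVs is enough, because the first a components of a NIPALS fit coincide with those of a fit with a components.

```r
## Minimal sketch, NOT rchemo code: unweighted NIPALS PLS1 (single response)
## used in place of plskern; function names are hypothetical.
pls1_fit <- function(X, y, a) {
  xmeans <- colMeans(X)
  ymean <- mean(y)
  Xd <- sweep(X, 2, xmeans)              # centered X, deflated in the loop
  yc <- y - ymean
  p <- ncol(X)
  W <- P <- matrix(0, p, a)
  cvec <- numeric(a)
  for (k in seq_len(a)) {
    w <- crossprod(Xd, yc)               # X-loading weights (p, 1)
    w <- w / sqrt(sum(w^2))
    tk <- Xd %*% w                       # X-scores (n, 1)
    tt <- sum(tk^2)
    P[, k] <- crossprod(Xd, tk) / tt     # X-loadings
    cvec[k] <- sum(yc * tk) / tt         # y-loading
    W[, k] <- w
    Xd <- Xd - tk %*% t(P[, k])          # deflate X
  }
  R <- W %*% solve(crossprod(P, W))      # projection matrix: T = Xc %*% R
  list(xmeans = xmeans, ymean = ymean, R = R, c = cvec)
}

## Average the predictions of the models with a LVs, for each a in nlvs.
## The first a columns of R and c give the a-LV model (NIPALS components
## are nested), so one fit with max(nlvs) LVs is enough.
pls1_agg_predict <- function(fit, Xnew, nlvs) {
  Xc <- sweep(Xnew, 2, fit$xmeans)
  predlv <- lapply(nlvs, function(a) {
    B <- fit$R[, seq_len(a), drop = FALSE] %*% fit$c[seq_len(a)]
    drop(fit$ymean + Xc %*% B)
  })
  pred <- Reduce(`+`, predlv) / length(predlv)   # unweighted average
  list(pred = pred, predlv = predlv)
}

set.seed(1)
X <- matrix(rnorm(20 * 4), ncol = 4)
y <- rnorm(20)
fit <- pls1_fit(X, y, a = 3)             # maximal model for nlvs = 1:3
res <- pls1_agg_predict(fit, X[1:3, , drop = FALSE], nlvs = 1:3)
res$pred
```

Setting a equal to the number of columns of X reproduces the ordinary least-squares fitted values, which is a convenient sanity check for the PLS part of the sketch.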

Usage


plsr_agg(X, Y, weights = NULL, nlv)

## S3 method for class 'Plsr_agg'
predict(object, X, ...)  

Arguments

For plsr_agg:

X

For the main function: Training X-data (n, p). For the auxiliary function: New X-data (m, p) to predict.

Y

Training Y-data (n, q).

weights

Weights (n, 1) to apply to the training observations. Internally, weights are "normalized" to sum to 1. Defaults to NULL (weights are then set to 1 / n).
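The documented weighting rule can be written in a few lines of R. normalize_weights is a hypothetical helper, not an rchemo function; the last lines additionally assume that the centering vectors (xmeans, ymeans) are weighted means, which this page does not state explicitly.

```r
## Hypothetical helper reproducing the documented rule for 'weights':
## NULL gives uniform weights 1/n; otherwise weights are rescaled to sum to 1.
normalize_weights <- function(weights, n) {
  if (is.null(weights)) weights <- rep(1, n)
  weights / sum(weights)
}

normalize_weights(NULL, 4)              # 0.25 0.25 0.25 0.25
w2 <- normalize_weights(c(1, 1, 2, 4), 4)
w2                                      # 0.125 0.125 0.250 0.500

## Because the weights sum to 1, weighted column means reduce to w' X
## (assumption: this is how weighted centering would be computed).
X <- matrix(1:8, ncol = 2)
wmeans <- drop(crossprod(w2, X))
wmeans                                  # 3.125 7.125
```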

nlv

A character string such as "5:20" defining the range of the numbers of LVs to consider (here: the models with 5, 6, ..., 20 LVs are averaged). A single value such as "10" is also allowed (here: it corresponds to the single model with 10 LVs).
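One way to turn the nlv string into the integer LV counts is sketched below. parse_nlv is a hypothetical helper written for illustration; the exact parsing performed inside rchemo is not described on this page and may differ.

```r
## Hypothetical parser for the documented nlv syntax ("5:20" or "10").
## An assumption for illustration, not rchemo's internal code.
parse_nlv <- function(nlv) {
  parts <- suppressWarnings(as.integer(strsplit(nlv, ":", fixed = TRUE)[[1]]))
  if (!(length(parts) %in% 1:2) || anyNA(parts))
    stop("nlv must be a string such as \"5:20\" or \"10\"")
  if (length(parts) == 1) parts else seq.int(parts[1], parts[2])
}

parse_nlv("5:20")   # 5 6 7 ... 20
parse_nlv("10")     # 10
```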

object

For the auxiliary function: A fitted model, output of a call to the main function.

...

For the auxiliary function: Optional arguments. Not used.

Value

For plsr_agg:

fm

List containing the PLSR model fitted with the maximal number of LVs in nlv, with components: T: X-score matrix; P: X-loading matrix; R: PLS projection matrix (p, nlv); W: X-loading weights matrix; C: Y-loading weights matrix; TT: X-score normalization factor; xmeans: centering vector of X (p, 1); ymeans: centering vector of Y (q, 1); weights: vector of observation weights; U: intermediate output.
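The stored components are enough to predict with any number of LVs a up to the maximum: with T = Xc R and Yc approximated by T C', the coefficients for a LVs are B = R[, 1:a] C[, 1:a]'. The sketch assumes that C is stored as a (q, nlv) matrix and that predictions add ymeans back to the centered product; predict_from_components is a hypothetical name and this is not rchemo's internal code. The toy check uses R = identity and C = transposed OLS coefficients, for which the formula must return the OLS fitted values.

```r
## Hypothetical helper: predictions (m, q) from PLSR components for 'a' LVs.
## Assumes fm$R is (p, nlv), fm$C is (q, nlv), and T = Xc %*% R.
predict_from_components <- function(fm, Xnew, a) {
  Ra <- fm$R[, seq_len(a), drop = FALSE]      # (p, a) projection matrix
  Ca <- fm$C[, seq_len(a), drop = FALSE]      # (q, a) Y-loading weights
  B <- Ra %*% t(Ca)                           # (p, q) regression coefficients
  Xc <- sweep(Xnew, 2, fm$xmeans)             # center with the training means
  sweep(Xc %*% B, 2, fm$ymeans, FUN = "+")    # add back the Y means
}

## Toy check: with R = identity and C = t(OLS coefficients), the formula
## should reproduce the OLS fitted values.
set.seed(2)
X <- matrix(rnorm(30 * 3), ncol = 3)
Y <- cbind(y1 = rnorm(30), y2 = rnorm(30))
xm <- colMeans(X)
ym <- colMeans(Y)
Xc <- sweep(X, 2, xm)
Yc <- sweep(Y, 2, ym)
Bols <- solve(crossprod(Xc), crossprod(Xc, Yc))   # (p, q)
fm_toy <- list(R = diag(3), C = t(Bols), xmeans = xm, ymeans = ym)
pred <- predict_from_components(fm_toy, X, a = 3)
all.equal(unname(pred), unname(fitted(lm(Y ~ X))))
```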

nlv

Range of the numbers of LVs considered.

For predict.Plsr_agg:

pred

Final predictions, after aggregation (unweighted average over the numbers of LVs).

predlv

Intermediate predictions, one set per number of LVs (before aggregation).

Note

In the example, zfm is the PLSR model fitted with the maximal number of LVs. Since plsr_agg already aggregates over the range given in nlv, it makes no sense to use gridscorelv or gridcvlv instead of gridscore or gridcv.

Examples


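library(rchemo)
set.seed(1)

## Simulated training data: n observations, p variables, two responses
n <- 20 ; p <- 4
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(y1 = ytrain, y2 = 100 * ytrain)
## Test set: the first m training rows
m <- 3
Xtest <- Xtrain[1:m, , drop = FALSE]
Ytest <- Ytrain[1:m, , drop = FALSE] ; ytest <- Ytest[1:m, 1]

## Aggregate the models with 1, 2 and 3 LVs
nlv <- "1:3"

fm <- plsr_agg(Xtrain, ytrain, nlv = nlv)
names(fm)

## Underlying PLSR model (fitted with the maximal number of LVs)
zfm <- fm$fm
class(zfm)
names(zfm)
summary(zfm, Xtrain)

## Predictions: aggregated (pred) and per number of LVs (predlv)
res <- predict(fm, Xtest)
names(res)

res$pred
msep(res$pred, ytest)

res$predlv

## Grid search over two LV ranges, scored on the test set
pars <- mpars(nlv = c("1:3", "2:5"))
pars
res <- gridscore(
    Xtrain, Ytrain, Xtest, Ytest,
    score = msep,
    fun = plsr_agg,
    pars = pars)
res

## Same grid evaluated by K-fold cross-validation
K <- 3
segm <- segmkf(n = n, K = K, nrep = 1)
segm
res <- gridcv(
    Xtrain, Ytrain,
    segm, score = msep,
    fun = plsr_agg,
    pars = pars,
    verb = TRUE)
res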


[Package rchemo version 0.1-2 Index]