lwplsr_agg {rchemo}R Documentation

Aggregation of KNN-LWPLSR models with different numbers of LVs

Description

Ensemblist method where the predictions are calculated by averaging the predictions of KNN-LWPLSR models (lwplsr) built with different numbers of latent variables (LVs).

For instance, if argument nlv is set to nlv = "5:10", the prediction for a new observation is the simple average of the predictions returned by the models with 5 LVS, 6 LVs, ... 10 LVs, respectively.

Usage


lwplsr_agg(
    X, Y,
    nlvdis, diss = c("eucl", "mahal"),
    h, k,
    nlv,
    cri = 4,
    verb = FALSE
    )

## S3 method for class 'Lwplsr_agg'
predict(object, X, ...)  

Arguments

X

For the main function: Training X-data (n, p). — For the auxiliary function: New X-data (m, p) to consider.

Y

Training Y-data (n, q).

nlvdis

The number of LVs to consider in the global PLS used for the dimension reduction before calculating the dissimilarities. If nlvdis = 0, there is no dimension reduction.

diss

The type of dissimilarity used for defining the neighbors. Possible values are "eucl" (default; Euclidean distance), "mahal" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)).

h

A scale scalar defining the shape of the weight function. Lower is h, sharper is the function. See wdist.

k

The number of nearest neighbors to select for each observation to predict.

nlv

A character string such as "5:20" defining the range of the numbers of LVs to consider (here: the models with nb LVS = 5, 6, ..., 20 are averaged). Syntax such as "10" is also allowed (here: correponds to the single model with 10 LVs).

cri

Argument cri in function wdist.

verb

Logical. If TRUE, fitting information are printed.

object

For the auxiliary function: A fitted model, output of a call to the main function.

...

For the auxiliary function: Optional arguments. Not used.

Value

For lwplsr_agg: object of class Lwplsr_agg

For predict.Lwplsr_agg:

pred

prediction calculated for each observation, which is the most occurent level (vote) over the predictions returned by the models with different numbers of LVS respectively

listnn

list with the neighbors used for each observation to be predicted

listd

list with the distances to the neighbors used for each observation to be predicted

listw

list with the weights attributed to the neighbors used for each observation to be predicted

Note

In the examples, gridscore and gricv have been used as there is no sense to use gridscorelv and gricvlv.

Examples


## EXAMPLE 1

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

nlvdis <- 5 ; diss <- "mahal"
h <- 2 ; k <- 10
nlv <- "2:6" 
fm <- lwplsr_agg(
    Xtrain, Ytrain, 
    nlvdis = nlvdis, diss = diss,
    h = h, k = k,
    nlv = nlv)
names(fm)
res <- predict(fm, Xtest)
names(res)
res$pred
msep(res$pred, Ytest)

## EXAMPLE 2

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

nlvdis <- 5 ; diss <- "mahal"
h <- c(2, Inf)
k <- c(10, 20)
nlv <- c("1:3", "2:5")
pars <- mpars(nlvdis = nlvdis, diss = diss,
    h = h, k = k, nlv = nlv)
pars
res <- gridscore(
    Xtrain, Ytrain, Xtest, Ytest, 
    score = msep, 
    fun = lwplsr_agg, 
    pars = pars)
res

## EXAMPLE 3

n <- 30 ; p <- 10
Xtrain <- matrix(rnorm(n * p), ncol = p)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

K = 3
segm <- segmkf(n = n, K = K, nrep = 1)
segm
res <- gridcv(
    Xtrain, Ytrain, 
    segm, score = msep, 
    fun = lwplsr_agg, 
    pars = pars,
    verb = TRUE)
res


[Package rchemo version 0.1-1 Index]