R: Aggregation of KNN-LWPLSDA models with different numbers of...

lwplsrda_agg {rchemo}

R Documentation

Aggregation of KNN-LWPLSDA models with different numbers of LVs

Description

Ensemblist method where the predictions are calculated by "averaging" the predictions of KNN-LWPLSDA models built with different numbers of latent variables (LVs).

For instance, if argument nlv is set to nlv = "5:10", the prediction for a new observation is the most occurent level (vote) over the predictions returned by the models with 5 LVS, 6 LVs, ... 10 LVs, respectively.

- lwplsrda_agg: use plsrda.

- lwplslda_agg: use plslda.

- lwplsqda_agg: use plsqda.

Usage


lwplsrda_agg(
    X, y,
    nlvdis, diss = c("eucl", "mahal"),
    h, k,
    nlv,
    cri = 4,
    verb = FALSE
    ) 

lwplslda_agg(
    X, y,
    nlvdis, diss = c("eucl", "mahal"),
    h, k,
    nlv, 
    prior = c("unif", "prop"),
    cri = 4,
    verb = FALSE
    ) 

lwplsqda_agg(
    X, y,
    nlvdis, diss = c("eucl", "mahal"),
    h, k,
    nlv, 
    prior = c("unif", "prop"),
    cri = 4,
    verb = FALSE
    ) 

## S3 method for class 'Lwplsrda_agg'
predict(object, X, ...)

## S3 method for class 'Lwplsprobda_agg'
predict(object, X, ...)

Arguments

`X`	For the main functions: Training X-data (`n, p`). — For the auxiliary functions: New X-data (`m, p`) to consider.
`y`	Training class membership (`n`). Note: If `y` is a factor, it is replaced by a character vector.
`nlvdis`	The number of LVs to consider in the global PLS used for the dimension reduction before calculating the dissimilarities. If `nlvdis = 0`, there is no dimension reduction.
`diss`	The type of dissimilarity used for defining the neighbors. Possible values are "eucl" (default; Euclidean distance), "mahal" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)).
`h`	A scale scalar defining the shape of the weight function. Lower is `h`, sharper is the function. See `wdist`.
`k`	The number of nearest neighbors to select for each observation to predict.
`nlv`	A character string such as "5:20" defining the range of the numbers of LVs to consider (here: the models with nb LVS = 5, 6, ..., 20 are averaged). Syntax such as "10" is also allowed (here: correponds to the single model with 10 LVs).
`prior`	For `lwplslda_agg` and `lwplsqda_agg`: The prior probabilities of the classes. Possible values are "unif" (default; probabilities are set equal for all the classes) or "prop" (probabilities are set equal to the observed proportions of the classes in `y`).
`cri`	Argument `cri` in function `wdist`.
`verb`	Logical. If `TRUE`, fitting information are printed.
`object`	For the auxiliary functions: A fitted model, output of a call to the main function.
`...`	For the auxiliary functions: Optional arguments. Not used.

Value

For lwplsrda_agg, lwplslda_agg and lwplsqda_agg: object of class lwplsrda_agg, lwplslda_agg or lwplsqda_agg

For predict.Lwplsrda_agg and predict.Lwplsprobda_agg:

`pred`	prediction calculated for each observation, which is the most occurent level (vote) over the predictions returned by the models with different numbers of LVS respectively
`listnn`	list with the neighbors used for each observation to be predicted
`listd`	list with the distances to the neighbors used for each observation to be predicted
`listw`	list with the weights attributed to the neighbors used for each observation to be predicted

Note

The first example concerns KNN-LWPLSRDA-AGG. The second example concerns KNN-LWPLSLDA-AGG.

Examples


## KNN-LWPLSRDA-AGG

n <- 40 ; p <- 7
X <- matrix(rnorm(n * p), ncol = p, byrow = TRUE)
y <- sample(c(1, 4, 10), size = n, replace = TRUE)

Xtrain <- X ; ytrain <- y
m <- 5
Xtest <- X[1:m, ] ; ytest <- y[1:m]

nlvdis <- 5 ; diss <- "mahal"
h <- 2 ; k <- 10
nlv <- "2:4" 
fm <- lwplsrda_agg(
    Xtrain, ytrain, 
    nlvdis = nlvdis, diss = diss,
    h = h, k = k,
    nlv = nlv)
res <- predict(fm, Xtest)
res$pred
res$listnn


nlvdis <- 5 ; diss <- "mahal"
h <- c(2, Inf)
k <- c(10, 15)
nlv <- c("1:3", "2:4")
pars <- mpars(nlvdis = nlvdis, diss = diss,
              h = h, k = k, nlv = nlv)
pars

res <- gridscore(
    Xtrain, ytrain, Xtest, ytest, 
    score = err, 
    fun = lwplsrda_agg, 
    pars = pars)
res

segm <- segmkf(n = n, K = 3, nrep = 1)
res <- gridcv(
    Xtrain, ytrain, 
    segm, score = err, 
    fun = lwplsrda_agg, 
    pars = pars,
    verb = TRUE)
names(res)
res$val


## KNN-LWPLSLDA-AGG

n <- 40 ; p <- 7
X <- matrix(rnorm(n * p), ncol = p, byrow = TRUE)
y <- sample(c(1, 4, 10), size = n, replace = TRUE)

Xtrain <- X ; ytrain <- y
m <- 5
Xtest <- X[1:m, ] ; ytest <- y[1:m]

nlvdis <- 5 ; diss <- "mahal"
h <- 2 ; k <- 10
nlv <- "2:4" 
fm <- lwplslda_agg(
    Xtrain, ytrain, 
    nlvdis = nlvdis, diss = diss,
    h = h, k = k,
    nlv = nlv, prior = "prop")
res <- predict(fm, Xtest)
res$pred
res$listnn

nlvdis <- 5 ; diss <- "mahal"
h <- c(2, Inf)
k <- c(10, 15)
nlv <- c("1:3", "2:4")
pars <- mpars(nlvdis = nlvdis, diss = diss,
              h = h, k = k, nlv = nlv, 
              prior = c("unif", "prop"))
pars

res <- gridscore(
    Xtrain, ytrain, Xtest, ytest, 
    score = err, 
    fun = lwplslda_agg, 
    pars = pars)
res

segm <- segmkf(n = n, K = 3, nrep = 1)
res <- gridcv(
    Xtrain, ytrain, 
    segm, score = err, 
    fun = lwplslda_agg, 
    pars = pars,
    verb = TRUE)
names(res)
res$val

[Package rchemo version 0.1-2 Index]