locw {rchemo}    R Documentation

Locally weighted models

Description

locw and locwlv are generic working functions returning predictions of KNN locally weighted (LW) models. One specific (= local) model is fitted for each observation to predict, and a prediction is returned. See the wrapper lwplsr (KNN-LWPLSR) for an example of use.

In KNN-LW models, the prediction is built from two sequential steps, hereafter referred to as weighting "1" and weighting "2", respectively. For each new observation to predict, the two steps are as follows:

- Weighting "1". The k nearest neighbors (in the training data set) are selected, and the prediction model is fitted (in the next step) only on this neighborhood. This is equivalent to giving a weight = 1 to the neighbors and a weight = 0 to the other training observations, i.e. a binary weighting.

- Weighting "2". Each of the k nearest neighbors can optionally receive a weight (different from the usual 1/k) before the model is fitted. The weight depends on the dissimilarity (calculated beforehand) between the observation to predict and the neighbor. This corresponds to a within-neighborhood weighting.
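The two weighting steps can be sketched in base R for a single observation to predict. This is only an illustration, not the rchemo implementation: the simulated data, the Euclidean distance and the exponential weight function are assumptions chosen for the example.

```r
set.seed(1)
n <- 50 ; p <- 30
Xtrain <- matrix(rnorm(n * p), ncol = p)
xnew <- rnorm(p)              # one new observation to predict
k <- 5

# Dissimilarity between the new observation and each training observation
# (Euclidean distance, an assumption for this sketch)
d <- sqrt(rowSums(sweep(Xtrain, 2, xnew)^2))

# Weighting "1": keep the k nearest neighbors (binary weighting)
nn <- order(d)[1:k]
w1 <- numeric(n)
w1[nn] <- 1

# Weighting "2": within-neighborhood weights decreasing with the distance
# (hypothetical weight function, normalized to sum to 1)
w2 <- exp(-d[nn] / mean(d[nn]))
w2 <- w2 / sum(w2)

nn
w2
```

Here w1 has k entries equal to 1 and n - k entries equal to 0, while w2 only ranks the observations inside the neighborhood.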

The prediction model used in step "2" has to be defined in a function specified in argument fun. If there are m new observations to predict, a list of m vectors defining the m neighborhoods has to be provided (argument listnn). Each of the m vectors contains the indexes of the nearest neighbors in the training set. The m vectors are not necessarily of the same length, i.e. the neighborhood size can vary between the observations to predict. If there is a weighting in step "2", a list of m vectors of weights has to be provided (argument listw). Then locw fits the model successively for each of the m neighborhoods, and returns the corresponding m predictions.
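The successive fitting over the m neighborhoods can be sketched as a simple loop. The code below is not the locw source: it assumes a hypothetical helper local_fit_predict and uses a weighted linear regression (base R lm.wfit) in place of the model given in fun, on simulated noise-free data.

```r
# Hypothetical helper: fit one weighted linear model per neighborhood
# and predict the corresponding new observation.
local_fit_predict <- function(Xtrain, Ytrain, X, listnn, listw = NULL) {
  Xtrain <- as.matrix(Xtrain) ; Ytrain <- as.matrix(Ytrain) ; X <- as.matrix(X)
  m <- nrow(X)
  pred <- matrix(NA_real_, nrow = m, ncol = ncol(Ytrain))
  for (i in seq_len(m)) {
    nn <- listnn[[i]]
    k <- length(nn)
    # Uniform weights 1 / k when no within-neighborhood weighting is given
    w <- if (is.null(listw)) rep(1 / k, k) else listw[[i]] / sum(listw[[i]])
    # Local model fitted only on the neighborhood of observation i
    fit <- lm.wfit(cbind(1, Xtrain[nn, , drop = FALSE]),
                   Ytrain[nn, , drop = FALSE], w)
    pred[i, ] <- c(1, X[i, ]) %*% fit$coefficients
  }
  pred
}

set.seed(1)
n <- 40 ; p <- 2 ; m <- 3
Xtrain <- matrix(rnorm(n * p), ncol = p)
Ytrain <- 1 + Xtrain %*% c(2, -1)      # noise-free linear response (n, 1)
X <- matrix(rnorm(m * p), ncol = p)

# Distances between the m new and the n training observations
D <- as.matrix(dist(rbind(X, Xtrain)))[1:m, -(1:m)]
# Neighborhoods of different sizes, and distance-based weights
ks <- c(8, 10, 12)
listnn <- lapply(seq_len(m), function(i) order(D[i, ])[seq_len(ks[i])])
listw <- lapply(seq_len(m), function(i) exp(-D[i, listnn[[i]]]))

pred <- local_fit_predict(Xtrain, Ytrain, X, listnn, listw)
pred
```

Each row of pred comes from a different local model, which is why the neighborhoods in listnn may have different sizes.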

Function locwlv is dedicated to prediction models based on latent variable (LV) calculations, such as PLSR. For such models, it is much faster than locw and is recommended.

Usage


locw(Xtrain, Ytrain, X, listnn, listw = NULL, fun, verb = FALSE, ...)

locwlv(Xtrain, Ytrain, X, listnn, listw = NULL, fun, nlv, verb = FALSE, ...)
  

Arguments

Xtrain

Training X-data (n, p).

Ytrain

Training Y-data (n, q).

X

New X-data (m, p) to predict.

listnn

A list of m vectors defining weighting "1". Component i of this list is a vector (of length between 1 and n) of indexes. These indexes define the training observations that are the nearest neighbors of new observation i. Typically, listnn can be built from getknn, but any other list of length m can be provided. The m vectors can have equal length (i.e. the m neighborhoods are of equal size) or not (the number of neighbors varies between the observations to predict).

listw

A list of m vectors defining weighting "2". Component i of this list is a vector (that must have the same length as component i of listnn) of the weights given to the nearest neighbors when the prediction model is fitted. Internally, weights are "normalized" to sum to 1 in each component. Defaults to NULL (weights are set to 1 / k, where k is the size of the neighborhood).
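The handling of listw described above can be illustrated as follows. This is a minimal sketch assuming a hypothetical helper normalize_weights; it is not the internal code of locw.

```r
# Hypothetical helper: weights of each neighborhood, normalized to sum to 1
normalize_weights <- function(listnn, listw = NULL) {
  lapply(seq_along(listnn), function(i) {
    k <- length(listnn[[i]])
    if (is.null(listw)) return(rep(1 / k, k))   # default: 1 / k
    w <- listw[[i]]
    stopifnot(length(w) == k)                   # same length as listnn[[i]]
    w / sum(w)
  })
}

listnn <- list(c(3, 8, 1), c(5, 2))   # two neighborhoods of sizes 3 and 2
listw <- list(c(2, 1, 1), c(3, 1))    # raw (unnormalized) weights
normalize_weights(listnn)             # uniform weights
normalize_weights(listnn, listw)      # normalized user weights
```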

fun

A function corresponding to the prediction model to fit on the m neighborhoods.

nlv

For locwlv: the number of LVs to calculate (an integer, or a vector of integers such as 0:2).

verb

Logical. If TRUE, fitting information is printed.

...

Optional arguments to pass to function fun.

Value

pred

A matrix of predictions, or a list of such matrices (if nlv is a vector).

References

Lesnoff M, Metz M, Roger J-M. Comparison of locally weighted PLS strategies for regression and discrimination on agronomic NIR data. Journal of Chemometrics. 2020;e3209. doi:10.1002/cem.3209.

Examples


## Simulated training data: n observations, p variables, 2 responses
n <- 50 ; p <- 30
Xtrain <- matrix(rnorm(n * p), ncol = p, byrow = TRUE)
ytrain <- rnorm(n)
Ytrain <- cbind(ytrain, 100 * ytrain)
## Simulated new data: m observations to predict
m <- 4
Xtest <- matrix(rnorm(m * p), ncol = p, byrow = TRUE)
ytest <- rnorm(m)
Ytest <- cbind(ytest, 10 * ytest)

## Weighting "1": the k nearest neighbors of each new observation
k <- 5
z <- getknn(Xtrain, Xtest, k = k)
listnn <- z$listnn
listd <- z$listd
listnn
listd

## Weighting "2": weights computed from the distances to the neighbors
listw <- lapply(listd, wdist, h = 2)
listw

## locw: nlv is passed to fun (plskern) through '...'
nlv <- 2
locw(Xtrain, Ytrain, Xtest,
     listnn = listnn, fun = plskern, nlv = nlv)
locw(Xtrain, Ytrain, Xtest,
     listnn = listnn, listw = listw, fun = plskern, nlv = nlv)

## locwlv: nlv can be a single value or a vector
locwlv(Xtrain, Ytrain, Xtest,
       listnn = listnn, listw = listw, fun = plskern, nlv = nlv)
locwlv(Xtrain, Ytrain, Xtest,
       listnn = listnn, listw = listw, fun = plskern, nlv = 0:nlv)


[Package rchemo version 0.1-1 Index]