R: Light version of PLS_glm for cross validation purposes

PLS_glm_wvc {plsRglm}

R Documentation

Light version of PLS_glm for cross validation purposes

Description

Light version of PLS_glm for cross validation purposes either on complete or incomplete datasets.

Usage

PLS_glm_wvc(
  dataY,
  dataX,
  nt = 2,
  dataPredictY = dataX,
  modele = "pls",
  family = NULL,
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepstd.coeffs = FALSE,
  tol_Xi = 10^(-12),
  weights,
  method = "logistic",
  verbose = TRUE
)

Arguments

`dataY`	response (training) dataset
`dataX`	predictor(s) (training) dataset
`nt`	number of components to be extracted
`dataPredictY`	predictor(s) (testing) dataset
`modele`	name of the PLS glm model to be fitted (`"pls"`, `"pls-glm-Gamma"`, `"pls-glm-gaussian"`, `"pls-glm-inverse.gaussian"`, `"pls-glm-logistic"`, `"pls-glm-poisson"`, `"pls-glm-polr"`). Use `modele="pls-glm-family"` to enable the `family` option.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.) To use the family option, set `modele="pls-glm-family"`. User-defined families can also be used. See Details.
`scaleX`	whether to scale the predictor(s). Must be `TRUE` for `modele="pls"` and should be `TRUE` for PLS glm models.
`scaleY`	whether to scale the response. Ignored, since scaling is not always possible for glm responses.
`keepcoeffs`	whether the coefficients of the linear fit on link scale of unstandardized eXplanatory variables should be returned or not.
`keepstd.coeffs`	whether the coefficients of the linear fit on link scale of standardized eXplanatory variables should be returned or not.
`tol_Xi`	minimal value for Norm2(Xi) and `det(pp' %*% pp)` if there is any missing value in `dataX`. Defaults to `10^(-12)`.
`weights`	an optional vector of 'prior weights' to be used in the fitting process. Should be `NULL` or a numeric vector.
`method`	logistic, probit, complementary log-log or cauchit (corresponding to a Cauchy latent variable).
`verbose`	should info messages be displayed?

Details

This function is called by PLS_glm_kfoldcv_formula in order to perform cross-validation either on complete or incomplete datasets.
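
As a rough illustration of how a cross-validation loop might call this function, here is a minimal leave-one-out sketch on the Cornell data used in the Examples. It is not the code of PLS_glm_kfoldcv_formula. The names `loo_pred` and `fit` are made up for the illustration, and reading the last column of `valsPredict` as the prediction based on all extracted components is an assumption.

```r
library(plsRglm)

data(Cornell)
XCornell <- Cornell[, 1:7]
yCornell <- Cornell[, 8]

# Leave-one-out: fit on every row except i, then predict row i.
loo_pred <- numeric(nrow(XCornell))
for (i in seq_len(nrow(XCornell))) {
  fit <- PLS_glm_wvc(
    dataY = yCornell[-i],
    dataX = XCornell[-i, ],
    nt = 3,
    modele = "pls-glm-gaussian",
    dataPredictY = XCornell[i, ],
    verbose = FALSE
  )
  # valsPredict has one row per row of dataPredictY; the last column is
  # assumed here to hold the prediction using all extracted components.
  loo_pred[i] <- fit$valsPredict[1, ncol(fit$valsPredict)]
}

# Illustrative summary: leave-one-out mean squared prediction error.
mean((yCornell - loo_pred)^2)
```

Because only the predicted values (and optionally the coefficients) are returned, each call stays cheap, which is presumably why a light version is used inside the cross-validation routines.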

There are seven different predefined models with predefined link functions available:

`"pls"`: ordinary pls models
`"pls-glm-Gamma"`: glm Gamma with inverse link pls models
`"pls-glm-gaussian"`: glm gaussian with identity link pls models
`"pls-glm-inverse.gaussian"`: glm inverse gaussian with 1/mu^2 (square inverse) link pls models
`"pls-glm-logistic"`: glm binomial with logit link pls models
`"pls-glm-poisson"`: glm poisson with log link pls models
`"pls-glm-polr"`: glm polr with logit link pls models

Using the `family=` option and setting `modele="pls-glm-family"` allows changing the family and link function the same way as for the glm function. As a consequence, user-specified families can also be used.
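
The three accepted forms of a family specification can be illustrated with base R alone. The sketch below uses a hypothetical helper, `resolve_family`, that mirrors the way `stats::glm` turns its `family` argument into a family object; it is not part of plsRglm and is only meant to show that the three forms describe the same thing.

```r
# Hypothetical helper (not part of plsRglm) mirroring how stats::glm()
# resolves its `family` argument into a family object.
resolve_family <- function(family) {
  if (is.character(family)) {
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    family <- family()
  }
  family
}

f1 <- resolve_family("poisson")              # character string
f2 <- resolve_family(poisson)                # family function
f3 <- resolve_family(poisson(link = "log"))  # result of a call

# All three describe the poisson family with its log link.
c(f1$family, f1$link)
identical(f1$link, f2$link) && identical(f2$link, f3$link)

# A non-default link has to be given through a call.
resolve_family(Gamma(link = "log"))$link
```

With this package, such a specification is passed as `family=` together with `modele="pls-glm-family"`, as in the calls shown in the Examples section.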

The `gaussian` family accepts the links (as names) identity, log and inverse.
The `binomial` family accepts the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log).
The `Gamma` family accepts the links inverse, identity and log.
The `poisson` family accepts the links log, identity and sqrt.
The `inverse.gaussian` family accepts the links 1/mu^2, inverse, identity and log.
The `quasi` family accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt.
The function `power` can be used to create a power link function.
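
The family and link objects described above come from the stats package, so they can be inspected without fitting anything. The following sketch, which uses only base R, prints the default link of several families, selects a non-default link and builds a power link. The variable names are illustrative.

```r
# Default link of some of the families listed above.
fams <- list(gaussian(), binomial(), Gamma(), poisson(), inverse.gaussian())
default_links <- vapply(fams, function(f) f$link, character(1))
names(default_links) <- vapply(fams, function(f) f$family, character(1))
default_links

# Selecting one of the other accepted links.
binomial(link = "cloglog")$link
poisson(link = "sqrt")$link

# power() builds a link object; here the square-root power link.
sq <- power(0.5)
sq$linkfun(4)   # link scale: mu^0.5
sq$linkinv(2)   # back to the mean scale

# A power link can be plugged into the quasi family.
q <- quasi(link = power(1/3), variance = "mu")
class(q)
```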

Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
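
The second interpretation of the weights can be checked on a toy example. The sketch below uses `stats::glm` directly rather than PLS_glm_wvc, on the assumption that the weights carry the same meaning when they reach the underlying glm fit. The data are made up for the illustration.

```r
# Six unit-weight observations in three groups sharing the same x.
full <- data.frame(
  x = c(1, 1, 2, 2, 2, 3),
  y = c(1.0, 1.4, 2.1, 1.9, 2.3, 3.2)
)

# One row per group: y is the group mean and w the group size.
grouped <- data.frame(
  x = c(1, 2, 3),
  y = as.numeric(tapply(full$y, full$x, mean)),
  w = as.numeric(table(full$x))
)

fit_full <- glm(y ~ x, family = gaussian(), data = full)
fit_grouped <- glm(y ~ x, family = gaussian(), data = grouped, weights = w)

# The weighted fit on group means gives the same coefficients as the
# unweighted fit on the individual observations.
all.equal(coef(fit_full), coef(fit_grouped))
```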

Value

`valsPredict`	`nrow(dataPredictY) * nt` matrix of the predicted values
`coeffs`	Returned only if the coefficients of the eXplanatory variables were requested, i.e. `keepcoeffs=TRUE`. `ncol(dataX) * 1` matrix of the coefficients of the eXplanatory variables

Author(s)

Frédéric Bertrand
frederic.bertrand@utt.fr
https://fbertran.github.io/homepage/

References

Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Société Française de Statistique, 151(2), pages 1-18. http://publications-sfds.math.cnrs.fr/index.php/J-SFdS/article/view/47

Examples


data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,])
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-family",
family=gaussian(),dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-family",
family=gaussian(),dataPredictY=XCornell[1,], verbose=FALSE)
rm("XCornell","yCornell")


## With an incomplete dataset (X[1,2] is NA)
data(pine)
ypine <- pine[,11]
data(XpineNAX21)
PLS_glm_wvc(dataY=ypine,dataX=XpineNAX21,nt=10,modele="pls-glm-gaussian")
rm("XpineNAX21","ypine")

data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
PLS_glm_wvc(ypine,Xpine,10,modele="pls", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-Gamma", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-family",family=Gamma(), verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-gaussian", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-family",family=gaussian(log), verbose=FALSE)
PLS_glm_wvc(round(ypine),Xpine,10,modele="pls-glm-poisson", verbose=FALSE)
PLS_glm_wvc(round(ypine),Xpine,10,modele="pls-glm-family",family=poisson(log), verbose=FALSE)
rm(list=c("pine","ypine","Xpine"))


data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(yCornell,XCornell,10,modele="pls-glm-inverse.gaussian", verbose=FALSE)
PLS_glm_wvc(yCornell,XCornell,10,modele="pls-glm-family",
family=inverse.gaussian(), verbose=FALSE)
rm(list=c("XCornell","yCornell"))


data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
rm("XCornell","yCornell")

data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
PLS_glm(yaze_compl,Xaze_compl,10,modele="pls-glm-logistic",typeVC="none", verbose=FALSE)$InfCrit
PLS_glm_wvc(yaze_compl,Xaze_compl,10,modele="pls-glm-logistic", keepcoeffs=TRUE, verbose=FALSE)
rm("Xaze_compl","yaze_compl")

[Package plsRglm version 1.5.1 Index]