
enspls.ad {enpls}

R Documentation

Ensemble Sparse Partial Least Squares for Model Applicability Domain Evaluation

Description

Model applicability domain evaluation with ensemble sparse partial least squares.

Usage

enspls.ad(x, y, xtest, ytest, maxcomp = 5L, cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2), space = c("sample", "variable"),
  method = c("mc", "boot"), reptimes = 500L, ratio = 0.8,
  parallel = 1L)

Arguments

`x`	Predictor matrix of the training set.
`y`	Response vector of the training set.
`xtest`	List, with the i-th component being the i-th test set's predictor matrix (see example code below).
`ytest`	List, with the i-th component being the i-th test set's response vector (see example code below).
`maxcomp`	Maximum number of components included within each model. Default is `5`.
`cvfolds`	Number of cross-validation folds used in each model for automatic parameter selection. Default is `5`.
`alpha`	Parameter (grid) controlling sparsity of the model. Default is `seq(0.2, 0.8, 0.2)`.
`space`	Space in which to apply the resampling method. Can be the sample space (`"sample"`) or the variable space (`"variable"`).
`method`	Resampling method. `"mc"` (Monte-Carlo resampling) or `"boot"` (bootstrapping). Default is `"mc"`.
`reptimes`	Number of models to build with Monte-Carlo resampling or bootstrapping.
`ratio`	Sampling ratio used when `method = "mc"`.
`parallel`	Integer. Number of CPU cores to use. Default is `1` (not parallelized).
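
To make the `method` and `ratio` arguments more concrete, the base R sketch below draws one set of indices under each resampling method. It only illustrates the general idea and is not taken from the package source: the helper name `draw_indices` is hypothetical, and treating Monte-Carlo resampling as drawing `ratio * n` items without replacement is an assumption. The same indices could refer to rows (`space = "sample"`) or columns (`space = "variable"`).

```r
# Hypothetical helper, not part of enpls: draw one set of indices.
draw_indices <- function(n, method = c("mc", "boot"), ratio = 0.8) {
  method <- match.arg(method)
  if (method == "mc") {
    # Monte-Carlo: random subset without replacement (assumed size: ratio * n)
    sample(n, size = round(n * ratio), replace = FALSE)
  } else {
    # Bootstrap: n draws with replacement, so repeated indices are possible
    sample(n, size = n, replace = TRUE)
  }
}

set.seed(1)
n <- 10
idx_mc <- draw_indices(n, "mc", ratio = 0.8)
idx_boot <- draw_indices(n, "boot")

length(idx_mc)         # 8 indices
anyDuplicated(idx_mc)  # 0: no index is drawn twice
length(idx_boot)       # 10 indices
```

With `reptimes` models, a draw like this would be repeated once per model, which is why larger `reptimes` values cost proportionally more computation.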

Value

A list containing:

tr.error.mean - absolute mean prediction error for training set
tr.error.median - absolute median prediction error for training set
tr.error.sd - standard deviation of prediction error for training set
tr.error.matrix - raw prediction error matrix for training set
te.error.mean - list of absolute mean prediction error for test set(s)
te.error.median - list of absolute median prediction error for test set(s)
te.error.sd - list of standard deviations of prediction error for test set(s)
te.error.matrix - list of raw prediction error matrix for test set(s)
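
As a rough illustration of how the summary components could relate to a raw error matrix, the sketch below computes the three statistics on a small made-up matrix. The layout (one row per sample, one column per model) and the reading of "absolute mean" as the absolute value of the per-sample mean are assumptions, not something this page confirms; the variable names are hypothetical.

```r
# Toy error matrix: 3 samples (rows) x 4 models (columns); values are made up.
errmat <- matrix(
  c( 0.5, -0.5,  1.0, -1.0,
     1.0,  2.0,  3.0,  2.0,
    -2.0, -1.0, -3.0, -2.0),
  nrow = 3, byrow = TRUE
)

# Per-sample summaries across models (assumed layout and definitions)
error_mean   <- abs(rowMeans(errmat))
error_median <- abs(apply(errmat, 1, median))
error_sd     <- apply(errmat, 1, sd)

error_mean    # 0 2 2
error_median  # 0 2 2
```

Under this reading, a sample with a large mean error and a large standard deviation across the ensemble would be a candidate for lying outside the applicability domain.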

Note

When `space = "variable"`, `method` must be `"mc"`: bootstrapping in the variable space would create duplicated variables, which could cause problems.
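
The duplicated-variable issue can be seen with a few lines of base R. This is a sketch on simulated data, not package code: the matrix `xmat` and the index vector `boot_cols` are made up, with `boot_cols` standing in for one possible bootstrap draw of column indices.

```r
set.seed(1)
xmat <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# One possible bootstrap draw of 5 column indices (with replacement);
# written out explicitly so the example is deterministic.
boot_cols <- c(1, 3, 3, 5, 1)
x_boot <- xmat[, boot_cols, drop = FALSE]

ncol(x_boot)              # 5 columns
anyDuplicated(boot_cols)  # non-zero: some variables appear more than once
qr(x_boot)$rank           # 3: only three distinct variables remain
```

Identical columns carry no extra information and make the predictor matrix rank deficient, which is one plausible reading of the "problems" mentioned above.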

Author(s)

Nan Xiao <https://nanx.me>

Examples

library("enpls")

data("logd1k")
# remove low variance variables
x <- logd1k$x[, -c(17, 52, 59)]
y <- logd1k$y

# training set
x.tr <- x[1:300, ]
y.tr <- y[1:300]

# two test sets
x.te <- list(
  "test.1" = x[301:400, ],
  "test.2" = x[401:500, ]
)
y.te <- list(
  "test.1" = y[301:400],
  "test.2" = y[401:500]
)

set.seed(42)
ad <- enspls.ad(
  x.tr, y.tr, x.te, y.te,
  maxcomp = 3, alpha = c(0.3, 0.6, 0.9),
  space = "variable", method = "mc",
  ratio = 0.8, reptimes = 10
)
print(ad)
plot(ad)
# the interactive plot requires an HTML viewer
## Not run: 
plot(ad, type = "interactive")

## End(Not run)

[Package enpls version 6.1 Index]