enspls.ad {enpls} | R Documentation |
Ensemble Sparse Partial Least Squares for Model Applicability Domain Evaluation
Description
Model applicability domain evaluation with ensemble sparse partial least squares.
Usage
enspls.ad(x, y, xtest, ytest, maxcomp = 5L, cvfolds = 5L,
alpha = seq(0.2, 0.8, 0.2), space = c("sample", "variable"),
method = c("mc", "boot"), reptimes = 500L, ratio = 0.8,
parallel = 1L)
Arguments
x |
Predictor matrix of the training set. |
y |
Response vector of the training set. |
xtest |
List, with the i-th component being the i-th test set's predictor matrix (see example code below). |
ytest |
List, with the i-th component being the i-th test set's response vector (see example code below). |
maxcomp |
Maximum number of components included within each model.
If not specified, will use 5. |
cvfolds |
Number of cross-validation folds used in each model
for automatic parameter selection, default is 5. |
alpha |
Parameter (grid) controlling sparsity of the model.
If not specified, default is seq(0.2, 0.8, 0.2). |
space |
Space in which to apply the resampling method.
Can be the sample space ("sample") or the variable space ("variable"). |
method |
Resampling method: "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). |
reptimes |
Number of models to build with Monte-Carlo resampling or bootstrapping. |
ratio |
Sampling ratio used when method = "mc". |
parallel |
Integer. Number of CPU cores to use.
Default is 1. |
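The two resampling schemes selected by method can be illustrated in base R. This is a sketch under stated assumptions, not the package's internal code: the helper name resample_idx is hypothetical, and it assumes that "mc" draws a fraction ratio of the indices without replacement while "boot" draws as many indices as there are items, with replacement.

```r
# Hypothetical helper (not part of enpls): draw resampling indices
# for `reptimes` models over `n` items (samples or variables).
resample_idx <- function(n, method = c("mc", "boot"),
                         reptimes = 5L, ratio = 0.8) {
  method <- match.arg(method)
  lapply(seq_len(reptimes), function(i) {
    if (method == "mc") {
      # Monte-Carlo (assumed): subset of size round(n * ratio), no replacement
      sample.int(n, size = round(n * ratio), replace = FALSE)
    } else {
      # Bootstrap (assumed): n draws with replacement; ratio is not used
      sample.int(n, size = n, replace = TRUE)
    }
  })
}

set.seed(1)
idx.mc <- resample_idx(100L, method = "mc", reptimes = 3L, ratio = 0.8)
idx.boot <- resample_idx(100L, method = "boot", reptimes = 3L)

lengths(idx.mc)                             # 80 indices per model
lengths(idx.boot)                           # 100 indices per model
all(sapply(idx.mc, anyDuplicated) == 0L)    # TRUE: "mc" never repeats an index
```

Under these assumptions, ratio only matters for "mc", which is consistent with the argument descriptions above.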
Value
A list containing:
- tr.error.mean: absolute mean prediction error for training set
- tr.error.median: absolute median prediction error for training set
- tr.error.sd: prediction error sd for training set
- tr.error.matrix: raw prediction error matrix for training set
- te.error.mean: list of absolute mean prediction error for test set(s)
- te.error.median: list of absolute median prediction error for test set(s)
- te.error.sd: list of prediction error sd for test set(s)
- te.error.matrix: list of raw prediction error matrix for test set(s)
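The relationship between a raw error matrix and the three summaries can be sketched with a toy matrix. The orientation (one row per sample, one column per model) and the reading of "absolute mean" as the absolute value of the per-sample mean are assumptions made for illustration; the page does not state either, and the object names below are hypothetical.

```r
# Toy error matrix: 4 samples (rows) x 3 models (columns).
# The orientation is an assumption made for this sketch.
err <- matrix(
  c( 0.5, -0.5,  0.3,
    -1.0, -2.0, -3.0,
     0.2,  0.4,  0.6,
     1.0, -1.0,  0.0),
  nrow = 4, byrow = TRUE
)

err.mean   <- abs(rowMeans(err))           # absolute mean error per sample
err.median <- abs(apply(err, 1, median))   # absolute median error per sample
err.sd     <- apply(err, 1, sd)            # error sd per sample

cbind(err.mean, err.median, err.sd)
```

In this toy case the second sample has a large absolute mean error (2) while the fourth has a mean error of 0 but the same spread (sd of 1), which shows why the mean and sd summaries carry different information.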
Note
Note that for space = "variable", method can only be "mc", since bootstrapping in the variable space will create duplicated variables, and that could cause problems.
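The duplication issue can be reproduced with a toy matrix in base R. This is an illustrative sketch only: the data are simulated, the object names are hypothetical, and rank deficiency is shown as one example of the kind of problem duplicated columns can cause, not necessarily the specific failure the package guards against.

```r
set.seed(42)
x.toy <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)  # toy predictor matrix

# Bootstrapping the 6 column indices: sampling with replacement,
# so repeated columns are very likely
idx.boot <- sample.int(ncol(x.toy), replace = TRUE)
x.boot <- x.toy[, idx.boot]

# Monte-Carlo resampling: a subset of columns without replacement
idx.mc <- sample.int(ncol(x.toy), size = 4)
x.mc <- x.toy[, idx.mc]

anyDuplicated(idx.mc) > 0                     # FALSE: no variable is repeated
qr(x.mc)$rank                                 # 4: full column rank
qr(x.boot)$rank == length(unique(idx.boot))   # TRUE: repeated columns add no rank
```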
Author(s)
Nan Xiao <https://nanx.me>
Examples
data("logd1k")
# remove low variance variables
x <- logd1k$x[, -c(17, 52, 59)]
y <- logd1k$y
# training set
x.tr <- x[1:300, ]
y.tr <- y[1:300]
# two test sets
x.te <- list(
  "test.1" = x[301:400, ],
  "test.2" = x[401:500, ]
)
y.te <- list(
  "test.1" = y[301:400],
  "test.2" = y[401:500]
)
set.seed(42)
ad <- enspls.ad(
  x.tr, y.tr, x.te, y.te,
  maxcomp = 3, alpha = c(0.3, 0.6, 0.9),
  space = "variable", method = "mc",
  ratio = 0.8, reptimes = 10
)
print(ad)
plot(ad)
# the interactive plot requires an HTML viewer
## Not run:
plot(ad, type = "interactive")
## End(Not run)