VSURF_interp {VSURF} | R Documentation |
Interpretation step of VSURF
Description
The interpretation step aims to select all variables related to the response
for interpretation purposes. This is the second step of the VSURF
function. It is designed to be executed after the thresholding step,
VSURF_thres.
Usage
VSURF_interp(x, ...)
## Default S3 method:
VSURF_interp(
x,
y,
vars,
ntree.interp = 100,
nfor.interp = 25,
nsd = 1,
RFimplem = "randomForest",
parallel = FALSE,
ncores = detectCores() - 1,
clusterType = "PSOCK",
verbose = TRUE,
ntree = NULL,
...
)
## S3 method for class 'formula'
VSURF_interp(formula, data, ..., na.action = na.fail)
Arguments
x , formula |
A data frame or a matrix of predictors, the columns represent the variables. Or a formula describing the model to be fitted. |
... |
Other parameters to be passed on to the |
y |
A response vector (must be a factor for classification problems and numeric for regression ones). |
vars |
A vector of variable indices. Typically, indices of variables
selected by thresholding step (see value |
ntree.interp |
Number of trees of each forest grown. |
nfor.interp |
Number of forests grown. |
nsd |
Number of times the standard deviation of the minimum value of
|
RFimplem |
Choice of the random forests implementation to use:
"randomForest" (default), "ranger" or "Rborist" (note that if "Rborist" is
chosen, "randomForest" will still be used for the first step
|
parallel |
A logical indicating whether VSURF should run in parallel on
multiple cores (defaults to FALSE). If a vector of length 3 is given,
each coordinate is passed to each intermediate function: |
ncores |
Number of cores to use. Default is set to the number of cores detected by R minus 1. |
clusterType |
Type of the multiple cores cluster used to run VSURF in
parallel. Must be chosen among "PSOCK" (default: SOCKET cluster available
locally on all OS), "FORK" (local too, only available for Linux and Mac
OS), "MPI" (can be used on a remote cluster, which needs |
verbose |
A logical indicating whether information about the method's progress (including progress bars for each step) must be printed (defaults to TRUE). Printing adds a small overhead. |
ntree |
(deprecated) Number of trees in each forest grown for "thresholding step". |
data |
a data frame containing the variables in the model. |
na.action |
A function to specify the action to be taken if NAs are
found. (NOTE: If given, this argument must be named, and as
|
Details
nfor.interp embedded random forests models are grown, starting with the
random forest built with only the most important variable and ending with all
variables. Then err.min, the minimum mean out-of-bag (OOB) error rate
of these models, and its associated standard deviation sd.min are
computed. Finally, the smallest model (and hence its corresponding variables)
having a mean OOB error less than err.min + nsd * sd.min is selected.
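The selection rule above can be sketched in base R. This is only an illustration, not the package's internal code: the helper name `select_nested_model` is hypothetical, the matrix `oob_err` (one row per forest, one column per nested model) contains made-up values, and the comparison uses `<=` rather than a strict inequality so that the minimum-error model always qualifies.

```r
# Hypothetical helper illustrating the rule "smallest model whose mean OOB
# error is at most err.min + nsd * sd.min". Not VSURF's actual implementation.
select_nested_model <- function(oob_err, nsd = 1) {
  err_mean <- colMeans(oob_err)       # mean OOB error of each nested model
  err_sd   <- apply(oob_err, 2, sd)   # standard deviation across forests
  best     <- which.min(err_mean)     # model attaining the minimum mean error
  err_min  <- err_mean[best]
  sd_min   <- err_sd[best]
  # Columns are ordered by model size, so the first match is the smallest model
  k <- min(which(err_mean <= err_min + nsd * sd_min))
  list(n_vars = k, err.min = err_min, sd.min = sd_min)
}

# Made-up OOB errors: 3 forests (rows) x 4 nested models (columns)
oob_err <- rbind(
  c(0.30, 0.12, 0.100, 0.11),
  c(0.32, 0.14, 0.100, 0.09),
  c(0.28, 0.13, 0.115, 0.10)
)
res <- select_nested_model(oob_err, nsd = 1)
res$n_vars   # 3: the three-variable model is within one sd.min of the minimum
```

Setting `nsd = 0` in this sketch returns the minimum-error model itself (4 variables here), while larger values of `nsd` favour smaller models.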
Note that the mtry parameter of randomForest is set to its default value
(see randomForest) if nvm, the number of variables in the model, is not
greater than the number of observations, while it is set to nvm/3 otherwise.
This is to ensure the quality of OOB error estimations along the embedded RF
models.
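As a rough illustration of this rule, the sketch below picks mtry for a model with nvm variables. The helper `choose_mtry` is hypothetical and is not part of VSURF. It assumes the defaults documented for randomForest (floor(sqrt(p)) for classification, max(floor(p/3), 1) for regression) and, as a further assumption, rounds nvm/3 down to an integer of at least 1.

```r
# Hypothetical helper mirroring the mtry rule described above.
# Assumptions: randomForest's documented defaults, and nvm/3 rounded down
# (with a minimum of 1) when there are more variables than observations.
choose_mtry <- function(nvm, n_obs, classification = TRUE) {
  if (nvm <= n_obs) {
    # Usual randomForest default
    if (classification) floor(sqrt(nvm)) else max(floor(nvm / 3), 1)
  } else {
    # More variables than observations: use nvm/3
    max(floor(nvm / 3), 1)
  }
}

choose_mtry(nvm = 4,   n_obs = 150)   # 2: classification default
choose_mtry(nvm = 300, n_obs = 100)   # 100: nvm/3, since nvm > n_obs
```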
Value
An object of class VSURF_interp, which is a list with the
following components:
varselect.interp |
A vector of indices of selected variables. |
err.interp |
A vector of the mean OOB error rates of the embedded random forests models. |
sd.min |
The standard deviation of OOB error rates associated with the random forests model attaining the minimum mean OOB error rate. |
num.varselect.interp |
The number of selected variables. |
varselect.thres |
A vector of indices of variables selected after the thresholding step, sorted according to their mean VI, in decreasing order. |
nsd |
Value of the parameter in the call. |
comput.time |
Computation time. |
RFimplem |
The RF implementation used to run
|
ncores |
The number of cores used to run |
clusterType |
The type of the cluster used to run |
call |
The original call to |
terms |
Terms associated with the formula (only if formula-type call was used). |
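The components listed above can be read like those of any R list. The sketch below is an assumption-laden illustration: `summarise_interp` is a hypothetical helper, `mock_fit` is a hand-made list with invented values standing in for a real result, and the indices in varselect.interp are assumed to refer to columns of the predictor matrix x.

```r
# Hypothetical helper: summarise a VSURF_interp-like result.
# `fit` only needs the components varselect.interp and err.interp;
# `var_names` are the predictor column names, in the order used for x.
summarise_interp <- function(fit, var_names) {
  list(
    selected = var_names[fit$varselect.interp],  # names of selected variables
    min_err  = min(fit$err.interp)               # lowest mean OOB error rate
  )
}

# Mock object with made-up values, standing in for a real fit
mock_fit <- list(varselect.interp = c(4, 3),
                 err.interp = c(0.08, 0.05, 0.05))
summarise_interp(mock_fit, names(iris)[1:4])$selected
```

With a fitted object such as iris.interp from the Examples section, the same call would be `summarise_interp(iris.interp, names(iris)[1:4])`.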
Author(s)
Robin Genuer, Jean-Michel Poggi and Christine Tuleau-Malot
References
Genuer, R. and Poggi, J.M. and Tuleau-Malot, C. (2010), Variable selection using random forests, Pattern Recognition Letters 31(14), 2225-2236
Genuer, R. and Poggi, J.M. and Tuleau-Malot, C. (2015), VSURF: An R Package for Variable Selection Using Random Forests, The R Journal 7(2):19-33
See Also
Examples
data(iris)
iris.thres <- VSURF_thres(iris[,1:4], iris[,5])
iris.interp <- VSURF_interp(iris[,1:4], iris[,5],
vars = iris.thres$varselect.thres)
iris.interp
## Not run:
# A more interesting example with toys data (see ?toys)
# (a few minutes to execute)
data(toys)
toys.thres <- VSURF_thres(toys$x, toys$y)
toys.interp <- VSURF_interp(toys$x, toys$y,
vars = toys.thres$varselect.thres)
toys.interp
## End(Not run)