varSelectionRF {nonlinearICP} | R Documentation |
Variable selection for nonlinearICP based on random forest variable importance
Description
Variable selection function that can be provided to nonlinearICP. It is applied to pre-select a set of variables before the ICP procedure is run on this subset. Here, the variable selection is based on random forest variable importance measures.
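The idea of importance-based pre-selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: it assumes the randomForest package is installed, uses simulated data, and the helper name selectByImportance is hypothetical.

```r
library(randomForest)

# Simulated data (assumption): Y depends on columns 3 and 7 of X only.
set.seed(1)
n <- 200
p <- 9
X <- matrix(rnorm(n * p), nrow = n)
Y <- 2 * X[, 3] - X[, 7] + rnorm(n, sd = 0.1)

# Hypothetical helper: fit a random forest and keep the nSelect
# variables with the largest permutation importance.
selectByImportance <- function(X, Y, nSelect = sqrt(ncol(X)),
                               mtry = sqrt(ncol(X)), ntree = 100) {
  fit <- randomForest(x = X, y = Y, mtry = floor(mtry),
                      ntree = ntree, importance = TRUE)
  # type = 1: permutation importance (mean decrease in accuracy)
  imp <- importance(fit, type = 1)[, 1]
  order(imp, decreasing = TRUE)[seq_len(ceiling(nSelect))]
}

sel <- selectByImportance(X, Y)
print(sel)
```

Because the default sqrt(ncol(X)) need not be an integer, the sketch rounds it explicitly; how varSelectionRF itself handles non-integer defaults is not stated here.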
Usage
varSelectionRF(X, Y, env, verbose, nSelect = sqrt(ncol(X)),
useMtry = sqrt(ncol(X)), ntree = 100)
Arguments
X | An (n x p)-dimensional matrix (or data frame) with n observations of p variables. |
Y | Response vector (n x 1). |
env | Indicator of the experiment or the intervention type an observation belongs to. A numeric vector of length n. Must contain at least two distinct values. |
verbose | If TRUE, additional output is printed. |
nSelect | Number of variables to select. Defaults to sqrt(ncol(X)). |
useMtry | Random forest parameter mtry. Defaults to sqrt(ncol(X)). |
ntree | Random forest parameter ntree. Defaults to 100. |
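The requirements on the arguments above can be checked before calling the function. The following is only a sketch: checkInputs is a hypothetical helper, not part of nonlinearICP, and the data are simulated.

```r
# Hypothetical helper: verify the documented shape requirements.
checkInputs <- function(X, Y, env) {
  n <- nrow(X)
  stopifnot(
    length(Y) == n,              # one response per observation
    length(env) == n,            # one environment label per observation
    length(unique(env)) >= 2     # at least two distinct environments
  )
  invisible(TRUE)
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10)
Y <- rnorm(10)
env <- rep(c(1, 2), each = 5)
checkInputs(X, Y, env)
```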
Value
A vector containing the indices of the selected variables.
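The returned indices refer to the columns of the X that was passed in. If a target column was dropped beforehand, they need to be mapped back to the original column numbering. A small sketch with made-up values (chosenIdx here is a hypothetical return value, and fullX is simulated):

```r
set.seed(1)
fullX <- matrix(rnorm(60), nrow = 10,
                dimnames = list(NULL, paste0("V", 1:6)))
targetVar <- 2

# Predictor matrix without the target column (original columns 1, 3, 4, 5, 6)
X <- fullX[, -targetVar]

# Hypothetical result of a variable selection call on X
chosenIdx <- c(2, 5)

# Subset the predictors; drop = FALSE keeps a matrix even for one column
Xsub <- X[, chosenIdx, drop = FALSE]

# Map the indices back to the column numbering of fullX
candidateCols <- setdiff(seq_len(ncol(fullX)), targetVar)
originalIdx <- candidateCols[chosenIdx]
print(colnames(Xsub))
print(originalIdx)
```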
Examples
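# Example 1
library(nonlinearICP)  # provides varSelectionRF
library(CondIndTests)  # provides the simData data set
data("simData")
targetVar <- 2
# choose the environments in which the target variable was not intervened on
useEnvs <- which(simData$interventionVar[, targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
# predictors: all variables except the target; response: the target variable
X <- simData$X[ind, -targetVar]
Y <- simData$X[ind, targetVar]
E <- as.factor(simData$environment[ind])
# indices refer to the columns of X, i.e. after the target column was removed
chosenIdx <- varSelectionRF(X = X, Y = Y, env = E, verbose = TRUE)
cat(paste("Variable(s)", paste(chosenIdx, collapse = ", "), "was/were chosen.\n"))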