varSelectionRF {nonlinearICP}    R Documentation

Variable selection for nonlinearICP based on random forest variable importance

Description

Variable selection function that can be passed to nonlinearICP. It is applied to pre-select a set of variables, and the ICP procedure is then run on this subset only. Here, variables are selected according to random forest variable importance measures.

Usage

varSelectionRF(X, Y, env, verbose, nSelect = sqrt(ncol(X)),
  useMtry = sqrt(ncol(X)), ntree = 100)

Arguments

X

An (n x p)-dimensional matrix (or data frame) with n observations of p variables.

Y

Response vector (n x 1).

env

Indicator of the experiment or intervention type that each observation belongs to. A numeric vector of length n that must contain at least two distinct values.

verbose

If FALSE, most messages are suppressed.

nSelect

Number of variables to select. Defaults to sqrt(ncol(X)).

useMtry

Random forest parameter mtry. Defaults to sqrt(ncol(X)).

ntree

Random forest parameter ntree. Defaults to 100.

Value

A vector containing the indices of the selected variables.

Examples

# Example 1
library(nonlinearICP)
require(CondIndTests)
data("simData")
targetVar <- 2
# choose the environments in which the target variable was not intervened on
useEnvs <- which(simData$interventionVar[,targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
# predictors: all variables except the target; response: the target variable
X <- simData$X[ind,-targetVar]
Y <- simData$X[ind,targetVar]
E <- as.factor(simData$environment[ind])
# pre-select variables based on random forest variable importance
chosenIdx <- varSelectionRF(X = X, Y = Y, env = E, verbose = TRUE)
cat(paste("Variable(s)", paste(chosenIdx, collapse=", "), "was/were chosen.\n"))
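
To make the idea of importance-based pre-selection concrete, the following is a minimal sketch of how such a selection could work. It is not the package's implementation. The helper name sketchVarSelectionRF is hypothetical, and the sketch assumes that the randomForest package is installed, that permutation importance is the ranking criterion, and that non-integer defaults such as sqrt(ncol(X)) are rounded up. The env and verbose arguments are left out because this page does not say how they enter the selection.

```r
# Hypothetical helper, NOT the code used by nonlinearICP: ranks predictors by
# random forest permutation importance and returns the top-ranked column indices.
library(randomForest)

sketchVarSelectionRF <- function(X, Y, nSelect = sqrt(ncol(X)),
                                 useMtry = sqrt(ncol(X)), ntree = 100) {
  # Assumption: non-integer values are rounded up and capped at ncol(X).
  nSelect <- min(ncol(X), ceiling(nSelect))
  useMtry <- min(ncol(X), ceiling(useMtry))
  rf <- randomForest(x = X, y = Y, mtry = useMtry, ntree = ntree,
                     importance = TRUE)
  # type = 1 requests the permutation-based importance measure.
  imp <- importance(rf, type = 1)[, 1]
  # Indices of the nSelect most important columns, most important first.
  order(imp, decreasing = TRUE)[seq_len(nSelect)]
}

# Simulated data: only columns 3 and 7 influence the response.
set.seed(1)
n <- 200
p <- 9
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("V", seq_len(p))
Y <- 2 * X[, 3] + sin(X[, 7]) + rnorm(n, sd = 0.1)

chosen <- sketchVarSelectionRF(X, Y)
print(chosen)
```

With p = 9 the default nSelect evaluates to 3, so the sketch returns three column indices. Passing an explicit integer for nSelect avoids relying on the rounding assumption.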

[Package nonlinearICP version 0.1.2.1 Index]