nonlinearICP {nonlinearICP}    R Documentation

Nonlinear Invariant Causal Prediction

Description

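Estimates the causal parents of a target variable Y among the predictors X by searching for sets of predictors given which the conditional distribution of Y is invariant across environments. Invariance is assessed with a (nonlinear) conditional independence test of Y and the environment given a candidate set.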
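The core subset search can be sketched in a few lines of R. This is a simplified illustration, not the package's implementation: it omits speedUp, variable pre-selection and the retrieval of defining sets, and the names icpSketch and residualLocationTest are hypothetical. It assumes that condIndTest follows the interface given under Details and returns a list with a pvalue element. A set is accepted when invariance is not rejected at level alpha, and the estimate is the intersection of all accepted sets.

```r
# Simplified sketch of the ICP subset search (not the package's code).
# Assumption: condIndTest(Y, environment, X, alpha, verbose) returns a
# list with a `pvalue` element.
icpSketch <- function(X, Y, environment, condIndTest, alpha = 0.05,
                      maxSizeSets = ncol(X), stopIfEmpty = TRUE) {
  X <- as.matrix(X)
  p <- ncol(X)
  accepted <- list()
  for (k in 0:maxSizeSets) {
    # All subsets of size k (only the empty set for k = 0)
    sets <- if (k == 0) list(integer(0)) else combn(p, k, simplify = FALSE)
    for (s in sets) {
      res <- condIndTest(Y, environment, X[, s, drop = FALSE], alpha, FALSE)
      # Accept the set if invariance is not rejected at level alpha
      if (res$pvalue > alpha) accepted[[length(accepted) + 1]] <- s
    }
    # Once the intersection is empty, testing larger sets cannot change it
    if (stopIfEmpty && length(accepted) > 0 &&
        length(Reduce(intersect, accepted)) == 0) break
  }
  # Convention used here: report the empty set if no set is accepted
  causalVars <- if (length(accepted) > 0) Reduce(intersect, accepted) else integer(0)
  list(causalVars = causalVars, acceptedSets = accepted)
}

# Hypothetical stand-in test: are linear-regression residuals equally
# located across environments (Kruskal-Wallis test)?
residualLocationTest <- function(Y, environment, X, alpha, verbose) {
  r <- if (ncol(X) == 0) Y - mean(Y) else residuals(lm(Y ~ X))
  list(pvalue = kruskal.test(r, as.factor(environment))$p.value)
}

# Usage on simulated data where X1 is the only parent of Y
set.seed(1)
E <- rep(c(1, 2), each = 200)
X1 <- 2 * E + rnorm(400)
Y <- X1 + 0.5 * rnorm(400)
X2 <- Y + E + 0.5 * rnorm(400)
fit <- icpSketch(cbind(X1, X2), Y, E, residualLocationTest)
print(fit$causalVars)
```

The stand-in test is linear and only compares residual locations; nonlinearICP is designed to be used with nonlinear tests such as InvariantResidualDistributionTest.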

Usage

nonlinearICP(X, Y, environment,
  condIndTest = InvariantResidualDistributionTest, argsCondIndTest = NULL,
  alpha = 0.05, varPreSelectionFunc = NULL,
  argsVarPreSelectionFunc = NULL, maxSizeSets = ncol(X),
  condIndTestNames = NULL, speedUp = FALSE, subsampleSize = c(0.1, 0.25,
  0.5, 0.75, 1), retrieveDefiningsSets = TRUE, seed = 1,
  stopIfEmpty = TRUE, testAdditionalSet = NULL, verbose = FALSE)

Arguments

X

An (n x p)-dimensional matrix (or data frame) with n observations of p predictor variables.

Y

An (n x 1)-dimensional response vector.

environment

Environment variable(s) in an (n x k)-dimensional matrix or data frame. Note that not all nonlinear conditional independence tests support more than one environment variable.

condIndTest

Function implementing a conditional independence test (see below for the required interface). Defaults to InvariantResidualDistributionTest from the package CondIndTests.

argsCondIndTest

Additional arguments passed to condIndTest. Defaults to NULL.

alpha

Significance level to be used. Defaults to 0.05.

varPreSelectionFunc

Variable selection function that is applied to pre-select a set of variables before running the ICP procedure on the resulting subset. Should be used with care as causal parents might be excluded in this step. Defaults to NULL.

argsVarPreSelectionFunc

Arguments of varPreSelectionFunc. Defaults to NULL.

maxSizeSets

Maximal size of sets considered as causal parents. Defaults to ncol(X).

condIndTestNames

Name of the conditional independence test, used for printing. Defaults to NULL.

speedUp

Logical; if TRUE, each set is first tested on subsamples of the sizes specified in subsampleSize, so that sets for which the null hypothesis can already be rejected on a small number of observations are discarded early. A larger sample would potentially decrease the p-value further but would not change the decision: the set is rejected in any case. A Bonferroni correction for multiple testing is applied. Defaults to FALSE.

subsampleSize

Sizes of the subsamples used in the speedUp procedure, given as fractions of the total sample size. Defaults to c(0.1, 0.25, 0.5, 0.75, 1).

retrieveDefiningsSets

Logical; indicates whether the defining sets should be retrieved. Defaults to TRUE.

seed

Random seed. Defaults to 1.

stopIfEmpty

Logical; if TRUE, the ICP procedure stops once the retrieved set is empty. If retrieveDefiningsSets is TRUE, further sets still have to be tested in order to retrieve the defining sets; even then, stopIfEmpty = TRUE speeds up the procedure, because sets whose acceptance or rejection would not affect the defining sets are not tested. If FALSE, all possible subsets of the predictors are tested. Defaults to TRUE.

testAdditionalSet

Indices of a particular set of predictors that should be tested in addition. Defaults to NULL.

verbose

Logical; indicates whether messages should be printed. Defaults to FALSE.
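The early rejection described for speedUp can be illustrated with a small sketch. This reflects one reading of the description above, not the package's internal code: the function name speedUpSketch is hypothetical, environment is assumed to be a vector, condIndTest is assumed to return a list with a pvalue element, and the Bonferroni correction is assumed to divide alpha by the number of subsample sizes.

```r
# Sketch of subsample-based early rejection for a single candidate set.
# Assumptions (not taken from the package): Bonferroni over the number of
# subsample sizes, and condIndTest returns list(pvalue = ...).
speedUpSketch <- function(X, Y, environment, condIndTest, alpha = 0.05,
                          subsampleSize = c(0.1, 0.25, 0.5, 0.75, 1)) {
  X <- as.matrix(X)
  n <- length(Y)
  m <- length(subsampleSize)
  alphaCorrected <- alpha / m  # Bonferroni-corrected level
  for (frac in subsampleSize) {
    # Draw a subsample of the requested fraction of the observations
    idx <- sample.int(n, ceiling(frac * n))
    p <- condIndTest(Y[idx], environment[idx], X[idx, , drop = FALSE],
                     alphaCorrected, FALSE)$pvalue
    # Reject as soon as one subsample gives a significant result;
    # more data would not change this decision
    if (p < alphaCorrected) {
      return(list(rejected = TRUE, fractionUsed = frac))
    }
  }
  # Not rejected on any subsample, including the largest one
  list(rejected = FALSE, fractionUsed = subsampleSize[m])
}
```

With the default subsampleSize, a set that is not rejected is still evaluated on the full sample (fraction 1), so the saving only applies to sets that are rejected early.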

Details

The function provided as condIndTest needs to take the following arguments in the given order: Y, environment, X, alpha, verbose. Additional arguments can then be provided via argsCondIndTest.
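A user-supplied test following this argument order might look as follows. This is a hypothetical example: residualInvarianceTest is not part of the package, it uses a linear regression for brevity (a nonlinear regression could be substituted), and it assumes that nonlinearICP reads the result from a pvalue list element, as returned by the tests in CondIndTests.

```r
# Hypothetical custom test with the documented argument order:
# Y, environment, X, alpha, verbose.
# Assumption: the p-value is returned in a `pvalue` list element.
residualInvarianceTest <- function(Y, environment, X, alpha = 0.05,
                                   verbose = FALSE) {
  Y <- as.numeric(Y)
  # Only the first environment variable is used
  if (is.matrix(environment) || is.data.frame(environment)) {
    environment <- environment[, 1]
  }
  env <- as.factor(environment)
  # Residuals of Y given X; for the empty set, simply centre Y
  if (is.null(X) || NCOL(X) == 0) {
    res <- Y - mean(Y)
  } else {
    d <- data.frame(.y = Y, X)
    res <- residuals(lm(.y ~ ., data = d))
  }
  # Compare location (Kruskal-Wallis) and scale (Fligner-Killeen) of the
  # residuals across environments; combine with a Bonferroni correction
  pLoc <- kruskal.test(res, env)$p.value
  pScale <- fligner.test(res, env)$p.value
  pvalue <- min(1, 2 * min(pLoc, pScale))
  if (verbose) message("p-value: ", format(pvalue, digits = 3))
  list(pvalue = pvalue)
}

# Possible usage:
# nonlinearICP(X, Y, E, condIndTest = residualInvarianceTest)
```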

Value

A list; among its elements is retrievedCausalVars, the indices of the variables retrieved as causal parents of Y (see Example 1).

References

Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576.

See Also

The function CondIndTest from the package CondIndTests is a wrapper for a variety of nonlinear conditional independence tests that can be passed as condIndTest.

Examples

# Example 1
library(nonlinearICP)
library(CondIndTests)
data("simData")
targetVar <- 2
# choose environments where we did not intervene on the target variable
useEnvs <- which(simData$interventionVar[,targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
X <- simData$X[ind,-targetVar]
Y <- simData$X[ind,targetVar]
E <- as.factor(simData$environment[ind])
result <- nonlinearICP(X = X, Y = Y, environment = E)
writeLines(paste("Variable", result$retrievedCausalVars,
                 "was retrieved as the causal parent of target variable", targetVar))

###################################################

# Example 2
E <- rep(c(1,2), each = 500)
X1 <- rnorm(1000)
X2 <- X1 + E^2 + 0.1*rnorm(1000)
Y <- X1 + X2 + 0.1*rnorm(1000)
resultnonlinICP <- nonlinearICP(cbind(X1,X2), Y, as.factor(E))
summary(resultnonlinICP)

[Package nonlinearICP version 0.1.2.1 Index]