R: Nonlinear Invariant Causal Prediction

nonlinearICP {nonlinearICP}

R Documentation

Nonlinear Invariant Causal Prediction

Description

Nonlinear Invariant Causal Prediction

Usage

nonlinearICP(X, Y, environment,
  condIndTest = InvariantResidualDistributionTest, argsCondIndTest = NULL,
  alpha = 0.05, varPreSelectionFunc = NULL,
  argsVarPreSelectionFunc = NULL, maxSizeSets = ncol(X),
  condIndTestNames = NULL, speedUp = FALSE, subsampleSize = c(0.1, 0.25,
  0.5, 0.75, 1), retrieveDefiningsSets = TRUE, seed = 1,
  stopIfEmpty = TRUE, testAdditionalSet = NULL, verbose = FALSE)

Arguments

`X`	An (n x p)-dimensional matrix (or data frame) with n observations of p variables.
`Y`	An (n x 1)-dimensional response vector.
`environment`	Environment variable(s) in an (n x k)-dimensional matrix or data frame. Note that some nonlinear conditional independence tests may not support more than one environment variable.
`condIndTest`	Function implementing a conditional independence test (see below for the required interface). Defaults to `InvariantResidualDistributionTest` from the package `CondIndTests`.
`argsCondIndTest`	Arguments of `condIndTest`. Defaults to `NULL`.
`alpha`	Significance level to be used. Defaults to `0.05`.
`varPreSelectionFunc`	Variable selection function that is applied to pre-select a set of variables before running the ICP procedure on the resulting subset. Should be used with care as causal parents might be excluded in this step. Defaults to `NULL`.
`argsVarPreSelectionFunc`	Arguments of `varPreSelectionFunc`. Defaults to `NULL`.
`maxSizeSets`	Maximal size of sets considered as causal parents. Defaults to `ncol(X)`.
`condIndTestNames`	Name of conditional independence test, used for printing. Defaults to `NULL`.
`speedUp`	Use subsamples of sizes specified in `subsampleSize` to speed up the test for sets where the null hypothesis can already be rejected based on a small number of samples (a larger sample size would potentially further decrease the p-value but would not change the decision, i.e. the set is rejected in any case). Applies Bonferroni multiple testing correction. Defaults to `FALSE`.
`subsampleSize`	Size of subsamples used in `speedUp` procedure as fraction of total sample size. Defaults to `c(0.1, 0.25, 0.5, 0.75, 1)`.
`retrieveDefiningsSets`	Boolean variable to indicate whether defining sets should be retrieved. Defaults to `TRUE`.
`seed`	Random seed. Defaults to `1`.
`stopIfEmpty`	Stop the ICP procedure if the retrieved set is empty. If `retrieveDefiningsSets` is `TRUE`, further sets are still tested when `stopIfEmpty` is `TRUE` in order to retrieve the defining sets. Even in this case, `stopIfEmpty = TRUE` speeds up the procedure because some sets are not tested (namely those where accepting or rejecting would not affect the defining sets). Setting `stopIfEmpty` to `FALSE` means that all possible subsets of the predictors are tested. Defaults to `TRUE`.
`testAdditionalSet`	If a particular set should be tested, the corresponding indices can be provided via this argument. Defaults to `NULL`.
`verbose`	Boolean variable to indicate whether messages should be printed. Defaults to `FALSE`.
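
The `speedUp` procedure described above can be illustrated with a small base R sketch. This is one plausible reading of the description, not the package's internal code: the helper `sequentialSubsampleTest` and its argument `testFun` are hypothetical names, and the Bonferroni correction is assumed to divide `alpha` by the number of subsample stages.

```r
# Hypothetical helper (not part of nonlinearICP): run a test on growing random
# subsamples and stop as soon as the null hypothesis is rejected at the
# Bonferroni-corrected level.
sequentialSubsampleTest <- function(testFun, n, alpha = 0.05,
                                    subsampleSize = c(0.1, 0.25, 0.5, 0.75, 1)) {
  # Assumption: one test per stage, so alpha is divided by the number of stages.
  alphaCorrected <- alpha / length(subsampleSize)
  pvalue <- NA_real_
  for (frac in subsampleSize) {
    idx <- sample.int(n, size = ceiling(frac * n))  # random subsample of rows
    pvalue <- testFun(idx)                          # p-value on the subsample
    if (pvalue < alphaCorrected) {
      # Rejected on a small subsample: larger subsamples are skipped.
      return(list(rejected = TRUE, pvalue = pvalue, fractionUsed = frac))
    }
  }
  list(rejected = FALSE, pvalue = pvalue, fractionUsed = tail(subsampleSize, 1))
}

# Toy data: the distribution of Y clearly differs between two environments.
set.seed(1)
n <- 2000
E <- factor(rep(c(1, 2), each = n / 2))
Y <- rnorm(n) + 2 * (E == 2)

out <- sequentialSubsampleTest(
  function(idx) kruskal.test(Y[idx], E[idx])$p.value,
  n = n
)
out$rejected      # TRUE
out$fractionUsed  # 0.1: the smallest subsample already suffices
```

With a strong violation of invariance, the first stage (10% of the data) is enough to reject, which is where the time saving would come from.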

Details

The function provided as `condIndTest` needs to take the following arguments in the given order: `Y`, `environment`, `X`, `alpha`, `verbose`. Additional arguments can then be provided via `argsCondIndTest`.
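
As a rough illustration of this interface, the sketch below defines a custom test in base R with the required argument order. The function name `residualMeanTest` is hypothetical, and the return value (a list with an element `pvalue`) is an assumption modelled on the tests in `CondIndTests`. The test itself is deliberately simple (a linear fit followed by a Kruskal-Wallis test on the residuals) and is not one of the tests from the paper.

```r
# Hypothetical conditional independence test with the required argument
# order: Y, environment, X, alpha, verbose. Extra arguments are absorbed by `...`.
residualMeanTest <- function(Y, environment, X, alpha = 0.05, verbose = FALSE, ...) {
  X <- as.data.frame(X)
  # Simplification: only the first environment variable is used.
  E <- as.factor(as.data.frame(environment)[[1]])

  # Residuals of Y given X; for the empty set of predictors, just centre Y.
  res <- if (ncol(X) == 0) {
    Y - mean(Y)
  } else {
    residuals(lm(Y ~ ., data = data.frame(Y = Y, X)))
  }

  # Test whether the residual distribution shifts between environments.
  pvalue <- kruskal.test(res, E)$p.value
  if (verbose) message("p-value: ", signif(pvalue, 3))

  # Assumption: the caller reads the p-value from a list element `pvalue`.
  list(pvalue = pvalue)
}

# Toy data: X1 is shifted by the environment, Y depends on X1 only.
set.seed(1)
n <- 400
E <- rep(c(1, 2), each = n / 2)
X1 <- rnorm(n) + E
Y <- 2 * X1 + 0.1 * rnorm(n)
X <- cbind(X1 = X1)

pInvariant <- residualMeanTest(Y, E, X)$pvalue                  # set {X1}
pEmpty <- residualMeanTest(Y, E, X[, 0, drop = FALSE])$pvalue   # empty set
```

Here the empty set yields a very small p-value because the mean of `Y` differs between environments, while conditioning on `X1` removes that dependence. A function of this shape would be supplied through the `condIndTest` argument.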

Value

A list with the following elements:

`retrievedCausalVars`	Indices of the variables in \hat{S}, the estimated set of causal parents (the intersection of all accepted sets).
`acceptedSets`	List of accepted sets.
`definingSets`	List of defining sets.
`acceptedModels`	List of accepted models if specified in `argsCondIndTest`.
`pvalues.accepted`	P-values of accepted sets.
`rejectedSets`	List of rejected sets.
`pvalues.rejected`	P-values of rejected sets.
`settings`	Settings provided to `nonlinearICP`.
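
To make the relation between these elements concrete, the following toy sketch splits candidate sets into accepted and rejected ones and intersects the accepted sets. The candidate sets and p-values are made up for illustration, and the exact handling of the boundary case (p-value equal to `alpha`) is an assumption.

```r
# Made-up candidate sets of predictor indices and their p-values.
candidateSets <- list(integer(0), 1, 2, c(1, 2))
pvalues <- c(0.001, 0.40, 0.003, 0.62)
alpha <- 0.05

# A set is accepted if invariance is not rejected at level alpha
# (assumption: p-values equal to alpha count as accepted).
isAccepted <- pvalues >= alpha
acceptedSets <- candidateSets[isAccepted]
rejectedSets <- candidateSets[!isAccepted]

# The retrieved causal variables are those contained in every accepted set.
retrieved <- if (length(acceptedSets) > 0) {
  Reduce(intersect, acceptedSets)
} else {
  integer(0)
}
retrieved  # 1
```

In this toy case the sets {1} and {1, 2} are accepted, so variable 1 is the only one that appears in every accepted set.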

References

Please cite C. Heinze-Deml, J. Peters and N. Meinshausen: "Invariant Causal Prediction for Nonlinear Models", arXiv:1706.08576.

Examples

# Example 1
require(CondIndTests)
data("simData")
targetVar <- 2
# choose environments in which the target variable was not intervened on
useEnvs <- which(simData$interventionVar[,targetVar] == 0)
ind <- is.element(simData$environment, useEnvs)
X <- simData$X[ind,-targetVar]
Y <- simData$X[ind,targetVar]
E <- as.factor(simData$environment[ind])
result <- nonlinearICP(X = X, Y = Y, environment = E)
cat(paste("Variable", result$retrievedCausalVars,
          "was retrieved as the causal parent of target variable", targetVar))

###################################################

# Example 2
E <- rep(c(1,2), each = 500)
# X1 is independent of the environment
X1 <- rnorm(1000)
X2 <- X1 + E^2 + 0.1*rnorm(1000)
Y <- X1 + X2 + 0.1*rnorm(1000)
resultnonlinICP <- nonlinearICP(cbind(X1,X2), Y, as.factor(E))
summary(resultnonlinICP)
