seqICP {seqICP} | R Documentation
Sequential Invariant Causal Prediction
Description
Estimates the causal parents S of the target variable Y using invariant causal prediction and fits a linear model of the form
Y = a X^S + N.
Usage
seqICP(X, Y, test = "decoupled",
  par.test = list(grid = c(0, round(nrow(X)/2), nrow(X)),
    complements = FALSE, link = sum, alpha = 0.05, B = 100,
    permutation = FALSE),
  model = "iid",
  par.model = list(pknown = FALSE, p = 0, max.p = 10),
  max.parents = ncol(X), stopIfEmpty = TRUE, silent = TRUE)
Arguments
X |
matrix of predictor variables. Each column corresponds to one predictor variable. |
Y |
vector containing the target variable, with length(Y) = nrow(X). |
test |
string specifying the hypothesis test used to test for invariance of a parent set S (i.e. the null hypothesis H0_S). The following tests are available: "decoupled", "combined", "trend", "variance", "block.mean", "block.variance", "block.decoupled", "smooth.mean", "smooth.variance", "smooth.decoupled" and "hsic". |
par.test |
list of parameters specifying the hypothesis test. As shown under Usage, the available entries are grid, complements, link, alpha, B and permutation. |
model |
string specifying the underlying model class. Either "iid" if Y consists of independent observations or "ar" if Y has a linear time dependence structure. |
par.model |
list of parameters specifying the model. As shown under Usage, the available entries are pknown, p and max.p. |
max.parents |
integer specifying the maximum size for admissible parents. Reducing this below the number of predictor variables saves computational time but means that the confidence intervals lose their coverage property. |
stopIfEmpty |
if TRUE, the procedure stops computing confidence intervals once the empty set has been accepted (and hence no variable can have a significant causal effect). Setting this to TRUE saves computational time in these cases, but means that the confidence intervals lose their coverage properties for values different from 0. |
silent |
if FALSE, the procedure outputs progress notifications consisting of the currently computed set S together with the p-value resulting from the null hypothesis H0_S. |
Details
The function can be applied to two types of models:
(1) a linear model (model="iid")
Y_i = a X_i^S + N_i
with iid noise N_i and
(2) a linear autoregressive model (model="ar")
Y_t = a_0 X_t^S + ... + a_p (Y_(t-p),X_(t-p)) + N_t
with iid noise N_t.
For both models, the invariant prediction procedure uses the hypothesis test specified by the test parameter to determine whether a candidate model is invariant. For further details, see the references.
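The notion of an invariant candidate set can be illustrated with a minimal base-R sketch. It is not one of the tests offered by seqICP: it swaps in a Chow-type F-test that compares a pooled regression with one whose coefficients may differ between two blocks of observations. The helper invariance_pvalue, the variable env and the simulated data are assumptions made purely for illustration.

```r
# Illustrative sketch only: a Chow-type F-test, NOT one of seqICP's tests.
set.seed(1)
n   <- 1000
env <- rep(c("a", "b"), each = n / 2)   # hypothetical two-block split

X1 <- rnorm(n)
# X3 changes its distribution between the two blocks
X3 <- ifelse(env == "a", X1 + 0.2 * rnorm(n), 2 * rnorm(n))
# Y follows the same structural assignment in both blocks
Y  <- -0.7 * X1 + 0.6 * X3 + 0.1 * rnorm(n)
# X2 is a child of Y, not a parent
X2 <- -0.5 * Y + 0.5 * X3 + 0.1 * rnorm(n)

# Hypothetical helper: p-value for the null hypothesis that the regression
# of Y on the columns of S has the same coefficients in both blocks.
invariance_pvalue <- function(S, Y, env) {
  S    <- as.data.frame(S)
  vars <- colnames(S)
  dat  <- data.frame(Y = Y, S, env = factor(env))
  # one set of coefficients for all observations
  pooled <- lm(reformulate(vars, response = "Y"), data = dat)
  # separate intercept and slopes per block
  by_block <- lm(
    reformulate(c(vars, "env", paste0(vars, ":env")), response = "Y"),
    data = dat
  )
  anova(pooled, by_block)[2, "Pr(>F)"]
}

p_parents <- invariance_pvalue(cbind(X1, X3), Y, env)  # true parents of Y
p_child   <- invariance_pvalue(cbind(X2), Y, env)      # child of Y only
print(c(parents = p_parents, child = p_child))
```

In this simulation the regression on the child X2 has clearly different coefficients in the two blocks, so its p-value should be small, whereas the regression on the true parents is the same in both blocks by construction.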
Value
object of class 'seqICP' consisting of the following elements:
parent.set |
vector of the estimated causal parents. |
test.results |
matrix containing the result from each individual test as rows. |
S |
list of all the sets that were tested. The position within the list corresponds to the index in the first column of the test.results matrix. |
p.values |
p-values, one per predictor variable, for the hypothesis that the variable is not included in the set of true causal parents. (If a p-value is smaller than alpha, the corresponding variable is a member of parent.set.) |
coefficients |
vector of coefficients resulting from a regression based on the estimated parent set. |
stopIfEmpty |
a boolean value indicating whether computations stop as soon as the intersection of accepted sets is empty. |
modelReject |
a boolean value indicating if the whole model was rejected (the p-value of the best fitting model is too low). |
pknown |
a boolean value indicating whether the number of lags in the model was known. Only relevant if model was set to "ar". |
alpha |
significance level at which the hypothesis tests were performed. |
n.var |
number of predictor variables. |
model |
either "iid" or "ar" depending on which model was selected. |
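As a rough sketch of how the elements listed above might be inspected, the following snippet fits seqICP with its default arguments on simulated data and reads a few fields from the result. It assumes that the returned object can be accessed like a list with `$`, which this page does not state explicitly. The simulated data and the names `first` and `fit` are illustrative only. The call is skipped when the seqICP package is not installed.

```r
# Sketch under assumptions: the result is accessed like a list via `$`.
set.seed(1)
n     <- 200
first <- seq_len(n) <= n / 2            # hypothetical first environment

X1 <- 0.3 * rnorm(n)
X3 <- ifelse(first, X1 + 0.2 * rnorm(n), 0.5 * rnorm(n))
Y  <- -0.7 * X1 + 0.6 * X3 + 0.1 * rnorm(n)
X2 <- -0.5 * Y + 0.5 * X3 + 0.1 * rnorm(n)
X  <- cbind(X1, X2, X3)

fit <- NULL
if (requireNamespace("seqICP", quietly = TRUE)) {
  # default test and model; only stopIfEmpty is changed from its default
  fit <- seqICP::seqICP(X = X, Y = Y, stopIfEmpty = FALSE)

  print(fit$parent.set)    # estimated causal parents
  print(fit$p.values)      # compare against fit$alpha
  print(fit$coefficients)  # regression on the estimated parent set
  print(fit$modelReject)   # TRUE if the whole model was rejected
}
```

Comparing fit$p.values with fit$alpha reproduces the membership rule described for parent.set.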
Author(s)
Niklas Pfister and Jonas Peters
References
Pfister, N., P. Bühlmann and J. Peters (2017). Invariant Causal Prediction for Sequential Data. ArXiv e-prints (1706.08058).
Peters, J., P. Bühlmann, and N. Meinshausen (2016). Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B (with discussion) 78 (5), 947–1012.
See Also
The function seqICP.s performs the hypothesis test for individual sets S. For non-linear models, the functions seqICPnl and seqICPnl.s can be used.
Examples
set.seed(1)
# environment 1
na <- 140
X1a <- 0.3*rnorm(na)
X3a <- X1a + 0.2*rnorm(na)
Ya <- -.7*X1a + .6*X3a + 0.1*rnorm(na)
X2a <- -0.5*Ya + 0.5*X3a + 0.1*rnorm(na)
# environment 2
nb <- 80
X1b <- 0.3*rnorm(nb)
X3b <- 0.5*rnorm(nb)
Yb <- -.7*X1b + .6*X3b + 0.1*rnorm(nb)
X2b <- -0.5*Yb + 0.5*X3b + 0.1*rnorm(nb)
# combine environments
X1 <- c(X1a,X1b)
X2 <- c(X2a,X2b)
X3 <- c(X3a,X3b)
Y <- c(Ya,Yb)
Xmatrix <- cbind(X1, X2, X3)
# Y follows the same structural assignment in both environments
# a and b (cf. the lines Ya <- ... and Yb <- ...).
# The direct causes of Y are X1 and X3.
# A linear model considers X1, X2 and X3 as significant.
# All these variables are helpful for the prediction of Y.
summary(lm(Y~Xmatrix))
# apply seqICP to the same setting
seqICP.result <- seqICP(X = Xmatrix, Y,
  par.test = list(grid = seq(0, na + nb, (na + nb)/10), complements = FALSE,
    link = sum, alpha = 0.05, B = 100),
  max.parents = 4, stopIfEmpty = FALSE, silent = FALSE)
summary(seqICP.result)
# seqICP is able to infer that X1 and X3 are causes of Y