seqICP.s {seqICP}    R Documentation

Sequential Invariant Causal Prediction for an individual set S

Description

Tests whether the conditional distribution of Y given X^S is invariant across time, by assuming a linear dependence model.

Usage

seqICP.s(X, Y, S, test = "decoupled", par.test = list(grid = c(0,
  round(nrow(X)/2), nrow(X)), complements = FALSE, link = sum, alpha = 0.05, B =
  100, permutation = FALSE), model = "iid", par.model = list(pknown = FALSE,
  p = 0, max.p = 10))

Arguments

X

matrix of predictor variables. Each column corresponds to one predictor variable.

Y

vector of target variable, with length(Y)=nrow(X).

S

vector containing the indices of the predictors to be tested.

test

string specifying the hypothesis test used to test for invariance of a parent set S (i.e. the null hypothesis H0_S). The following tests are available: "decoupled", "combined", "trend", "variance", "block.mean", "block.variance", "block.decoupled", "smooth.mean", "smooth.variance", "smooth.decoupled" and "hsic".

par.test

parameters specifying the hypothesis test. The following parameters are available: grid, complements, link, alpha, B and permutation. The parameter grid is an increasing vector of grid points used to construct environments for change-point-based tests. If the parameter complements is 'TRUE', each environment is compared against its complement; if it is 'FALSE', all environments are compared pairwise. The parameter link specifies how to combine the pairwise test statistics; generally this is either max or sum. The parameter alpha is a numeric value in (0,1) indicating the significance level of the hypothesis test. The parameter B is an integer specifying the number of Monte Carlo samples (or permutations) used to approximate the null distribution. If the parameter permutation is 'TRUE', a permutation-based approach is used to approximate the null distribution; if it is 'FALSE', the scaled residuals approach is used.
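As an illustration of the grid and complements parameters, the following base R sketch shows one way grid points could be mapped to environments. The helper grid_to_environments is hypothetical and not part of seqICP, and the assumption that environment k covers observations (grid[k] + 1) to grid[k + 1] is a reading of the description above, not a statement about the package internals.

```r
# Hypothetical helper (not part of seqICP): turn grid points into index blocks,
# assuming environment k covers observations (grid[k] + 1):grid[k + 1].
grid_to_environments <- function(grid) {
  stopifnot(length(grid) >= 2, !is.unsorted(grid, strictly = TRUE))
  lapply(seq_len(length(grid) - 1), function(k) (grid[k] + 1):grid[k + 1])
}

grid <- c(0, 50, 100, 150, 200)
envs <- grid_to_environments(grid)
env_sizes <- lengths(envs)  # number of observations per environment

# complements = FALSE: every pair of environments is compared.
env_pairs <- t(combn(length(envs), 2))
n_pairwise <- nrow(env_pairs)

# complements = TRUE: each environment is compared against all remaining
# observations, so there is one comparison per environment.
n_complement <- length(envs)
```

Under this reading, a finer grid increases the number of pairwise comparisons quadratically, whereas the number of complement comparisons grows only linearly.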

model

string specifying the underlying model class. Either "iid" if Y consists of independent observations or "ar" if Y has a linear time dependence structure.

par.model

parameters specifying the model. The following parameters are available: pknown, p and max.p. If pknown is 'FALSE', the number of lags is determined by comparing all fits with up to max.p lags using the AIC. If pknown is 'TRUE', the procedure fits exactly p lags.
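The lag selection described for pknown = 'FALSE' can be sketched in base R as follows. This is a minimal illustration on simulated data, assuming that each candidate model regresses Y on the current predictor plus p lags of both Y and the predictor; it is not the code used inside seqICP, and the variable names are chosen for this example only.

```r
set.seed(1)
n <- 300
x <- rnorm(n)
y <- numeric(n)
# Simulated data with one true lag of y (illustrative only).
for (t in 2:n) y[t] <- 0.6 * y[t - 1] + x[t] + rnorm(1, sd = 0.5)

max.p <- 5  # small illustrative value; the documented default is 10

# Build the lagged matrices once with max.p lags so that every candidate
# model is fitted on the same n - max.p rows, keeping AIC values comparable.
E_y <- embed(y, max.p + 1)  # column 1 is y_t, column j + 1 is y_(t-j)
E_x <- embed(x, max.p + 1)  # column 1 is x_t, column j + 1 is x_(t-j)

aic <- sapply(0:max.p, function(p) {
  lags <- if (p > 0) cbind(E_y[, 2:(p + 1)], E_x[, 2:(p + 1)]) else NULL
  design <- cbind(x_t = E_x[, 1], lags)
  AIC(lm(E_y[, 1] ~ design))
})

p_hat <- (0:max.p)[which.min(aic)]  # number of lags with the smallest AIC
```

Fitting all candidates on a common set of rows is a design choice of this sketch: AIC values computed on different sample sizes are not directly comparable.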

Details

The function can be applied to two types of models:
(1) a linear model (model="iid")
Y_i = a X_i^S + N_i
with iid noise N_i and
(2) a linear autoregressive model (model="ar")
Y_t = a_0 X_t^S + ... + a_p (Y_(t-p),X_(t-p)) + N_t
with iid noise N_t.

For both models, the hypothesis test specified by the test parameter is used to test whether the set S leads to an invariant model. For further details see the references.

Value

list containing the following elements

test.stat

value of the test statistic.

crit.value

critical value computed using a Monte-Carlo simulation of the null distribution.

p.value

p-value.

p

number of lags that were used.

model.fit

'lm' object of linear model fit.
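A short sketch of how the returned list might be inspected, using only the element names documented above. The simulated data and the variable names res and rejected are assumptions made for this illustration, and the block is skipped when the seqICP package is not installed.

```r
if (requireNamespace("seqICP", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  X <- cbind(X1 = rnorm(n), X2 = rnorm(n))
  Y <- 5 * X[, 1] + rnorm(n, 0, 0.5)

  res <- seqICP::seqICP.s(X, Y, S = 1,
                          par.test = list(grid = c(0, 100, 200)))

  # Invariance of S is rejected at the 5% level when p.value is below 0.05.
  rejected <- res$p.value < 0.05

  print(res$test.stat)
  print(res$crit.value)
  print(res$p.value)

  # model.fit is an 'lm' object, so the usual lm methods apply.
  print(summary(res$model.fit))
}
```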

Author(s)

Niklas Pfister and Jonas Peters

References

Pfister, N., P. Bühlmann and J. Peters (2017). Invariant Causal Prediction for Sequential Data. ArXiv e-prints (1706.08058).

Peters, J., P. Bühlmann, and N. Meinshausen (2016). Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B (with discussion) 78 (5), 947–1012.

See Also

To estimate the set of causal parents use the function seqICP. For non-linear models use the corresponding functions seqICPnl and seqICPnl.s.

Examples

set.seed(1)

# environment 1
na <- 130
X1a <- rnorm(na,0,0.1)
Ya <- 5*X1a+rnorm(na,0,0.5)
X2a <- Ya+rnorm(na,0,0.1)

# environment 2
nb <- 70
X1b <- rnorm(nb,-1,1)
Yb <- 5*X1b+rnorm(nb,0,0.5)
X2b <- rnorm(nb,0,0.1)

# combine environments
X1 <- c(X1a,X1b)
X2 <- c(X2a,X2b)
Y <- c(Ya,Yb)
Xmatrix <- cbind(X1, X2)

# apply seqICP.s to all possible sets - only the true parent set S=1
# is invariant in this example
seqICP.s(Xmatrix, Y, S=numeric(), par.test=list(grid=c(0,50,100,150,200)))
seqICP.s(Xmatrix, Y, S=1, par.test=list(grid=c(0,50,100,150,200)))
seqICP.s(Xmatrix, Y, S=2, par.test=list(grid=c(0,50,100,150,200)))
seqICP.s(Xmatrix, Y, S=c(1,2), par.test=list(grid=c(0,50,100,150,200)))

[Package seqICP version 1.1 Index]