seqICPnl.s {seqICP}    R Documentation
Non-linear Invariant Causal Prediction for an individual set S
Description
Tests whether the conditional distribution of Y given X^S is invariant across time, by allowing for arbitrary non-linear additive dependence models.
Usage
seqICPnl.s(X, Y, S, test = "block.variance",
           par.test = list(grid = c(0, round(nrow(X)/2), nrow(X)),
                           complements = FALSE, link = sum, alpha = 0.05,
                           B = 100),
           regression.fun = function(X, Y) fitted.values(lm.fit(X, Y)))
Arguments
X: matrix of predictor variables. Each column corresponds to one predictor variable.
Y: vector of the target variable, with length(Y) = nrow(X).
S: vector containing the indices of the predictors to be tested.
test: string specifying the hypothesis test used to test for invariance of a parent set S (i.e., the null hypothesis H0_S). The following tests are available: "block.mean", "block.variance", "block.decoupled", "smooth.mean", "smooth.variance", "smooth.decoupled" and "hsic".
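par.test: list of parameters specifying the hypothesis test. The following parameters are available: grid, complements, link, alpha and B. The parameter grid is an increasing vector of grid points used to construct the environments (blocks of consecutive observations) for the change-point-based tests. If complements is TRUE, each environment is compared against its complement; if it is FALSE, all environments are compared pairwise. The parameter link specifies how the pairwise test statistics are combined, generally either max or sum. The parameter alpha is a numeric value in (0, 1) giving the significance level of the test, and B is an integer giving the number of Monte-Carlo samples used to approximate the null distribution.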
regression.fun: regression function used to fit the function f. This should be a function that takes the arguments (X, Y) and returns the fitted values f(X).
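Any function with the same (X, Y) interface can be passed as regression.fun. The sketch below is a hypothetical example (the name quad.fun is made up) that uses only base R: it adds the squared value of each predictor and fits the enlarged design by least squares. It assumes, as the default lm.fit-based function and the GAM function in the Examples suggest, that the first column of X is an intercept column.

```r
# Hypothetical regression.fun: quadratic polynomial regression in base R.
# Assumption: column 1 of X is an intercept column (as the GAM example implies).
quad.fun <- function(X, Y) {
  if (ncol(X) == 1) {
    # only the intercept: the least-squares fit is the mean of Y
    return(rep(mean(Y), nrow(X)))
  }
  Z <- X[, -1, drop = FALSE]   # the actual predictors
  design <- cbind(X, Z^2)      # intercept, linear and squared terms
  fitted.values(lm.fit(design, Y))
}
```

It could then be supplied as seqICPnl.s(X, Y, S, regression.fun = quad.fun); whether a quadratic fit is flexible enough depends on the function class of f.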
Details
The function can be applied to models of the form

$$Y_i = f(X_i^S) + N_i,$$

with iid noise N_i and f from a specific function class that the regression procedure given by the parameter regression.fun should be able to approximate.
The hypothesis test specified by the parameter test is used to test whether the set S leads to an invariant model. For further details see the references.
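The roles of the grid and of regression.fun can be made concrete with a deliberately simplified sketch. It is not the statistic computed by seqICP (see the references for the actual tests); it only mimics the idea behind a block-wise variance comparison: the residuals from regressing Y on X^S are split into consecutive time blocks delimited by the grid, and their variances are compared. The helpers make.grid and block.residual.variances are hypothetical, and both the block convention (block j holds observations grid[j]+1, ..., grid[j+1]) and the added intercept column are assumptions.

```r
# Hypothetical helper: grid for k blocks of (roughly) equal length over n
# observations, e.g. make.grid(200, 4) gives 0 50 100 150 200.
make.grid <- function(n, k) round(seq(0, n, length.out = k + 1))

# Toy illustration, NOT the seqICP test statistic: regress Y on the
# predictors in S, split the residuals into the blocks defined by grid and
# return the residual variance within each block.
block.residual.variances <- function(X, Y, S, grid,
    regression.fun = function(X, Y) fitted.values(lm.fit(X, Y))) {
  n <- length(Y)
  # design matrix: intercept column followed by the columns in S
  # (assumption mirroring the GAM function in the Examples)
  design <- cbind(rep(1, n), X[, S, drop = FALSE])
  res <- Y - regression.fun(design, Y)
  # observation i belongs to block j if grid[j] < i <= grid[j + 1]
  blocks <- cut(seq_len(n), breaks = grid)
  tapply(res, blocks, var)
}

# Simulated data from Y_i = f(X_i) + N_i with time-invariant noise
set.seed(1)
n <- 200
X1 <- rnorm(n)
Y <- X1^2 + rnorm(n, sd = 0.5)
print(round(block.residual.variances(cbind(X1), Y, S = 1,
                                     grid = make.grid(n, 4)), 3))
```

Roughly equal block variances are what one would expect under invariance; the actual tests additionally calibrate such comparisons against a Monte-Carlo approximation of the null distribution.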
Value
A list containing the following elements:
test.stat: value of the test statistic.
crit.value: critical value computed using a Monte-Carlo simulation of the null distribution.
p.value: p-value.
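Because the result is an ordinary R list, its elements can be accessed with $. The following is a minimal sketch on simulated data that is not part of the original documentation; it assumes the seqICP package is installed and skips the call otherwise. H0_S is rejected at level alpha when p.value is below alpha.

```r
# Minimal sketch: inspect the list returned by seqICPnl.s.
# Runs only if seqICP is installed; otherwise res stays NULL.
res <- NULL
if (requireNamespace("seqICP", quietly = TRUE)) {
  set.seed(3)
  n <- 200
  X1 <- rnorm(n)
  Y <- sin(X1) + rnorm(n, sd = 0.3)   # simulated, time-invariant model
  res <- seqICP::seqICPnl.s(matrix(X1, ncol = 1), Y, S = 1)
  alpha <- 0.05                       # the default par.test$alpha
  cat("test statistic:", res$test.stat, "\n")
  cat("critical value:", res$crit.value, "\n")
  cat("p-value:", res$p.value, "\n")
  cat("reject H0_S at level", alpha, ":", res$p.value < alpha, "\n")
}
```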
Author(s)
Niklas Pfister and Jonas Peters
References
Pfister, N., P. Bühlmann, and J. Peters (2017). Invariant Causal Prediction for Sequential Data. ArXiv e-prints (1706.08058).
Peters, J., P. Bühlmann, and N. Meinshausen (2016). Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B (with discussion) 78 (5), 947–1012.
See Also
To estimate the set of causal parents, use the function seqICPnl. For linear models, use the corresponding functions seqICP and seqICP.s.
Examples
set.seed(1)
# environment 1
na <- 130
X1a <- rnorm(na,0,0.1)
Ya <- 5*X1a+rnorm(na,0,0.5)
X2a <- Ya+rnorm(na,0,0.1)
# environment 2
nb <- 70
X1b <- rnorm(nb,-1,1)
Yb <- X1b^2+rnorm(nb,0,0.5)
X2b <- rnorm(nb,0,0.1)
# combine environments
X1 <- c(X1a,X1b)
X2 <- c(X2a,X2b)
Y <- c(Ya,Yb)
Xmatrix <- cbind(X1, X2)
# use GAM as regression function
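GAM <- function(X, Y){
  d <- ncol(X)
  if(d > 1){
    # build the formula Y ~ 1 + s(X1) + ... + s(X(d-1)); the first column
    # of X is dropped, as it is treated as the intercept column
    formula <- "Y~1"
    names <- c("Y")
    for(i in 1:(d-1)){
      formula <- paste(formula, "+s(X", toString(i), ")", sep="")
      names <- c(names, paste("X", toString(i), sep=""))
    }
    data <- data.frame(cbind(Y, X[, -1, drop=FALSE]))
    colnames(data) <- names
    # mgcv must be installed
    fit <- fitted.values(mgcv::gam(as.formula(formula), data=data))
  } else{
    # only the intercept column: the fitted values are the mean of Y
    fit <- rep(mean(Y), nrow(X))
  }
  return(fit)
}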
# apply seqICPnl.s to all possible sets using the regression
# function GAM; only the true parent set S = 1 is
# invariant in this example
seqICPnl.s(Xmatrix, Y, S=numeric(), par.test=list(grid=c(0,50,100,150,200)), regression.fun=GAM)
seqICPnl.s(Xmatrix, Y, S=1, par.test=list(grid=c(0,50,100,150,200)), regression.fun=GAM)
seqICPnl.s(Xmatrix, Y, S=2, par.test=list(grid=c(0,50,100,150,200)), regression.fun=GAM)
seqICPnl.s(Xmatrix, Y, S=c(1,2), par.test=list(grid=c(0,50,100,150,200)), regression.fun=GAM)
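The four calls above cover every subset of the two predictors. With more predictors, the subsets can be enumerated programmatically. The helper all.subsets below is a hypothetical sketch based on base R's combn; the commented loop assumes the objects Xmatrix, Y and GAM from the example above.

```r
# Hypothetical helper: all subsets of the predictor indices 1..d,
# beginning with the empty set.
all.subsets <- function(d) {
  sets <- list(numeric(0))
  for (k in seq_len(d)) {
    # combn(d, k) draws k indices from seq_len(d)
    sets <- c(sets, combn(d, k, simplify = FALSE))
  }
  sets
}

# Usage with the example objects (requires seqICP and mgcv):
# pvals <- sapply(all.subsets(ncol(Xmatrix)), function(S)
#   seqICPnl.s(Xmatrix, Y, S = S,
#              par.test = list(grid = c(0, 50, 100, 150, 200)),
#              regression.fun = GAM)$p.value)
```

The number of subsets grows as 2^d, so this exhaustive approach is only practical for a small number of predictors.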