backShift {backShift}    R Documentation
Estimate connectivity matrix of a directed graph with linear effects and hidden variables.
Description
This function estimates the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear, and we assume that observations under different shift interventions are available. More precisely, the function takes as input an (n x p) data matrix, where n is the sample size and p the number of variables. In each environment $j \in \{1, \ldots, J\}$ we observe $n_j$ samples generated from

$$X_j = X_j A + c_j + e_j$$

(in the case of cycles, this should be understood as an equilibrium distribution). The shift $c_j$ is a p-dimensional random vector that is assumed to have a diagonal covariance matrix. The noise vector $e_j$ is assumed to have the same distribution in all environments j but may have an arbitrary covariance matrix. The different intervention settings are provided to the method via the vector ExpInd of length $n = n_1 + \cdots + n_j + \cdots + n_J$. The goal is to estimate the connectivity matrix $A$.
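To make the model concrete, the following base-R sketch simulates data from $X_j = X_j A + c_j + e_j$ with one row per observation and builds the matching ExpInd vector. It is only an illustration under stated assumptions, not the package's own simulator (the Examples use simulateInterventions for that): simulate_shift_model is a hypothetical helper, $c_j$ and $e_j$ are assumed Gaussian and independent, and $I - A$ must be invertible so that the equilibrium $X_j = (c_j + e_j)(I - A)^{-1}$ exists.

```r
# Hypothetical helper (not part of backShift): simulate X_j = X_j A + c_j + e_j
# with rows as observations. Assumptions: Gaussian c_j and e_j, drawn independently.
simulate_shift_model <- function(A, n_per_env, shift_sd, noise_cov) {
  p <- nrow(A)
  J <- length(n_per_env)
  stopifnot(ncol(A) == p, nrow(shift_sd) == J, ncol(shift_sd) == p)
  I_minus_A_inv <- solve(diag(p) - A)   # errors if I - A is singular
  R <- chol(noise_cov)                  # upper triangular, t(R) %*% R == noise_cov
  ExpInd <- rep(seq_len(J), times = n_per_env)
  n <- length(ExpInd)
  # c_j: independent coordinates with environment-specific standard deviations,
  # so its covariance matrix within each environment is diagonal
  C <- matrix(rnorm(n * p), n, p) * shift_sd[ExpInd, , drop = FALSE]
  # e_j: same distribution in every environment, arbitrary covariance noise_cov
  E <- matrix(rnorm(n * p), n, p) %*% R
  # equilibrium: X (I - A) = C + E, hence X = (C + E) (I - A)^{-1}
  X <- (C + E) %*% I_minus_A_inv
  list(X = X, ExpInd = ExpInd, C = C, E = E)
}

# Same cyclic graph as in the Examples section
A <- matrix(0, 3, 3)
A[1, 2] <- 0.8; A[2, 3] <- -0.8; A[3, 1] <- 0.8
set.seed(1)
sim <- simulate_shift_model(A, n_per_env = rep(1000, 10),
                            shift_sd = matrix(runif(30, 0, 2), nrow = 10),
                            noise_cov = matrix(0.5, 3, 3) + diag(0.5, 3))
table(sim$ExpInd)  # 1000 observations in each of the 10 environments
# the defining equation holds row by row, up to floating-point error
max(abs(sim$X - (sim$X %*% A + sim$C + sim$E)))
```

Because every $e_j$ is drawn with the same covariance while only the standard deviations of $c_j$ change across environments, the environments differ purely through the shift interventions, as the model requires.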
Usage
backShift(X, ExpInd, covariance=TRUE, ev=0, threshold = 0.75, nsim=100,
sampleSettings=1/sqrt(2), sampleObservations=1/sqrt(2),
nodewise=TRUE, tolerance = 10^(-4), baseSettingEnv = 1,
verbose = FALSE)
Arguments
X: A (n x p)-dimensional matrix (or data frame) with n observations of p variables.
ExpInd: Indicator of the experiment or intervention setting that each observation belongs to. A numeric vector of length n; must contain at least three distinct values.
covariance: A boolean variable. If …
ev: The expected number of false selections for stability selection. No stability selection is computed if ev = 0.
threshold: The selection threshold for stability selection (must be between 0.5 and 1). Edges that are selected with an empirical proportion higher than threshold are retained.
nsim: Number of resamples taken (if using stability selection).
sampleSettings: The proportion of unique settings to sample in each resample; must be in [0, 1].
sampleObservations: The fraction of all samples to retain when subsampling (without replacement); must be in [0, 1].
nodewise: If …
tolerance: Precision parameter for …
baseSettingEnv: Index of the baseline environment against which the intervention variances are measured. Defaults to 1.
verbose: If …
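The stability-selection arguments can be illustrated with a few hypothetical helpers. The sketch assumes the generic Meinshausen-Buehlmann error bound, under which selecting q of m candidate edges per resample bounds the expected number of false selections by q^2 / ((2 * threshold - 1) * m), and it adopts one plausible reading of sampleSettings and sampleObservations. None of this is taken from the backShift source; how the package chooses q, in particular with nodewise = TRUE, may differ.

```r
# Hypothetical helpers (not the backShift internals) illustrating ev, threshold,
# sampleSettings and sampleObservations.

# Assumption: Meinshausen-Buehlmann bound E[V] <= q^2 / ((2 * threshold - 1) * m),
# with m candidate edges; returns the largest q that keeps the bound at or below ev.
edges_per_resample <- function(ev, threshold, m) {
  stopifnot(threshold > 0.5, threshold <= 1)
  floor(sqrt(ev * (2 * threshold - 1) * m))
}

# Assumption: first keep a fraction of the unique settings (at least three,
# mirroring the ExpInd requirement), then a fraction of the observations from
# those settings, sampling without replacement at both stages.
subsample_indices <- function(ExpInd, sampleSettings, sampleObservations) {
  settings <- unique(ExpInd)
  stopifnot(length(settings) >= 3)
  n_settings <- min(length(settings), max(3, ceiling(sampleSettings * length(settings))))
  kept <- settings[sample.int(length(settings), n_settings)]
  idx <- which(ExpInd %in% kept)
  idx[sample.int(length(idx), floor(sampleObservations * length(idx)))]
}

# Empirical selection proportion of each edge over a list of 0/1 selection matrices
selection_proportions <- function(selected) Reduce(`+`, selected) / length(selected)

# Edges selected in a proportion higher than threshold are kept
stable_edges <- function(props, threshold) props > threshold

# p = 3 variables without self-loops gives m = 6 candidate edges
edges_per_resample(ev = 1, threshold = 0.75, m = 3 * 2)  # 1
set.seed(1)
ExpInd <- rep(1:10, each = 100)
idx <- subsample_indices(ExpInd, 1 / sqrt(2), 1 / sqrt(2))
length(idx)  # 565: 8 of 10 settings kept, then floor(800 / sqrt(2)) observations
```

With the default sampleSettings = sampleObservations = 1/sqrt(2), this reading keeps roughly half of the data in each resample.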
Value
A list with elements
Ahat: The connectivity matrix, where entry (i, j) is the effect pointing from variable i to variable j.
AhatAdjacency: If …
varianceEnv: The estimated intervention variances, up to an offset.
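The role of varianceEnv and baseSettingEnv follows from the model. Since $X_j (I - A) = c_j + e_j$, the covariance of $X_j (I - A)$ is $D_j + \Sigma_e$, where $D_j$ is the diagonal covariance of $c_j$ (assuming $c_j$ and $e_j$ are independent). Differences between environments cancel the noise term, so $(I - A)^\top (\Sigma_j - \Sigma_{base}) (I - A) = D_j - D_{base}$ is diagonal; backShift exploits this by jointly diagonalizing such covariance differences. The population-level sketch below only checks the identity with the true A. The helper names population_cov and relative_shift_variances are hypothetical, and reading the variances relative to the baseline environment is one assumed interpretation of "up to an offset".

```r
# Population-level sketch (hypothetical helpers, A assumed known, c_j and e_j
# assumed independent). D_j is the diagonal covariance of c_j.
population_cov <- function(A, D_j, Sigma_e) {
  M <- solve(diag(nrow(A)) - A)          # (I - A)^{-1}
  t(M) %*% (D_j + Sigma_e) %*% M         # Cov(X_j) for row vectors X_j = (c_j + e_j) M
}

# Diagonal of (I - A)^T (Sigma_j - Sigma_base) (I - A): the intervention variances
# of every environment relative to the baseline environment (one row per environment)
relative_shift_variances <- function(A, Sigma_list, base = 1) {
  B <- diag(nrow(A)) - A
  t(sapply(Sigma_list, function(S) diag(t(B) %*% (S - Sigma_list[[base]]) %*% B)))
}

A <- matrix(0, 3, 3)
A[1, 2] <- 0.8; A[2, 3] <- -0.8; A[3, 1] <- 0.8
Sigma_e <- matrix(0.5, 3, 3) + diag(0.5, 3)   # correlated noise, same in all environments
D_list <- list(diag(c(1, 1, 1)), diag(c(2, 1, 3)), diag(c(1, 4, 1)))
Sigma_list <- lapply(D_list, population_cov, A = A, Sigma_e = Sigma_e)

# The transformed covariance difference is diagonal: the noise covariance cancels
B <- diag(3) - A
Delta <- t(B) %*% (Sigma_list[[2]] - Sigma_list[[1]]) %*% B
round(Delta, 10)
round(relative_shift_variances(A, Sigma_list, base = 1), 10)  # rows (0,0,0), (1,0,2), (0,3,0)
```

The baseline row is zero by construction, so only differences in intervention variance between environments are identified in this sketch.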
Author(s)
Christina Heinze-Deml <heinzedeml@stat.math.ethz.ch>
References
Dominik Rothenhaeusler, Christina Heinze, Jonas Peters, Nicolai Meinshausen: backShift: Learning causal cyclic graphs from unknown shift interventions. Advances in Neural Information Processing Systems (NIPS) 28, 2015. arXiv: http://arxiv.org/abs/1506.02494
See Also
ICP and hiddenICP for reconstructing the parents of a variable under interventions on all other variables.
getParents and getParentsStable from the package CompareCausalNetworks to estimate the connectivity matrix of a directed causal graph, using various possible methods (including backShift).
Examples
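## Simulate data with connectivity matrix A
library(backShift)
seed <- 1
# sample size n
n <- 10000
# number of variables p
p <- 3
# cyclic graph 1 -> 2 -> 3 -> 1; entry (i, j) is the effect of variable i on variable j
A <- matrix(0, nrow = p, ncol = p)
A[1,2] <- 0.8
A[2,3] <- -0.8
A[3,1] <- 0.8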
# divide data into 10 different environments
G <- 10
# simulate
simulation.res <- simulateInterventions(
n, p, A, G, intervMultiplier = 2,
noiseMult = 1, nonGauss = FALSE,
fracVarInt = 0.5, hidden = TRUE,
knownInterventions = FALSE,
simulateObs = TRUE, seed)
environment <- simulation.res$environment
X <- simulation.res$X
## Compute feedback estimator with stability selection
network <- backShift(X, environment, ev = 1)
## Print point estimates and stable edges
# true connectivity matrix
print(A)
# point estimate
print(network$Ahat)
# shows empirical selection probability for stable edges
print(network$AhatAdjacency)
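To compare the estimate with the truth beyond printing both matrices, two small helpers can list the edges and count support errors. They are hypothetical (not part of backShift), and the cutoff tol = 0.1 used to call an entry an edge is an arbitrary choice for illustration.

```r
# Hypothetical helpers (not part of backShift). Entry (i, j) of a connectivity
# matrix is the effect from variable i to variable j; |entry| > tol counts as an edge.
edge_list <- function(M, tol = 0.1) {
  idx <- which(abs(M) > tol, arr.ind = TRUE)
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  data.frame(from = idx[, "row"], to = idx[, "col"], weight = M[idx])
}

# Count edges present in A_hat but not in A_true, and vice versa
support_errors <- function(A_true, A_hat, tol = 0.1) {
  true_edge <- abs(A_true) > tol
  est_edge <- abs(A_hat) > tol
  c(false_positives = sum(est_edge & !true_edge),
    false_negatives = sum(true_edge & !est_edge))
}

A <- matrix(0, 3, 3)
A[1, 2] <- 0.8; A[2, 3] <- -0.8; A[3, 1] <- 0.8
edge_list(A)          # edges 1 -> 2 (0.8), 2 -> 3 (-0.8), 3 -> 1 (0.8)
A_hat <- A
A_hat[1, 3] <- 0.3    # a spurious edge
A_hat[2, 3] <- -0.05  # a missed edge
support_errors(A, A_hat)  # false_positives = 1, false_negatives = 1
```

After running the example above, edge_list(network$Ahat) and support_errors(A, network$Ahat) would summarize the point estimate in the same way.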