getParentsStable {CompareCausalNetworks}

R Documentation

Estimate the connectivity matrix of a causal graph using stability selection.

Description

Estimates the connectivity matrix of a directed causal graph using one of several methods. Currently supported methods are ARGES, backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, regression, RFCI and PC. Stability selection is used to choose an appropriate level of sparsity.

Usage

getParentsStable(
  X,
  environment,
  interventions = NULL,
  EV = 1,
  nodewise = TRUE,
  threshold = 0.75,
  nsim = 100,
  sampleSettings = 1/sqrt(2),
  sampleObservations = 1/sqrt(2),
  parentsOf = 1:ncol(X),
  method = c("ICP", "hiddenICP", "backShift", "pc", "LINGAM", "ges", "gies", "CAM",
    "fci", "rfci", "regression", "bivariateANM", "bivariateCAM")[1],
  alpha = 0.1,
  mode = c("raw", "parental", "ancestral")[1],
  variableSelMat = NULL,
  excludeTargetInterventions = TRUE,
  onlyObservationalData = FALSE,
  indexObservationalData = NULL,
  setOptions = list(),
  verbose = FALSE
)
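
A call with the main defaults spelled out might look like the following minimal sketch. The simulated data (`X1`, `X2`, `X3`) and the two-environment setup are assumptions made purely for illustration; the call is only attempted if the packages "CompareCausalNetworks" and "InvariantCausalPrediction" are installed, and no particular output is implied.

```r
set.seed(1)
n <- 400

# Environment index: 1 = observational, 2 = interventional (unspecified type)
environment <- rep(1:2, each = n / 2)

# Illustrative data: X1 has extra noise in environment 2, X2 depends on X1
X1 <- rnorm(n) + (environment == 2) * rnorm(n, sd = 2)
X2 <- 0.8 * X1 + rnorm(n)
X3 <- rnorm(n)
X <- cbind(X1, X2, X3)

# Only run the estimation if the required packages are available
if (requireNamespace("CompareCausalNetworks", quietly = TRUE) &&
    requireNamespace("InvariantCausalPrediction", quietly = TRUE)) {
  res <- CompareCausalNetworks::getParentsStable(
    X, environment,
    EV = 1, threshold = 0.75, nsim = 100,
    method = "ICP", alpha = 0.1
  )
  print(res)
}
```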

Arguments

`X`	An (n x p) data matrix with n observations of p variables.
`environment`	A vector of length n, where the entry for observation i is an index for the environment in which observation i took place (in the simplest case, entries are `1` for observational data and `2` for interventional data of unspecified type). Required for methods `ICP`, `hiddenICP` and `backShift`.
`interventions`	An optional list of length n. The entry for observation i is a numeric vector that specifies the variables on which interventions happened for observation i (a scalar if an intervention happened on just one variable and `numeric(0)` if no intervention occurred for this observation). Used for method `gies`. If `environment` is set to `NULL`, the vector `environment` is generated from `interventions` (this might generate too many different environments for some data, so a hand-picked vector `environment` is preferable). Also used for `ICP` and `hiddenICP` to exclude interventions on the target variable of interest.
`EV`	A bound on the expected number of falsely selected edges.
`nodewise`	If `FALSE`, stability selection retains for each subsample the largest overall entries in the connectivity matrix. If `TRUE`, values are ordered row- and node-wise first and then the largest entries in each row and column are retained. Error control is valid (under an exchangeability assumption) in both cases. The latter setting, `TRUE`, is perhaps more robust and is the default.
`threshold`	The empirical selection frequency in (0.5,1) under subsampling that needs to be surpassed for an edge to be selected.
`nsim`	The number of resamples for stability selection.
`sampleSettings`	The fraction of different environments to resample in each resampling (at least two different environments will be selected so the argument is without effect if there are just two different environments in total).
`sampleObservations`	The fraction of samples to resample in each environment.
`parentsOf`	The variables for which we would like to estimate the parents. The default is all variables.
`method`	A string that specifies the method to use. The methods `pc` (PC-algorithm), `LINGAM` (LINGAM), `arges` (Adaptively restricted greedy equivalence search), `ges` (Greedy equivalence search), `gies` (Greedy interventional equivalence search), `fci` (Fast causal inference) and `rfci` (Really fast causal inference) are imported from the package "pcalg" and are documented there in more detail, including the additional options that can be supplied via `setOptions`. The method `CAM` (Causal additive models) is documented in the package "CAM", and the methods `ICP` (Invariant causal prediction) and `hiddenICP` (Invariant causal prediction with hidden variables) are from the package "InvariantCausalPrediction". The method `backShift` comes from the package "backShift". The method `mmhc` comes from the package "bnlearn". Finally, the methods `bivariateANM` and `bivariateCAM` are for now implemented internally but will hopefully be part of another package at some point in the near future.
`alpha`	The level at which tests are done. This leads to confidence intervals for `ICP` and `hiddenICP` and is used internally for `pc` and `rfci`.
`mode`	Output type: one of "raw", "parental" or "ancestral". If "raw", the output is that of the underlying method, without modifications. If "parental", the output describes parental relations; if "ancestral", the output is cast to ancestral relations.
`variableSelMat`	An optional logical matrix of dimension (p x p). A value of `TRUE` in entry (i,j) says that variable i should be considered as a potential parent for variable j; a value of `FALSE` says that it should not. If the default value of `NULL` is used, all variables will be considered, but this can be very slow, especially for methods `pc`, `ges`, `gies`, `rfci` and `CAM`.
`excludeTargetInterventions`	When looking for parents of variable k in 1,...,p, set to `TRUE` if observations where an intervention on variable k occurred should be excluded. Default is `TRUE`.
`onlyObservationalData`	If set to `TRUE`, only observational data is used, that is, the observations whose entry in `environment` equals `indexObservationalData`. If `environment` is `NULL`, all observations are used. Default is `FALSE`.
`indexObservationalData`	Index in `environment` that encodes observational data. Default is `1`.
`setOptions`	A list that can take method-specific options; see the individual documentations of the methods for more options and their possible values.
`verbose`	If `TRUE`, detailed output is provided.
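
The inputs `environment`, `interventions` and `variableSelMat` described above could be constructed along the following lines. This is a base-R sketch only: the sample size, the choice of variable 3 as the intervention target, and the prior knowledge encoded in `variableSelMat` are all hypothetical.

```r
set.seed(1)
n <- 200   # observations
p <- 4     # variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# One list entry per observation: numeric(0) if no intervention occurred,
# otherwise the indices of the intervened variables (here: variable 3
# for the second half of the sample, an illustrative choice).
interventions <- lapply(seq_len(n), function(i) {
  if (i <= n / 2) numeric(0) else 3
})

# Hand-picked environment index: 1 = observational, 2 = interventional
environment <- ifelse(lengths(interventions) == 0, 1, 2)

# Entry (i, j) TRUE: variable i is considered a potential parent of variable j
variableSelMat <- matrix(TRUE, nrow = p, ncol = p)
diag(variableSelMat) <- FALSE   # assumption: no variable is its own parent
variableSelMat[, 1] <- FALSE    # hypothetical knowledge: variable 1 has no parents
```

Restricting `variableSelMat` in this way reduces the number of candidate parents per variable, which is the main lever against the slow run times mentioned for `pc`, `ges`, `gies`, `rfci` and `CAM`.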

Value

A sparse matrix in which a 0 entry at (j,k) corresponds to an estimate of 'no edge' j -> parentsOf[k]. Entries between 0 and 100 give the selection percentage of the edge over all resamples (set to 0 if below the critical threshold); all non-zero entries are considered selected edges.
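
To read the selected edges off such a matrix, one option is sketched below. The matrix `res` is a hypothetical dense stand-in with made-up values, not real output of `getParentsStable`; for an actual sparse result, converting it with `as.matrix()` first is an assumption that should be checked.

```r
# Hypothetical stand-in for a returned matrix with p = 3 and parentsOf = 1:3;
# the selection percentages are invented for illustration.
parentsOf <- 1:3
res <- matrix(c(0, 92,  0,
                0,  0, 81,
                0,  0,  0), nrow = 3, byrow = TRUE)

# Non-zero entries are selected edges: row j -> parentsOf[column k]
sel <- which(res != 0, arr.ind = TRUE)
edges <- data.frame(
  from = sel[, "row"],
  to = parentsOf[sel[, "col"]],
  selectionPercent = res[sel]
)
print(edges)
```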

Author(s)

Nicolai Meinshausen meinshausen@stat.math.ethz.ch, Christina Heinze-Deml heinzedeml@stat.math.ethz.ch

References

N. Meinshausen and P. Bühlmann (2010). Stability selection. Journal of the Royal Statistical Society: Series B, 72, 417-473.