R: Estimate the connectivity matrix of a causal graph

getParents {CompareCausalNetworks}

R Documentation

Estimate the connectivity matrix of a causal graph

Description

Estimates the connectivity matrix of a directed causal graph, using various possible methods. Supported methods at the moment are ARGES, backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, regression, RFCI and PC.

Usage

getParents(
  X,
  environment = NULL,
  interventions = NULL,
  parentsOf = 1:ncol(X),
  method = c("arges", "backShift", "bivariateANM", "bivariateCAM", "CAM", "fci",
    "fciplus", "ges", "gies", "hiddenICP", "ICP", "LINGAM", "mmhc", "rankArges",
    "rankFci", "rankGes", "rankGies", "rankPc", "rfci", "pc", "regression")[12],
  alpha = 0.1,
  mode = c("raw", "parental", "ancestral")[1],
  variableSelMat = NULL,
  excludeTargetInterventions = TRUE,
  onlyObservationalData = FALSE,
  indexObservationalData = 1,
  returnAsList = FALSE,
  sparse = FALSE,
  directed = FALSE,
  pointConf = FALSE,
  setOptions = list(),
  assumeNoSelectionVars = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`X`	A `(n` x `p)`-data matrix with n observations of `p` variables.
`environment`	An optional vector of length `n`, where the entry for observation `i` is an index for the environment in which observation `i` took place (Simplest case: entries `1` for observational data and entries `2` for interventional data of unspecified type. Encoding for observational data can be changed with `indexObservationalData`). Is required for methods `ICP`, `hiddenICP` and `backShift`.
`interventions`	A optional list of length `n`. The entry for observation `i` is a numeric vector that specifies the variables on which interventions happened for observation `i` (a scalar if an intervention happened on just one variable and `integer(0)` if no intervention occured for this observation). Is used for methods `gies`, `rankGies` and `CAM` and will generate the vector `environment` if the latter is set to `NULL`. (However, this might generate too many different environments for some data sets, so a hand-picked vector `environment` is preferable). Is also used for `ICP` and `hiddenICP` to exclude interventions on the target variable of interest.
`parentsOf`	The variables for which we would like to estimate the parents. Default are all variables. Currently only used with `mode = "raw"`. Speeds up computation for methods `bivariateANM`, `bivariateCAM`, `ICP`, `hiddenICP` and `regression`; for other methods only affects output. Also see `variableSelMat` for possibly speeding up computational time by restricting the set of potential parents for a variable.
`method`	A string that specfies the method to use. The methods `pc` (PC-algorithm), `LINGAM` (LINGAM), `arges` (Adaptively restricted greedy equivalence search), `ges` (Greedy equivalence search), `gies` (Greedy interventional equivalence search), `fci` (Fast causal inference), `fciplus` and `rfci` (Really fast causal inference) are imported from the package "pcalg" and are documented there in more detail, including the additional options that can be supplied via `setOptions`. The "rank versions" of arges, fci, ges, gies and pc are based on [1]. The method `CAM` (Causal additive models) is documented in the package "CAM" and the methods `ICP` (Invariant causal prediction), `hiddenICP` (Invariant causal prediction with hidden variables) are from the package "InvariantCausalPrediction". The method `backShift` comes from the package "backShift". The method `mmhc` comes from the package "bnlearn". Finally, the methods `bivariateANM` and `bivariateCAM` are for now implemented internally but will hopefully be part of another package at some point in the near future.
`alpha`	The level at which tests are done. This leads to confidence intervals for `ICP` and `hiddenICP` and is used internally for `pc`, `rankPc`, `mmhc`, `fci`, `rankFci`, `fciplus` and `rfci`. For all other methods `alpha` is not used.
`mode`	Determines output type - can be "raw" or one of the queries "isParent", "isMaybeParent", "isNoParent", "isAncestor","isMaybeAncestor", "isNoAncestor". If "raw", `getParents()` returns the connectivity matrix computed by the specified method in sparse matrix format if `sparse` is set to `TRUE`; else in dense matrix format (or as list if `returnAsList = TRUE`). The options `directed` and `pointConf` will be ignored for all modes except for "raw" if set to `TRUE`. The different mode types are explained in the help for `getRanking`.
`variableSelMat`	An optional logical matrix of dimension `(p` x `p)`. An entry `TRUE` for entry `(i,j)` says that variable `i` should be considered as a potential parent for variable `j` and vice versa for `FALSE`. If the default value of `NULL` is used, all variables will be considered, but this can be very slow, especially for methods `pc`, `ges`, `gies`, `rfci` and `CAM`. Ignored for methods `backShift`, `fciplus`, `LINGAM` and `CAM`.
`excludeTargetInterventions`	When looking for parents of variable `k` in `1,...,p`, set to `TRUE` if observations where an intervention on variable `k` occured should be excluded. Default is `TRUE`. Used in `ICP` and `hiddenICP`.
`onlyObservationalData`	If set to `TRUE`, only observational data is used. It will take the index in `environment` specified by `indexObservationalData`. If `environment` is `NULL`, all observations are used. Default is `FALSE`.
`indexObservationalData`	Index in `environment` that encodes observational data. Default is `1`.
`returnAsList`	If set to `TRUE`, will return a list, where entry `k` is a list containing the estimated parents of variable `k`. Default is `FALSE`.
`sparse`	If set to `TRUE` and `returnAsList` is `FALSE`, output matrix will be in sparse matrix format.
`directed`	If `TRUE`, an edge will be returned if and only if an edge has been detected to be directed. I.e. entry will be set to 0 for entry `(j,k)` if both `j -> k` and `k -> j` are estimated (`ICP`, `hiddenICP`, `regression`), if `j -- k` is undirected (in case of CPDAGs) or if the edge type is not of type `i --> j` in case of PAGs. If `assumeNoSelectionVars = TRUE` the edge type `i --o j` is also considered 'directed' for methods returning PAGs. Default is `FALSE`. Only supported in mode "raw".
`pointConf`	If `TRUE`, numerical estimates will be returned if possible. For methods `ICP` and `hiddenICP`, these are the values in the individual confidence intervals (at chosen level `alpha`) that are closest to 0; for other methods these are point estimates. Some methods do not return numerical point estimates; for these the output will remain binary 0/1 (no-edge/edge). Default is `FALSE`. Only supported in mode "raw".
`setOptions`	A list that can take method-specific options; see the individual documentations of the methods for more options and their possible values.
`assumeNoSelectionVars`	Set to `TRUE` is you want to assume the absence of selection variables.
`verbose`	If `TRUE`, detailed output is provided.
`...`	Parameters to be passed to underlying method's function.

Value

If option returnAsList is FALSE, a sparse matrix, where a 0 entry in position (j,k) corresponds to an estimate of "no edge" j -> k, while an entry 1 corresponds to an estimated egde. If option pointConf is TRUE, the 1 entries will be replaced by numerical values that are either point estimates of the causal coefficients or confidence bounds (see above). If option returnAsList is TRUE, a list will be returned. The k-th entry in the list is the numeric vector with the indices of the estimated parents of node k.

Author(s)

Christina Heinze-Deml heinzedeml@stat.math.ethz.ch, Nicolai Meinshausen meinshausen@stat.math.ethz.ch

References

Naftali Harris and Mathias Drton: PC Algorithm for Nonparanormal Graphical Models. J. Mach. Learn. Res. 14(1) 2013.

Examples

## load the backShift package for data generation and plotting functionality
if(require(backShift) & require(pcalg)){
  # Simulate data with connectivity matrix A with assumptions
  # 1) hidden variables present
  # 2) precise location of interventions is assumed unknown
  # 3) different environments can be distinguished
  
  ## simulate data
  myseed <- 1
  
  # sample size n
  n <- 10000
  
  # p=3 predictor variables and connectivity matrix A
  p <- 3
  labels <- c("1", "2", "3")
  A <- diag(p)*0
  A[1,2] <- 0.8
  A[2,3] <- 0.8
  A[3,1] <- -0.4
  
  # divide data in 10 different environments
  G <- 10
  
  # simulate
  simResult <- backShift::simulateInterventions(n, p, A, G, intervMultiplier = 3,
               noiseMult = 1, nonGauss = TRUE, hiddenVars = TRUE,
               knownInterventions = FALSE, fracVarInt = NULL, simulateObs = TRUE,
               seed = myseed)
  X <- simResult$X
  environment <- simResult$environment
  
  ## apply all  methods given in vector 'methods'
  ## (using all data pooled for pc/LINGAM/rfci/ges -- can be changed with option
  ## 'onlyObservationalData=TRUE')
  
  methods <- c("backShift", "LINGAM") #c("pc", "rfci", "ges")
  
  # select whether you want to run stability selection
  stability <- FALSE
  
  # arrange graphical output into a rectangular grid
  sq <- ceiling(sqrt(length(methods)+1))
  par(mfrow=c(ceiling((length(methods)+1)/sq),sq))
  
  ## plot and print true graph
  cat("\n true graph is  ------  \n" )
  print(A)
  plotGraphEdgeAttr(A, plotStabSelec = FALSE, labels = labels, thres.point = 0,
   main = "TRUE GRAPH")
  
  ## loop over all methods and compute and print/plot estimate
  for (method in methods){
    cat("\n result for method", method,"  ------  \n" )
  
    if(!stability){
      # Option 1): use this estimator as a point estimate
      Ahat <- getParents(X, environment, method=method, alpha=0.1, pointConf = TRUE)
    }else{
      # Option 2): use a stability selection based estimator
      # with expected number of false positives bounded by EV=2
      Ahat <- getParentsStable(X, environment, EV=2, method=method, alpha=0.1)
    }
  
   # print and plot estimate (point estimate thresholded if numerical estimates
   # are returned)
    print(Ahat)
    if(!stability)
      plotGraphEdgeAttr(Ahat, plotStabSelec = FALSE, labels = labels,
       thres.point = 0.05,
       main=paste("POINT ESTIMATE FOR METHOD\n", toupper(method)))
    else
      plotGraphEdgeAttr(Ahat, plotStabSelec = TRUE, labels = labels,
       thres.point = 0, main = paste("STABILITY SELECTION
       ESTIMATE\n FOR METHOD", toupper(method)))
   }
}else{
    cat("\nThe packages 'backShift' and 'pcalg' are needed for the examples to
work. Please install them.")
}

[Package CompareCausalNetworks version 0.2.6.2 Index]