getRanking {CompareCausalNetworks} | R Documentation |
Estimate a ranking of edges for causal relations in the underlying graph structure using stability ranking.
Description
Estimates a ranking of edges for a given query, e.g. for parental relations in the underlying causal graph structure, using various possible methods.
Supported methods at the moment are ARGES, backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, regression, RFCI and PC.
Usage
getRanking(
X,
environment,
interventions = NULL,
queries = c("isParent", "isMaybeParent", "isNoParent", "isAncestor",
"isMaybeAncestor", "isNoAncestor"),
method = c("ICP", "hiddenICP", "backShift", "pc", "LINGAM", "ges", "gies", "CAM",
"fci", "rfci", "regression", "bivariateANM", "bivariateCAM")[1],
alpha = 0.1,
variableSelMat = NULL,
excludeTargetInterventions = TRUE,
onlyObservationalData = FALSE,
indexObservationalData = NULL,
setOptions = list(),
assumeNoSelectionVars = TRUE,
nsim = 100,
sampleSettings = 1/sqrt(2),
sampleObservations = 1/sqrt(2),
verbose = FALSE,
...
)
Arguments
X |
A |
environment |
A vector of length |
interventions |
An optional list of length n. The entry for observation
i is a numeric vector that specifies the variables on which interventions
happened for observation i (a scalar if an intervention happened on just
one variable and |
queries |
One or more of "isParent", "isMaybeParent", "isNoParent", "isAncestor", "isMaybeAncestor", "isNoAncestor" |
method |
A string that specifies the method to use. The methods
|
alpha |
The level at which tests are done. This leads to confidence
intervals for |
variableSelMat |
An optional logical matrix of dimension (pxp). An
entry |
excludeTargetInterventions |
When looking for parents of variable k
in 1,...,p, set to |
onlyObservationalData |
If set to |
indexObservationalData |
Index in |
setOptions |
A list that can take method-specific options; see the individual documentations of the methods for more options and their possible values. |
assumeNoSelectionVars |
Set to |
nsim |
The number of resamples for stability selection. |
sampleSettings |
The fraction of different environments to resample in each resampling (at least two different environments will be selected so the argument is without effect if there are just two different environments in total). |
sampleObservations |
The fraction of samples to resample in each environment. |
verbose |
If |
... |
Parameters to be passed to underlying method's function. |
Details
For both parental and ancestral relations, three queries are supported.
The existence of a relation is assessed by the queries isParent and isAncestor; the absence of a relation is assessed by the queries isNoParent and isNoAncestor; the potential existence of a relation is addressed by the queries isMaybeParent and isMaybeAncestor.
All queries return a connectivity matrix, which we denote by A. The interpretation of the entries of A differs according to the considered query:
Parental relations: Queries concerning parental relations can only be answered by those methods under consideration that return a DAG, a CPDAG or a directed cyclic graph. When we say that a particular method cannot answer a given query, then the method's output with respect to this query will be the zero matrix. However, the eventual ranking for such a query will not necessarily be random due to the tie breaking scheme that is applied when ranking pairs of variables (see below).
- isParent: In the connectivity matrix A returned by this query, the entry A_{i,j} = 1 means that there is a directed edge from node i to node j in the graph structure estimated by the method under consideration. Otherwise, A_{i,j} = 0.
- isMaybeParent: A_{i,j} = 1 means that there is a directed or an undirected edge from node i to node j in the estimated graph structure. Otherwise, A_{i,j} = 0.
- isNoParent: A_{i,j} = 1 means that there is neither a directed nor an undirected edge from node i to node j in the estimated graph structure. Otherwise, A_{i,j} = 0.
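As an illustration of the three parental queries, the following minimal sketch derives the connectivity matrices from a small graph. The encoding of the graph as a matrix G (a 1 in both G[i, j] and G[j, i] standing for an undirected edge) is an assumption made for this example only and is not necessarily the representation used inside the package.

```r
# Hypothetical encoding (an assumption, not the package's internal format):
#   G[i, j] = 1 and G[j, i] = 0  ->  directed edge i --> j
#   G[i, j] = 1 and G[j, i] = 1  ->  undirected edge i --- j
G <- matrix(0, 3, 3)
G[1, 2] <- 1                  # directed edge 1 --> 2
G[2, 3] <- 1; G[3, 2] <- 1    # undirected edge 2 --- 3

# isParent: only directed edges from i to j count
isParent <- (G == 1 & t(G) == 0) * 1

# isMaybeParent: directed or undirected edges from i to j count
isMaybeParent <- (G == 1) * 1

# isNoParent: neither a directed nor an undirected edge from i to j.
# The diagonal is included here for simplicity; the sketch makes no
# claim about how the package treats self-pairs.
isNoParent <- (G == 0) * 1
```

In this toy graph, isParent contains a single 1 (for the pair 1, 2), while isMaybeParent additionally marks both orientations of the undirected edge between nodes 2 and 3.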
Ancestral relations: Queries concerning ancestral relations can be answered by all methods under consideration.
- isAncestor: A_{i,j} = 1 means that there is a directed path from node i to node j in the estimated graph structure. Otherwise, A_{i,j} = 0. In case of PAGs, directed paths can contain the edge types i --> j and i --o j. Including the latter edge type in this category implies that we exclude the existence of selection variables.
- isMaybeAncestor: A_{i,j} = 1 means that there is a path from node i to node j that contains directed and/or undirected edges. Otherwise, A_{i,j} = 0. For PAGs, such paths can contain the edge types i --> j, i --o j, i o-o j and/or i o-> j.
- isNoAncestor: A_{i,j} = 1 means that there is neither a directed path nor a partially directed path from node i to node j in the estimated graph structure. Otherwise, A_{i,j} = 0.
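For a graph with directed edges only, the isAncestor matrix is the transitive closure of the edge matrix. The sketch below computes it with Warshall's algorithm in base R. The function name ancestor_matrix and the encoding D[i, j] = 1 for an edge i --> j are hypothetical choices for this example; the package may compute ancestral relations differently, in particular for CPDAGs and PAGs.

```r
# D[i, j] = 1 encodes a directed edge i --> j (hypothetical encoding).
D <- matrix(0, 4, 4)
D[1, 2] <- 1   # 1 --> 2
D[2, 3] <- 1   # 2 --> 3; node 4 is isolated

# Transitive closure via Warshall's algorithm:
# after step k, R[i, j] is TRUE if j is reachable from i using
# only intermediate nodes from 1..k.
ancestor_matrix <- function(D) {
  p <- nrow(D)
  R <- (D != 0)
  for (k in seq_len(p)) {
    R <- R | outer(R[, k], R[k, ], "&")
  }
  R * 1
}

isAncestor <- ancestor_matrix(D)
```

Here node 1 is an ancestor of node 3 through node 2, so isAncestor[1, 3] is 1 even though D[1, 3] is 0.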
Stability ranking: To obtain a ranking of edges for a given set of queries, we run the method under consideration on nsim random subsamples of the data. In each round, we draw samples from a fraction of settings, where the size of the fraction is specified by sampleSettings. In each chosen setting, we sample a fraction of observations uniformly at random without replacement, where the size of the fraction is specified by sampleObservations.
For each subsample we randomly permute the order of the variables in the input. Methods that are order-dependent can therefore not exploit any potential advantage stemming from a data matrix with columns ordered according to the causal ordering or a similar one. We then run the method on each subsample.
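The subsampling scheme described above can be sketched in base R as follows. The helper draw_subsample is hypothetical, and the rounding rule used to turn a fraction into a count is an assumption; only the overall procedure (sample settings, then sample observations without replacement within each setting, then permute the columns) follows the description.

```r
set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
environment <- rep(1:4, each = 50)   # four settings with 50 observations each

# Hypothetical helper: draws one subsample as described in the text.
draw_subsample <- function(X, environment,
                           sampleSettings = 1 / sqrt(2),
                           sampleObservations = 1 / sqrt(2)) {
  settings <- unique(environment)
  # At least two settings are kept (assumed rounding rule), capped at the total
  n_settings <- min(length(settings),
                    max(2, round(sampleSettings * length(settings))))
  chosen <- settings[sample.int(length(settings), n_settings)]

  # Within each chosen setting, sample rows without replacement
  rows <- unlist(lapply(chosen, function(s) {
    idx <- which(environment == s)
    n_obs <- max(1, round(sampleObservations * length(idx)))
    idx[sample.int(length(idx), n_obs)]
  }))

  # Randomly permute the variable order so that order-dependent
  # methods cannot benefit from the original column ordering
  perm <- sample.int(ncol(X))
  list(X = X[rows, perm, drop = FALSE],
       environment = environment[rows],
       perm = perm)
}

sub <- draw_subsample(X, environment)
```

The permutation is returned alongside the data so that an estimated connectivity matrix can be mapped back to the original variable order before counts are aggregated across subsamples.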
For each subsample and a particular query, we obtain the corresponding connectivity matrix A. We can then rank all pairs of nodes i,j according to the frequency of the occurrence of A_{i,j} = 1 across subsamples. Ties between pairs of variables can be broken with the results of the other queries if they are also computed as specified by queries; otherwise ties are broken at random:
- If the query is isParent, ties are broken with counts for isMaybeParent.
- For the query isMaybeParent, ties are broken with counts for isParent, i.e. in case of equal counts we give a preference to the edge that was considered more often to be a 'certain' parent. For methods returning DAGs this scheme makes the ranking for isMaybeParent equal to the result for isParent, up to the random tie breaking that is applied for isParent.
- If the query is isNoParent, ties are broken according to which edge was selected less often in the query isMaybeParent.
- If the query is isAncestor, ties are broken with counts for isMaybeAncestor.
- For the query isMaybeAncestor, ties are broken with counts for isAncestor, i.e. in case of equal counts we give a preference to the edge that was considered more often to be a 'certain' ancestor. For methods returning DAGs this scheme makes the ranking for isMaybeAncestor equal to the result for isAncestor, up to the random tie breaking that is applied for isAncestor.
- If the query is isNoAncestor, ties are broken according to which edge was selected less often in the query isMaybeAncestor.
If the tie breaking matrix defined according to these rules is 0, a matrix with standard normal random entries is used to break ties. Similarly, if there are remaining ties after applying the tie breaking rules described above, ties are broken randomly.
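The ranking with tie breaking can be illustrated with a short base R sketch. The function rank_edges and its arguments are hypothetical and only mirror the rules stated above: pairs are ordered by their counts, ties are resolved by a second matrix, a standard normal matrix replaces an all-zero tie breaker, and any remaining ties are broken at random.

```r
# Hypothetical helper: rank all pairs (i, j) by counts, breaking ties
# with a second matrix and, if ties remain, at random.
rank_edges <- function(counts, tiebreak = NULL) {
  p <- nrow(counts)
  if (is.null(tiebreak) || all(tiebreak == 0)) {
    # All-zero tie breaker: fall back to standard normal entries
    tiebreak <- matrix(rnorm(p * p), p, p)
  }
  pairs <- expand.grid(from = seq_len(p), to = seq_len(p))
  idx <- cbind(pairs$from, pairs$to)
  # Higher counts first, then higher tie-break values, then random order
  ord <- order(-counts[idx], -tiebreak[idx], runif(nrow(idx)))
  out <- as.matrix(pairs[ord, ])
  rownames(out) <- NULL
  out
}

# Toy counts over subsamples for isParent, with a tie between (2, 3) and (3, 2)
parentCounts <- matrix(0, 3, 3)
parentCounts[1, 2] <- 10; parentCounts[2, 3] <- 7; parentCounts[3, 2] <- 7

# Counts for isMaybeParent, used as the tie breaker for isParent
maybeCounts <- matrix(0, 3, 3)
maybeCounts[1, 2] <- 10; maybeCounts[2, 3] <- 9; maybeCounts[3, 2] <- 8

set.seed(1)
ranking <- rank_edges(parentCounts, maybeCounts)
```

In this toy input the pair (2, 3) is placed ahead of (3, 2) because it has the higher isMaybeParent count. For a query such as isNoParent, where the edge selected less often in isMaybeParent is preferred, one would pass the negated counts as the tie breaker in this sketch.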
Value
A list with the following entries:
- ranking: A list of length length(queries). For each query, the corresponding list entry contains a matrix of dimension (p x p) x 2 with the ranking of edges. E.g. the first row indicates that the edge from ranking$isParent[1,1] to ranking$isParent[1,2] is the most likely edge according to the method under consideration.
- resList: A list of length length(queries). For each query, the corresponding list entry contains a matrix of dimension (p x p) with the counts for A_{i,j} = 1 across the nsim subsamples.
- simEstimates: A list of length nsim with the method's output for each of the nsim subsamples.
Author(s)
Christina Heinze-Deml heinzedeml@stat.math.ethz.ch
See Also
getParents for the underlying point-estimate of the causal graph.
Examples
data("simDataInv")
X <- simDataInv$X
set.seed(1)
if (require(pcalg)) {
  rank <- getRanking(X,
                     environment = simDataInv$environment,
                     queries = c("isParent", "isMaybeParent"),
                     method = c("LINGAM"),
                     verbose = FALSE)
  # estimated ranking
  print(rank$ranking$isParent)
  # true adjacency matrix
  print(simDataInv$configs$trueA)
} else {
  cat("\nThe package 'pcalg' is needed for the example to work. Please install it.\n")
}