tpc {tpc}    R Documentation
PC Algorithm Accounting for a Partial Node Ordering
Description
Like pcalg::pc(), but takes into account a user-specified partial
ordering of the nodes/variables. This has two effects:
1) The conditional independence between x and y given S is not tested if
any variable in S lies in the future of both x and y;
2) edges cannot be oriented from a higher-order to a lower-order node.
In addition, the user may specify individual forbidden edges and context variables.
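The first effect can be illustrated with a small base-R sketch. The helper ci_test_allowed() is hypothetical and not part of tpc; it encodes the rule exactly as worded above (a test is skipped when some variable in S lies in a strictly later tier than both x and y), and should not be read as tpc's internal code.

```r
# Hypothetical helper, not part of tpc: would the test of x _||_ y | S be
# carried out under effect 1) as worded in the Description?
# Variables are given by their positions; tiers[v] is the tier of variable v.
ci_test_allowed <- function(x, y, S, tiers) {
  if (length(S) == 0) return(TRUE)            # marginal tests are never restricted
  latest_endpoint <- max(tiers[x], tiers[y])  # the later tier of x and y
  all(tiers[S] <= latest_endpoint)            # skip if S reaches past both
}

tiers <- rep(c(1, 2, 3), times = c(3, 3, 3))       # nine variables, three tiers
ci_test_allowed(1, 2, S = 3, tiers = tiers)        # TRUE: all in tier 1
ci_test_allowed(1, 2, S = c(3, 7), tiers = tiers)  # FALSE: 7 is in tier 3
ci_test_allowed(1, 4, S = 5, tiers = tiers)        # TRUE: 5 shares the tier of 4
```

The tiers vector mirrors the one used in the Examples section.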
Usage
tpc(
suffStat,
indepTest,
alpha,
labels,
p,
skel.method = c("stable", "stable.parallel"),
forbEdges = NULL,
m.max = Inf,
conservative = FALSE,
maj.rule = TRUE,
tiers = NULL,
context.all = NULL,
context.tier = NULL,
verbose = FALSE,
numCores = NULL,
cl.type = "PSOCK",
clusterexport = NULL
)
Arguments
suffStat: A list() of sufficient statistics, containing all necessary elements for the conditional independence decisions in the function indepTest().
indepTest: A function for testing conditional independence. It is internally called as indepTest(x, y, S, suffStat), where x and y are variables and S is a (possibly empty) set of variables, and it returns the p-value of the test.
alpha: Significance level (number in (0,1)) for the individual conditional independence tests.
labels: (optional) Character vector of variable (or "node") names. Typically preferred to specifying p.
p: (optional) Number of variables (or nodes). May be specified if labels are not given.
skel.method: Character string specifying the method; the default, "stable", provides an order-independent skeleton, see tpc::tskeleton().
forbEdges: A logical matrix of dimension p*p. If forbEdges[i,j] is TRUE, the directed edge i -> j is forbidden; if both forbEdges[i,j] and forbEdges[j,i] are TRUE, any edge between i and j is forbidden.
m.max: Maximal size of the conditioning sets that are considered in the conditional independence tests.
conservative: Logical indicating whether conservative PC should be used. Defaults to FALSE. See pcalg::pc() for details.
maj.rule: Logical indicating whether the majority rule should be used. Defaults to TRUE. See pcalg::pc() for details.
tiers: Numeric vector specifying the tier / time point for each variable. Must be of length 'p', if specified, or have the same length as 'labels', if specified. A smaller number corresponds to an earlier tier / time point.
context.all: Numeric or character vector. Specifies the positions or names of global context variables. Global context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the graph.
context.tier: Numeric or character vector. Specifies the positions or names of tier-specific context variables. Tier-specific context variables have no incoming edges, i.e. no parents, and are themselves parents of all non-context variables in the same tier.
verbose: If TRUE, detailed output is provided.
numCores: The number of CPU cores to be used.
cl.type: The cluster type. Default value is "PSOCK".
clusterexport: Character vector. Lists functions to be exported to nodes if numCores > 1.
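The forbEdges convention can be checked with a short base-R sketch. It assumes the reading that forbEdges[i,j] = TRUE forbids i -> j and that TRUE in both directions forbids any edge, which is how the forb matrix is built in the Examples. The helper edge_options() and the five labels are hypothetical and serve only to illustrate that reading.

```r
# Hypothetical helper, not part of tpc. Assumed convention:
# forb[i, j] == TRUE forbids i -> j; TRUE in both directions forbids any edge.
edge_options <- function(forb, i, j) {
  stopifnot(is.logical(forb), nrow(forb) == ncol(forb))
  ij <- forb[i, j]
  ji <- forb[j, i]
  if (ij && ji) return("no edge")
  if (ij) return(paste("only", j, "->", i))  # i -> j forbidden
  if (ji) return(paste("only", i, "->", j))  # j -> i forbidden
  "any"
}

lab <- c("A1", "A2", "A3", "B2", "C2")  # hypothetical labels
forb <- matrix(FALSE, nrow = length(lab), ncol = length(lab),
               dimnames = list(lab, lab))
forb["A1", "A3"] <- forb["A3", "A1"] <- TRUE  # no edge between A1 and A3
forb["B2", "A2"] <- TRUE                      # B2 -> A2 forbidden

edge_options(forb, "A1", "A3")  # "no edge"
edge_options(forb, "A2", "B2")  # "only A2 -> B2"
edge_options(forb, "A1", "A2")  # "any"
```

Here "only A2 -> B2" means that if an edge between A2 and B2 is kept, it must point away from A2.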
Details
See pcalg::pc
for further information on the PC algorithm.
The PC algorithm is named after its developers Peter Spirtes and Clark Glymour
(Spirtes et al., 2000).
Specifying a tier for each variable using the tiers argument has the
following effects:
1) In the skeleton and v-structure learning phases, conditional independence
testing is restricted such that if x is in tier t(x) and y is in tier t(y),
only those variables whose tier is not larger than t(x) are allowed in the
conditioning set.
2) Following the v-structure phase, all edges that were found between two
tiers are directed towards the higher-order tier. If context variables are
specified using context.all and/or context.tier, the corresponding
orientations are added in this step.
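Step 2) can be sketched in base R on an adjacency matrix. This is a hypothetical illustration, not tpc's internal code: orient_tiers_context() uses its own matrix convention (stated in the comments), which should not be assumed to match the amat coding used by pcalg; context variables are given by position only; and the sketch neither resolves conflicts between constraints nor performs any further orientation propagation the full algorithm may carry out.

```r
# Hypothetical sketch, NOT tpc's internal code. Convention used only here:
# amat[i, j] == 1 & amat[j, i] == 1  -> undirected edge i - j
# amat[i, j] == 1 & amat[j, i] == 0  -> directed edge   i -> j
orient_tiers_context <- function(amat, tiers, context.all = integer(0),
                                 context.tier = integer(0)) {
  p <- nrow(amat)
  non_context <- setdiff(seq_len(p), c(context.all, context.tier))
  # every edge between two different tiers points to the later tier
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      adjacent <- amat[i, j] == 1 || amat[j, i] == 1
      if (adjacent && tiers[i] < tiers[j]) {
        amat[i, j] <- 1  # keep the mark from the earlier variable i ...
        amat[j, i] <- 0  # ... and drop the one back: i -> j
      }
    }
  }
  make_parent <- function(amat, k, children) {
    amat[, k] <- 0          # k has no parents: remove every mark into k
    amat[k, children] <- 1  # k -> child, added if it was missing
    amat
  }
  for (k in context.all) {   # parents of all non-context variables
    amat <- make_parent(amat, k, non_context)
  }
  for (k in context.tier) {  # parents of non-context variables in k's tier
    amat <- make_parent(amat, k, intersect(non_context, which(tiers == tiers[k])))
  }
  amat
}

tiers <- c(1, 1, 2, 2)         # four variables, two tiers
amat <- matrix(0, nrow = 4, ncol = 4)
amat[1, 3] <- amat[3, 1] <- 1  # 1 - 3 (between tiers)
amat[2, 3] <- amat[3, 2] <- 1  # 2 - 3 (between tiers)
amat[3, 4] <- amat[4, 3] <- 1  # 3 - 4 (within tier 2)
res <- orient_tiers_context(amat, tiers, context.tier = 1)
# res encodes 1 -> 2 (added), 1 -> 3, 2 -> 3 and the undirected edge 3 - 4
```

Treating variable 1 as a tier-specific context variable plays the same role as context.tier = "A1" in the Examples: it gains an edge into the other tier-1 variable, while the within-tier edge 3 - 4 stays undirected.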
Value
An object of class "pcAlgo" (see the pcAlgo class in pcalg) containing an
estimate of the equivalence class of the underlying DAG.
Author(s)
Original code by Markus Kalisch, Martin Maechler, and Diego Colombo (Kalisch et al., 2012). Modifications by Janine Witte.
References
M. Kalisch, M. Maechler, D. Colombo, M.H. Maathuis and P. Buehlmann (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software 47(11): 1–26.
P. Spirtes, C. Glymour and R. Scheines (2000). Causation, Prediction, and Search, 2nd edition. The MIT Press. https://philarchive.org/archive/SPICPA-2.
Examples
library(tpc)
library(pcalg)  # provides gaussCItest()
# load simulated cohort data
data(dat_sim)
n <- nrow(dat_sim)
lab <- colnames(dat_sim)
# estimate the CPDAG without taking background information into account
tpc.fit <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab)
pc.fit <- pcalg::pc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab,
maj.rule = TRUE, solve.confl = TRUE)
identical(pc.fit@graph, tpc.fit@graph) # TRUE
# estimate the CPDAG with the temporal ordering as background information
tiers <- rep(c(1,2,3), times=c(3,3,3))
tpc.fit2 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab, tiers = tiers)
# same settings as tpc.fit2, but with the skeleton estimated in parallel on 2 cores
tpc.fit2.par <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab, tiers = tiers,
skel.method = "stable.parallel",
numCores = 2, clusterexport = c("cor", "ecdf"))
if(requireNamespace("Rgraphviz", quietly = TRUE)){
data("true_sim")
oldpar <- par(mfrow = c(1,3))
plot(true_sim, main = "True DAG")
plot(tpc.fit, main = "PC estimate")
plot(tpc.fit2, main = "tPC estimate")
par(oldpar)
}
# require that there is no edge between A1 and A3, and that any edge between
# A2 and B2 or between A2 and C2 is directed away from A2
forb <- matrix(FALSE, nrow=9, ncol=9)
rownames(forb) <- colnames(forb) <- lab
forb["A1","A3"] <- forb["A3","A1"] <- TRUE
forb["B2","A2"] <- TRUE
forb["C2","A2"] <- TRUE
tpc.fit3 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab,
forbEdges = forb, tiers = tiers)
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
# compare estimated CPDAGs
data("true_sim")
oldpar <- par(mfrow = c(1,2))
plot(tpc.fit2, main = "old tPC estimate")
plot(tpc.fit3, main = "new tPC estimate")
par(oldpar)
}
# force edges from A1 to all other nodes measured at time 1
# into the graph (note that the edge from A1 to A2 is then
# forbidden)
tpc.fit4 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab,
tiers = tiers, context.tier = "A1")
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
# plot the estimated CPDAG
data("true_sim")
plot(tpc.fit4, main = "alternative tPC estimate")
}
# force edges from A1 to all other nodes into the graph
tpc.fit5 <- tpc(suffStat = list(C = cor(dat_sim), n = n),
indepTest = gaussCItest, alpha = 0.01, labels = lab,
tiers = tiers, context.all = "A1")
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
# plot the estimated CPDAG
data("true_sim")
plot(tpc.fit5, main = "alternative tPC estimate")
}
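In the examples the tiers vector is written by hand. When every variable label ends in its time point (an assumption suggested by labels such as A1, A2, A3), a hypothetical helper like tiers_from_labels() below could derive it instead; whether it reproduces rep(c(1,2,3), times=c(3,3,3)) for dat_sim depends on the actual column names and their order, so check colnames(dat_sim) before relying on it.

```r
# Hypothetical helper, not part of tpc: derive the 'tiers' argument from
# labels whose trailing digits encode the time point, e.g. "A1" or "B12".
tiers_from_labels <- function(labels) {
  stopifnot(is.character(labels))
  digits <- regmatches(labels, regexpr("[0-9]+$", labels))
  if (length(digits) != length(labels)) {
    stop("every label must end in a time point, e.g. 'A1'")
  }
  as.numeric(digits)  # smaller number = earlier tier, as tpc expects
}

tiers_from_labels(c("A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"))
# returns c(1, 1, 1, 2, 2, 2, 3, 3, 3)
```

Failing loudly on a label without a trailing number avoids silently producing a tiers vector whose length differs from labels.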