is.submodel {cna}    R Documentation
Identify correctness-preserving submodel relations
Description
The function is.submodel checks for each element of a vector of cna solution formulas whether it is a submodel of a specified target model y. If y is the true model in an inverse search (i.e. the ground truth), is.submodel identifies correct models in the cna output (see Baumgartner and Thiem 2020, Baumgartner and Ambuehl 2020).
Usage
is.submodel(x, y, strict = FALSE)
identical.model(x, y)
Arguments
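x: Character vector of atomic and/or complex solution formulas (asf/csf). Must be of length 1 in identical.model.
y: Character string of length 1 specifying the target asf or csf.
strict: Logical; if TRUE, an element of x counts as a submodel of y only if it is a proper part of y, i.e. not identical to y.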
Details
To benchmark the reliability of a method of causal inference, it must be tested to what degree the method recovers the true data-generating structure Δ, or proper substructures of Δ, from data of varying quality. Reliability benchmarking is done in so-called inverse searches, which reverse the order of causal discovery as normally conducted in scientific practice. An inverse search comprises three steps: (1) a causal structure Δ is drawn/presupposed (as ground truth), (2) artificial data δ is simulated from Δ, possibly featuring various deficiencies (e.g. noise, fragmentation, or measurement error), and (3) δ is processed by the benchmarked method in order to check whether its output meets the tested reliability benchmark (e.g. whether the output is true of or identical to Δ).
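The three steps can be sketched in R as follows, assuming the cna package is installed. The ground truth is fixed by hand here (the randomConds functions could draw one at random instead), allCombs() and selectCases() produce ideal, noise-free data compatible with it, and is.submodel() scores every complex solution formula that cna returns. The variable names are illustrative only.

```r
library(cna)

# Step (1): presuppose a ground truth Delta (fixed by hand in this sketch)
target <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"

# Step (2): simulate data delta: all configurations of five binary
# factors A-E, reduced to those compatible with the target
dat  <- allCombs(c(2, 2, 2, 2, 2)) - 1
pdat <- selectCases(target, dat)

# Step (3): analyse the data with cna and check each returned csf
sols    <- csf(cna(pdat))$condition
correct <- is.submodel(sols, target)
```

Deficient data (e.g. with noise) would be introduced between steps (2) and (3); the sketch deliberately omits that to keep the pipeline visible.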
The main purpose of is.submodel is to execute step (3) of an inverse search that is tailor-made to test the reliability of cna [with randomConds and selectCases designed for steps (1) and (2), respectively]. A solution formula x being a submodel of a target formula y means that all the causal claims entailed by x are true of y, which is the case if a causal interpretation of x entails conjunctive and disjunctive causal relevance relations that are all likewise entailed by a causal interpretation of y. More specifically, x is a submodel of y if, and only if, the following conditions are satisfied: (i) all factor values causally relevant according to x are also causally relevant according to y, (ii) all factor values contained in two different disjuncts in x are also contained in two different disjuncts in y, (iii) all factor values contained in the same conjunct in x are also contained in the same conjunct in y, and (iv) if x is a csf with more than one asf, (i) to (iii) are satisfied for all asfs in x. For more details see Baumgartner and Thiem (2020) or Baumgartner and Ambuehl (2020, online appendix).
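To make conditions (i) to (iii) concrete, here is a deliberately simplified base-R sketch for a single asf. It is a hypothetical illustration of the definition, not the algorithm used inside is.submodel: it ignores condition (iv) (csfs), performs no syntax validation, reads (iii) as "each conjunct of x is contained in some conjunct of y", and the helper names parse_asf, in_different_disjuncts and submodel_asf_sketch are made up for this example.

```r
# HYPOTHETICAL sketch of conditions (i)-(iii) for one asf; not cna's code.

# Split "A*b + C <-> D" into the outcome and a list of disjuncts,
# each disjunct being a character vector of factor values
parse_asf <- function(asf) {
  sides <- trimws(strsplit(asf, "<->", fixed = TRUE)[[1]])
  disjuncts <- strsplit(sides[1], "+", fixed = TRUE)[[1]]
  list(outcome = sides[2],
       terms = lapply(strsplit(disjuncts, "*", fixed = TRUE), trimws))
}

# TRUE if factor values a and b occur in two different disjuncts of `terms`
in_different_disjuncts <- function(a, b, terms) {
  ia <- which(vapply(terms, function(term) a %in% term, logical(1)))
  ib <- which(vapply(terms, function(term) b %in% term, logical(1)))
  any(outer(ia, ib, "!="))
}

submodel_asf_sketch <- function(x, y) {
  px <- parse_asf(x)
  py <- parse_asf(y)
  if (px$outcome != py$outcome) return(FALSE)

  # (i) every factor value in x is also causally relevant in y
  if (!all(unlist(px$terms) %in% unlist(py$terms))) return(FALSE)

  # (iii) each conjunct of x is contained in some conjunct of y
  for (term in px$terms) {
    covered <- vapply(py$terms, function(u) all(term %in% u), logical(1))
    if (!any(covered)) return(FALSE)
  }

  # (ii) factor values from different disjuncts of x must also lie in
  # different disjuncts of y
  n <- length(px$terms)
  if (n > 1) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        for (a in px$terms[[i]]) {
          for (b in px$terms[[j]]) {
            if (!in_different_disjuncts(a, b, py$terms)) return(FALSE)
          }
        }
      }
    }
  }
  TRUE
}

submodel_asf_sketch("A + B <-> C", "A*b + a*B <-> C")      # TRUE
submodel_asf_sketch("A*B + a*b <-> C", "A*b + a*B <-> C")  # FALSE: violates (iii)
```

The second call fails because no disjunct of the target contains both A and B, which is exactly what condition (iii) rules out.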
is.submodel requires two inputs x and y, where x is a character vector of cna solution formulas (asf or csf) and y is one asf or csf (i.e. a character string of length 1), viz. the target structure or ground truth. The function returns TRUE for those elements of x that are submodels of y according to the definition of submodel-hood given in the previous paragraph, and FALSE otherwise. If strict = TRUE, an element of x counts as a submodel of y only if it is a proper part of y (i.e. it is not identical to y).
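A short illustration of the strict argument, assuming cna is installed. Since "C + b*A <-> D" differs from the target "A*b + C <-> D" only in the order of its factor values, it should count as a submodel under the default but not under strict = TRUE, whereas proper parts of the target should remain submodels in both cases.

```r
library(cna)

y <- "A*b + C <-> D"
x <- c("C + b*A <-> D",  # same asf as y, factor values reordered
       "A*b <-> D",      # proper part of y
       "C <-> D")        # proper part of y

plain  <- is.submodel(x, y)                 # default: strict = FALSE
strict <- is.submodel(x, y, strict = TRUE)  # formulas identical to y excluded
```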
The function identical.model returns TRUE only if x (which must be of length 1) and y are identical. It can be used to test whether y is completely recovered in an inverse search.
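Because identical.model accepts only a single x, checking whether any of several solutions completely recovers the target calls for applying it element-wise. A minimal sketch, assuming cna is installed; the formulas come from the multi-value examples, and the names sols, ident and fully_recovered are illustrative.

```r
library(cna)

target <- "C=2*D=1*B=3 + A=1 <-> E=5"
sols   <- c("C=2 + D=1 <-> E=5", "A=1+B=3*D=1*C=2 <-> E=5")

# identical.model() requires x of length 1, so apply it to one formula at a time
ident <- vapply(sols, identical.model, logical(1), y = target, USE.NAMES = FALSE)

# The target counts as completely recovered if at least one solution matches it
fully_recovered <- any(ident)
```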
Value
Logical vector of the same length as x.
References
Baumgartner, Michael and Mathias Ambuehl. 2020. “Causal Modeling with Multi-Value and Fuzzy-Set Coincidence Analysis.” Political Science Research and Methods 8:526–542.
Baumgartner, Michael and Alrik Thiem. 2020. “Often Trusted But Never (Properly) Tested: Evaluating Qualitative Comparative Analysis.” Sociological Methods & Research 49:279–311.
See Also
randomConds, selectCases, cna.
Examples
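library(cna)

# Binary expressions
# ------------------
trueModel.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"
# candidates.1: formulas satisfying conditions (i) to (iv) relative to trueModel.1
candidates.1 <- c("(A + B <-> C)*(C + c*D <-> E)", "A + B <-> C",
                  "(A <-> C)*(C <-> E)", "C <-> E")
# candidates.2: formulas violating at least one condition, e.g. by regrouping
# factor values (A*B + a*b) or by adding factor values that the target does not
# treat as causally relevant to the respective outcome (e.g. D or E for C)
candidates.2 <- c("(A*B + a*b <-> C)*(C*d + c*D <-> E)", "A*b*D + a*B <-> C",
                  "(A*b + a*B <-> C)*(C*A*D <-> E)", "D <-> C",
                  "(A*b + a*B + E <-> C)*(C*d + c*D <-> E)")
is.submodel(candidates.1, trueModel.1)
is.submodel(candidates.2, trueModel.1)
is.submodel(c(candidates.1, candidates.2), trueModel.1)
# The same asf with reordered factor values: a submodel of the target,
# but not a proper one, since the two formulas are identical
is.submodel("C + b*A <-> D", "A*b + C <-> D")
is.submodel("C + b*A <-> D", "A*b + C <-> D", strict = TRUE)
identical.model("C + b*A <-> D", "A*b + C <-> D")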
target.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"
testformula.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)*(A + B <-> C)"
is.submodel(testformula.1, target.1)
# Multi-value expressions
# -----------------------
trueModel.2 <- "(A=1*B=2 + B=3*A=2 <-> C=3)*(C=1 + D=3 <-> E=2)"
is.submodel("(A=1*B=2 + B=3 <-> C=3)*(D=3 <-> E=2)", trueModel.2)
is.submodel("(A=1*B=1 + B=3 <-> C=3)*(D=3 <-> E=2)", trueModel.2)
is.submodel(trueModel.2, trueModel.2)
is.submodel(trueModel.2, trueModel.2, strict = TRUE)
target.2 <- "C=2*D=1*B=3 + A=1 <-> E=5"
testformula.2 <- c("C=2 + D=1 <-> E=5","C=2 + D=1*B=3 <-> E=5","A=1+B=3*D=1*C=2 <-> E=5",
"C=2 + D=1*B=3 + A=1 <-> E=5","C=2*B=3 + D=1 + B=3 + A=1 <-> E=5")
is.submodel(testformula.2, target.2)
identical.model(testformula.2[3], target.2)
identical.model(testformula.2[1], target.2)