cnaOpt {cnaOpt} | R Documentation |
Find atomic solution formulas with optimal consistency and coverage
Description
cnaOpt attempts to find atomic solution formulas (asfs) for a given outcome (inferred from crisp-set, "cs", or multi-value, "mv", data) that are optimal with respect to the model fit parameters consistency and coverage (cf. Baumgartner and Ambuehl 2021).
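As a reminder of what the two fit parameters measure, the following base-R sketch computes consistency and coverage for one candidate condition on crisp-set data. It does not use the package: the data frame d, the condition A + B and the variable names are invented for illustration, and the formulas are the standard crisp-set definitions.

```r
# Hypothetical crisp-set data: two exogenous factors and the outcome E.
d <- data.frame(
  A = c(1, 1, 0, 0, 1, 0),
  B = c(0, 1, 1, 0, 0, 0),
  E = c(1, 1, 1, 0, 0, 0)
)

# Candidate condition for E: the disjunction A + B.
lhs <- d$A == 1 | d$B == 1
out <- d$E == 1

# Consistency: share of cases instantiating the condition that also show E.
con <- sum(lhs & out) / sum(lhs)
# Coverage: share of cases showing E that are covered by the condition.
cov <- sum(lhs & out) / sum(out)

c(con = con, cov = cov, product = con * cov)
```

Here the condition holds in four cases, three of which show E, and all three cases with E are covered, so con is 0.75 and cov is 1.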
Usage
cnaOpt(x, outcome, ..., reduce = c("ereduce", "rreduce", "none"),
niter = 1, crit = quote(con * cov), cond = quote(TRUE),
approx = FALSE, maxCombs = 1e7)
Arguments
x |
A |
outcome |
A character string specifying one outcome, i.e. one factor value in |
... |
Additional arguments passed to |
reduce |
A character string: if |
niter |
An integer value indicating the number of repetitive applications of |
crit |
Quoted expression specifying a numeric criterion to be maximized when selecting the best solutions among the ones that meet criterion |
cond |
Quoted expression specifying a logical criterion to be imposed on the solutions inferred from |
approx |
As in |
maxCombs |
Maximal number of combinations that will be tested for optimality. If the number of necessary iterations exceeds |
Details
cnaOpt implements a procedure introduced in Baumgartner and Ambuehl (2021). It infers causal models (atomic solution formulas, asf) for the outcome from data x that comply with the logical condition cond and maximize the numeric criterion crit. Data x may be crisp-set ("cs") or multi-value ("mv"), but not fuzzy-set ("fs"). The function proceeds as follows:
1. it calculates consistency and coverage optima (con-cov optima) for x;
2. it selects the optima that meet cond;
3. among those optima, it selects those that maximize crit;
4. it builds the canonical disjunctive normal forms (DNF) of the selected optima;
5. it generates all minimal forms of those canonical DNFs.
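The two selection steps (keep the optima that meet cond, then keep those that maximize crit) can be sketched in base R with quoted expressions. This is only an illustration under stated assumptions, not the package's implementation: the table optima, its values and the helper select_best are hypothetical.

```r
# Hypothetical table of con-cov optima (values invented for illustration).
optima <- data.frame(
  con = c(1.00, 0.90, 0.75, 0.60),
  cov = c(0.50, 0.80, 0.90, 1.00)
)

# Hypothetical helper: filter rows by `cond`, then keep the rows with the
# highest value of `crit`. Both are quoted expressions evaluated in the table.
select_best <- function(tbl, crit = quote(con * cov), cond = quote(TRUE)) {
  keep <- rep_len(as.logical(eval(cond, tbl)), nrow(tbl))
  tbl <- tbl[keep, , drop = FALSE]
  score <- eval(crit, tbl)
  tbl[score == max(score), , drop = FALSE]
}

best_default <- select_best(optima)                             # no restriction
best_highcon <- select_best(optima, cond = quote(con >= 0.95))  # con >= 0.95
```

With the default criterion con * cov, the second row (0.90, 0.80) is selected; under the stricter condition only the first row remains. In this plain-R sketch a row-wise minimum would have to be written as pmin(con, cov), because min(con, cov) collapses to a single number.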
Roughly speaking, running cnaOpt amounts to sequentially executing configTable, conCovOpt, selectMax, DNFbuild and condTbl.
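To make the notion of a canonical DNF concrete, the base-R sketch below writes each configuration assigned to the outcome as a full conjunction and joins these conjunctions by disjunction. It assumes the usual crisp-set notation (upper case for value 1, lower case for value 0, "*" for conjunction, "+" for disjunction). The data, the assignment vector and the helper as_conjunction are hypothetical, and the sketch is not the code of DNFbuild.

```r
# Hypothetical configurations of three exogenous crisp-set factors.
configs <- data.frame(
  A = c(1, 1, 0, 0),
  B = c(1, 0, 1, 0),
  C = c(0, 1, 1, 0)
)
# Hypothetical assignment: 1 marks configurations assigned to the outcome.
assigned <- c(1, 0, 1, 0)

# Write one configuration as a conjunction of literals.
as_conjunction <- function(row) {
  lits <- ifelse(row == 1, toupper(names(row)), tolower(names(row)))
  paste(lits, collapse = "*")
}

rows <- configs[assigned == 1, , drop = FALSE]
canonical_dnf <- paste(apply(rows, 1, as_conjunction), collapse = " + ")
canonical_dnf
```

For these invented data the result is "A*B*c + a*B*C". Such a form is usually redundant, which is why a minimization step follows.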
In the default setting, cnaOpt attempts to build all optimal solutions using ereduce. That may be too computationally demanding, because the space of optimal solutions can be very large. If the argument reduce is set to "rreduce", cnaOpt builds one arbitrarily selected optimal solution, which typically terminates quickly. Giving the argument niter a non-default value, say 20, repeats the process of selecting one optimal solution under reduce = "rreduce" 20 times. As the same solution may be generated on several iterations and duplicates are not returned, the output may contain fewer models than the value given to niter. If reduce is not set to "rreduce", niter is ignored with a warning.
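Why repeated random selection can return fewer models than niter can be illustrated in plain R, independently of the package. The pool of solutions and the uniform sampling below are assumptions made for illustration only; they do not describe how rreduce picks a solution.

```r
set.seed(42)  # fixed seed only so that the sketch is reproducible

# Hypothetical pool of three optimal solutions (invented strings).
pool <- c("A + B*c <-> E", "A + b*D <-> E", "a*D + B <-> E")
niter <- 20

# Draw one solution per iteration, then drop duplicates.
draws <- vapply(seq_len(niter), function(i) sample(pool, 1), character(1))
models <- unique(draws)

length(draws)   # number of iterations
length(models)  # number of distinct models, at most length(pool)
```

Because the pool holds only three distinct solutions, 20 iterations cannot yield more than three models after duplicates are removed.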
Value
cnaOpt returns a data.frame with additional classes "cnaOpt" and "condTbl". See the "Value" section in ?condTbl for details.
References
Baumgartner, Michael and Mathias Ambuehl. 2021. “Optimizing Consistency and Coverage in Configurational Causal Modeling.” Sociological Methods & Research. doi:10.1177/0049124121995554.
See Also
Examples
# Example 1: Real-life crisp-set data, d.educate.
(res_opt1 <- cnaOpt(d.educate, "E"))
# Using the pipe operator (%>%), the steps processed by cnaOpt in the
# call above can be reproduced as follows:
library(dplyr)
conCovOpt(d.educate, "E") %>% selectMax %>% DNFbuild(reduce = "ereduce") %>%
  paste("<-> E") %>% condTbl(d.educate)
# Example 2: Simulated crisp-set data.
dat1 <- data.frame(
  A = c(1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0),
  B = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0),
  C = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0),
  D = c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1),
  E = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1),
  F = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
)
(res_opt2 <- cnaOpt(dat1, "E"))
# Change the maximality criterion.
cnaOpt(dat1, "E", crit = quote(min(con, cov)))
# Change the selection condition.
cnaOpt(dat1, "E", cond = quote(con >= 0.9))
# Build all con-cov optima with coverage above 0.9 that maximize min(con, cov).
cnaOpt(dat1, "E", crit = quote(min(con, cov)), cond = quote(cov > 0.9))
# Different values of the reduce argument.
cnaOpt(dat1, "E", reduce = "none") # canonical DNF
cnaOpt(dat1, "E", reduce = "rreduce") # one randomly drawn optimal solution
# Iterate random solution generation 10 times.
cnaOpt(dat1, "E", reduce = "rreduce", niter = 10)
# Example 3: All logically possible configurations.
(res_opt3 <- cnaOpt(full.ct(4), "D")) # All combinations are equally bad.
# Example 4: Real-life multi-value data, d.pban.
cnaOpt(d.pban, outcome = "PB=1")
cnaOpt(d.pban, outcome = "PB=1", crit = quote(0.8*con + 0.2*cov))
cnaOpt(d.pban, outcome = "PB=1", cond = quote(con > 0.9))
cnaOpt(d.pban, outcome = "PB=0")
cnaOpt(d.pban, outcome = "PB=0", cond = quote(con > 0.9))
cnaOpt(d.pban, outcome = "F=2")
cnaOpt(d.pban, outcome = "F=2", crit = quote(0.8*con + 0.2*cov))
# Example 5: High computational demand.
dat2 <- configTable(d.performance[,1:8], frequency = d.performance$frequency)
try(cnaOpt(dat2, outcome = "SP")) # error because too computationally demanding
# The following call does not terminate because of reduce = "ereduce".
try(cnaOpt(dat2, outcome = "SP", approx = TRUE))
# We could increase maxCombs, as in the line below
## Not run: cnaOpt(dat2, outcome = "SP", approx = TRUE, maxCombs = 1.08e+09)
# but this takes very long to terminate.
# Alternative approach: Produce one (randomly selected) optimal solution using reduce = "rreduce".
cnaOpt(dat2, outcome = "SP", approx = TRUE, reduce = "rreduce")
# Iterate the previous call 10 times.
cnaOpt(dat2, outcome = "SP", approx = TRUE, reduce = "rreduce", niter = 10)
# Another alternative: Use ereduce for minimization but introduce a case.cutoff.
cnaOpt(dat2, outcome = "SP", case.cutoff = 10)