R: Find atomic solution formulas with optimal consistency and...

cnaOpt {cnaOpt}

R Documentation

Find atomic solution formulas with optimal consistency and coverage

Description

cnaOpt attempts to find atomic solution formulas (asfs) for a given outcome (inferred from crisp-set, "cs", or multi-value, "mv", data) that are optimal with respect to the model fit parameters consistency and coverage (cf. Baumgartner and Ambuehl 2021).

Usage

cnaOpt(x, outcome, ..., reduce = c("ereduce", "rreduce", "none"), 
       niter = 1, crit = quote(con * cov), cond = quote(TRUE), 
			 approx = FALSE, maxCombs = 1e7)

Arguments

`x`	A `data.frame` or `configTable` of type `"cs"` or `"mv"`.
`outcome`	A character string specifying one outcome, i.e. one factor value in `x`.
`...`	Additional arguments passed to `configTable`, for instance `rm.dup.factors`, `rm.dup.factors`, or `case.cutoff`.
`reduce`	A character string: if `"ereduce"` or `"rreduce"`, the canonical DNF realizing the con-cov optimum is freed of redundancies using `ereduce` or `rreduce` (possibly repeatedly, see `niter`), respectively; if `"none"`, the unreduced canonical DNF is returned. `reduce = TRUE` is interpreted as `"rreduce"`, `reduce = FALSE` and `reduce = NULL` as `"none"`.
`niter`	An integer value indicating the number of repetitive applications of `rreduce`. `niter` will be ignored (with a warning) if `reduce` is not equal to `"rreduce"`. Note that repeated applications may yield identical solutions and that duplicate solutions are eliminiated, so that the number of resulting solutions can be smaller than `niter`.
`crit`	Quoted expression specifying a numeric criterion to be maximized when selecting the best solutions among the ones that meet criterion `cond`, for example, `quote(min(con,cov))` or `quote(0.8con + 0.2cov)`, etc.
`cond`	Quoted expression specifying a logical criterion to be imposed on the solutions inferred from `x` before selecting the best solutions on the basis of `crit`, for example, `quote(con > 0.85)` or `quote(con > cov)`, etc.
`approx`	As in `conCovOpt`.
`maxCombs`	Maximal number of combinations that will be tested for optimality. If the number of necessary iterations exceeds `maxCombs`, `cnaOpt` will stop executing and return an error message stating the necessary number of iterations. Early termination can then be avoided by increasing `maxCombs` accordingly. This argument is passed to `conCovOpt` and `ereduce`.

Details

cnaOpt implements a procedure introduced in Baumgartner and Ambuehl (2021). It infers causal models (atomic solution formulas, asf) for the outcome from data x that comply with the logical condition cond and maximize the numeric criterion crit. Data x may be crisp-set ("cs") or multi-value ("mv"), but not fuzzy-set ("fs"). The function proceeds as follows:

it calculates consistency and coverage optima (con-cov optima) for x;
it selects the optima that meet cond;
among those optima, it selects those that maximize crit;
it builds the canonical disjunctive normal forms (DNF) of the selected optima
it generates all minimal forms of those canonical DNFs

Roughly speaking, running cnaOpt amounts to sequentially executing configTable, conCovOpt, selectMax, DNFbuild and condTbl.

In the default setting, cnaOpt attempts to build all optimal solutions using ereduce. But that may be too computationally demanding because the space of optimal solutions can be very large. If the argument reduce is set to "rreduce", cnaOpt builds one arbitrarily selected optimal solution, which typically terminates quickly. By giving the argument niter a non-default value, say, 20, the process of selecting one optimal solution under reduce = "rreduce" is repeated 20 times. As the same solutions will be generated on some iterations and duplicates are not returned, the output may contain less models than the value given to niter. If reduce is not set to "rreduce", niter is ignored with a warning.

Value

cnaOpt returns a data.frame with additional classes "cnaOpt" and "condTbl". See the "Value" section in ?condTbl for details.

References

Baumgartner, Michael and Mathias Ambuehl. 2021. “Optimizing Consistency and Coverage in Configurational Causal Modeling.” Sociological Methods & Research.
doi:10.1177/0049124121995554.

Examples

# Example 1: Real-life crisp-set data, d.educate.
(res_opt1 <- cnaOpt(d.educate, "E"))

# Using the pipe operator (%>%), the steps processed by cnaOpt in the 
# call above can be reproduced as follows:
library(dplyr)
conCovOpt(d.educate, "E") %>% selectMax %>% DNFbuild(reduce = "ereduce") %>% 
  paste("<-> E") %>% condTbl(d.educate)

# Example 2: Simulated crisp-set data.
dat1 <- data.frame(
  A = c(1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0), 
  B = c(0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0), 
  C = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0), 
  D = c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1), 
  E = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1), 
  F = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
)

(res_opt2 <- cnaOpt(dat1, "E"))

# Change the maximality criterion.
cnaOpt(dat1, "E", crit = quote(min(con, cov)))
# Change the selection condition.
cnaOpt(dat1, "E", cond = quote(con >= 0.9))
# Build all con-cov optima with coverage above 0.9 that maximize min(con, cov).
cnaOpt(dat1, "E", crit = quote(min(con, cov)), cond = quote(cov > 0.9))
# Different values of the reduce argument.
cnaOpt(dat1, "E", reduce = "none") # canonical DNF
cnaOpt(dat1, "E", reduce = "rreduce") # one randomly drawn optimal solution
# Iterate random solution generation 10 times.
cnaOpt(dat1, "E", reduce = "rreduce", niter = 10) 

# Example 3: All logically possible configurations.
(res_opt3 <- cnaOpt(full.ct(4), "D"))  # All combinations are equally bad.

# Example 4: Real-life multi-value data, d.pban.
cnaOpt(d.pban, outcome = "PB=1")
cnaOpt(d.pban, outcome = "PB=1", crit = quote(0.8*con + 0.2*cov))
cnaOpt(d.pban, outcome = "PB=1", cond = quote(con > 0.9))
cnaOpt(d.pban, outcome = "PB=0")
cnaOpt(d.pban, outcome = "PB=0", cond = quote(con > 0.9))
cnaOpt(d.pban, outcome = "F=2")
cnaOpt(d.pban, outcome = "F=2", crit = quote(0.8*con + 0.2*cov))

# Example 5: High computational demand.
dat2 <- configTable(d.performance[,1:8], frequency = d.performance$frequency)
try(cnaOpt(dat2, outcome = "SP"))   # error because too computationally demanding
# The following call does not terminate because of reduce = "ereduce".
try(cnaOpt(dat2, outcome = "SP", approx = TRUE))
# We could increase maxCombs, as in the line below
## Not run: cnaOpt(dat2, outcome = "SP", approx = TRUE, maxCombs = 1.08e+09) 
# but this takes very long to terminate.
# Alternative approach: Produce one (randomly selected) optimal solution using reduce = "rreduce".
cnaOpt(dat2, outcome = "SP",  approx = TRUE, reduce = "rreduce")
# Iterate the previous call 10 times.
cnaOpt(dat2, outcome = "SP", approx = TRUE, reduce = "rreduce", niter = 10)
# Another alternative: Use ereduce for minimization but introduce a case.cutoff.
cnaOpt(dat2, outcome = "SP", case.cutoff = 10)

[Package cnaOpt version 0.5.2 Index]