selectCases {cna}

R Documentation

Select the cases/configurations compatible with a data generating causal structure

Description

selectCases selects the cases/configurations that are compatible with a Boolean function, in particular (but not exclusively), a data generating causal structure, from a data frame or configTable.

selectCases1 additionally accepts consistency (con) and coverage (cov) thresholds. It selects cases/configurations that are compatible with the data generating structure at least to degrees con and cov.

Usage

selectCases(cond, x = full.ct(cond), type = "auto", cutoff = 0.5, 
            rm.dup.factors = FALSE, rm.const.factors = FALSE)
selectCases1(cond, x = full.ct(cond), type = "auto", con = 1, cov = 1, 
             rm.dup.factors = FALSE, rm.const.factors = FALSE)

Arguments

`cond`	Character string specifying the Boolean function for which compatible cases are to be selected.
`x`	Data frame or `configTable`; if not specified, `full.ct(cond)` is used.
`type`	Character vector specifying the type of `x`: `"auto"` (automatic detection; default), `"cs"` (crisp-set), `"mv"` (multi-value), or `"fs"` (fuzzy-set).
`cutoff`	Cutoff value in case of `"fs"` data.
`rm.dup.factors`	Logical; if `TRUE`, all but the first of a set of factors with identical value distributions are eliminated.
`rm.const.factors`	Logical; if `TRUE`, constant factors are eliminated.
`con`, `cov`	Numeric scalars between 0 and 1 to set the minimum consistency and coverage thresholds.

Details

In combination with allCombs, full.ct, randomConds and makeFuzzy, selectCases is useful for simulating data, as needed for inverse search trials that benchmark the output of the cna function.

selectCases draws those cases/configurations from a data frame or configTable x that are compatible with a data generating causal structure (or any other Boolean or set-theoretic function), which is given to selectCases as a character string cond. If the argument x is not specified, configurations are drawn from full.ct(cond). cond can be a condition of any of the three types of conditions, boolean, atomic or complex (see condition).

To illustrate, if the data generating structure is "A + B <-> C", then a case featuring A=1, B=0, and C=1 is selected by selectCases, whereas a case featuring A=1, B=0, and C=0 is not (because according to the data generating structure, A=1 must be associated with C=1, which is violated in the latter case).

The type of the data frame is automatically detected by default, but can be manually specified by giving the argument type one of its non-default values: "cs" (crisp-set), "mv" (multi-value), and "fs" (fuzzy-set).
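
The selection rule for the example "A + B <-> C" can be sketched in base R, without the cna package. This is only an illustration of the compatibility check, assuming crisp-set data; the object names grid, compatible and selected are hypothetical, and the code is not the package's implementation.

```r
# Base-R sketch of the selection idea for the crisp-set structure "A + B <-> C".
# It mimics the compatibility check only; it is not how cna implements selectCases().
grid <- expand.grid(A = 0:1, B = 0:1, C = 0:1)   # all 8 configurations of A, B, C

# "A + B <-> C" holds when the disjunction (A OR B) takes the same value as C.
compatible <- with(grid, as.integer(A | B) == C)

# Keep only the configurations that satisfy the biconditional.
selected <- grid[compatible, ]
print(selected)
```

The configuration A=1, B=0, C=1 is among the retained rows, whereas A=1, B=0, C=0 is dropped, in line with the illustration above.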

selectCases1 accepts consistency (con) and coverage (cov) thresholds, such that some cases that are incompatible with cond are also drawn, as long as con and cov remain satisfied. The selection is made by an algorithm that aims to find a subset of maximal size meeting the con and cov requirements. In contrast to selectCases, selectCases1 only accepts a condition of type atomic as its cond argument, i.e. an atomic solution formula. Data drawn by selectCases1 can only be modeled with consistency = con and coverage = cov.
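
To make the role of the thresholds concrete, the following base-R sketch computes consistency and coverage for a small data set, assuming the usual min-based definitions sum(min(x, y)) / sum(x) and sum(min(x, y)) / sum(y). The vectors lhs and out and the two helper functions are hypothetical names introduced for illustration; the sketch does not reproduce the subset search performed by selectCases1.

```r
# Hypothetical membership scores: lhs is the left-hand side of an atomic
# condition, out is the outcome. Case 4 violates sufficiency (lhs = 1, out = 0).
lhs <- c(1, 1, 1, 1, 0)
out <- c(1, 1, 1, 0, 0)

# Assumed min-based measures; they also apply to fuzzy-set scores in [0, 1].
consistency <- function(x, y) sum(pmin(x, y)) / sum(x)
coverage    <- function(x, y) sum(pmin(x, y)) / sum(y)

# Scores over all five cases.
con_all <- consistency(lhs, out)
cov_all <- coverage(lhs, out)

# Scores after removing the incompatible case 4.
con_sub <- consistency(lhs[-4], out[-4])

print(c(con_all = con_all, cov_all = cov_all, con_sub = con_sub))
```

Under these assumptions, a threshold of con = 0.75 would allow all five cases to be retained, whereas con = 1 would require dropping case 4.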

Value

A configTable.

Examples

# Generate all configurations of 5 dichotomous factors that are compatible with the causal
# chain (A*b + a*B <-> C) * (C*d + c*D <-> E).
groundTruth.1 <- "(A*b + a*B <-> C) * (C*d + c*D <-> E)"
(dat1 <- selectCases(groundTruth.1))
condition(groundTruth.1, dat1)

# Randomly draw a multi-value ground truth and generate all configurations compatible with it.
dat1 <- allCombs(c(3, 3, 4, 4, 3))
groundTruth.2 <- randomCsf(dat1, n.asf=2)
(dat2 <- selectCases(groundTruth.2, dat1))
condition(groundTruth.2, dat2)

# Generate all configurations of 5 fuzzy-set factors compatible with the causal structure
# A*b + C*D <-> E, such that con = .8 and cov = .8.
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
dat2 <- makeFuzzy(dat1, fuzzvalues = seq(0, 0.45, 0.01))
(dat3 <- selectCases1("A*b + C*D <-> E", dat2, con = .8, cov = .8))
condition("A*b + C*D <-> E", dat3)

# Inverse search for the data generating causal structure A*b + a*B + C*D <-> E from
# fuzzy-set data with non-perfect consistency and coverage scores.
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
set.seed(7)
dat2 <- makeFuzzy(dat1, fuzzvalues = 0:4/10)
dat3 <- selectCases1("A*b + a*B + C*D <-> E", dat2, con = .8, cov = .8)
cna(dat3, outcome = "E", con = .8, cov = .8)

# Draw cases satisfying specific conditions from real-life fuzzy-set data.
ct.js <- configTable(d.jobsecurity)
selectCases("S -> C", ct.js)  # Cases with higher membership scores in C than in S.
selectCases("S -> C", d.jobsecurity)  # Same.
selectCases("S <-> C", ct.js) # Cases with identical membership scores in C and in S.
selectCases1("S -> C", ct.js, con = .8, cov = .8)  # selectCases1() makes no distinction
                                                   # between "->" and "<->".
condition("S -> C", selectCases1("S -> C", ct.js, con = .8, cov = .8))

# selectCases() is not restricted to Boolean causal models: any Boolean
# function of factor values appearing in the data can be given as cond.
selectCases("C=1*B=3", allCombs(2:4))
selectCases("A=1 * !(C=2 + B=3)", allCombs(2:4), type = "mv")
selectCases("A=1 + (C=3 <-> B=1)*D=3", allCombs(c(3,3,3,3)), type = "mv")

[Package cna version 3.6.2 Index]