frscore {frscore}

R Documentation

frscore

Description

Calculate fit-robustness scores for a set of cna solutions/models

Usage

frscore(
  sols,
  dat = NULL,
  scoretype = c("full", "supermodel", "submodel"),
  normalize = c("truemax", "idealmax", "none"),
  maxsols = 50,
  verbose = FALSE,
  print.all = FALSE,
  comp.method = c("causal_submodel", "is.submodel")
)

Arguments

`sols`	Character vector of class "stdAtomic" or "stdComplex" (as generated by `cna()`) that contains the solutions/models to be scored.
`dat`	A `configTable`, a data frame, a matrix, or a list that specifies the range of admissible factor values for the factors featured in the models included in `sols`. Only needed when the models in `sols` are multi-valued, otherwise ignored.
`scoretype`	String specifying the scoring method: `"full"` (default; scoring is based on counting both sub- and supermodel relations), `"supermodel"` (count supermodels only), or `"submodel"` (count submodels only). Deprecated: retained for backward compatibility only and due to be dropped in the next version (see Details).
`normalize`	String that determines the method used in normalizing the scores. `"truemax"` (default) normalizes by the highest score among the elements of `sols`, such that the highest scoring solution types get score 1. `"idealmax"` normalizes by a theoretical maximum score (see Details). `"none"` applies no normalization.
`maxsols`	Integer determining the maximum number of unique solution types found in `sols` to be included in the scoring (see Details).
`verbose`	Logical; if `TRUE`, additional information about causal compatibility relations among the unique solution types found in `sols` is printed. Defaults to `FALSE`.
`print.all`	Logical; controls the number of entries shown when the results are printed. If `TRUE`, results are printed as with the defaults of `print.data.frame`. If `FALSE` (default), only the 20 highest-scoring solutions/models are printed.
`comp.method`	String that determines how the models in `sols` are compared to determine their fr-score. `"causal_submodel"` (the default) checks for causal submodel relations using `causal_submodel()`; `"is.submodel"` checks for syntactic submodel relations using `is.submodel()`.

Details

frscore() implements fit-robustness scoring as introduced in Parkkinen and Baumgartner (2021). The function calculates the fit-robustness scores of Boolean solutions/models output by the cna() function of the cna package. The solutions are given to frscore() as a character vector sols obtained by reanalyzing a data set repeatedly, e.g. with rean_cna(), using different consistency and coverage thresholds in each analysis.

For multi-valued models, the range of admissible values for the factors featured in the models must be provided via the argument dat, which accepts a data frame, configTable, or a list of factor-value ranges as its value, in the same manner as cna::full.ct(). Typically, one would use the data set that the models in sols were inferred from, and this is what is done automatically when frscore() is called within frscored_cna(). When the models in sols are binary, dat should be left to its default value NULL, and will in any case be ignored.

The argument scoretype is deprecated as of frscore 0.3.1, and will be dropped in the next version. Giving it a non-default value is allowed so that older code can be run without errors, but doing this is otherwise discouraged. When set to its default value "full", the score of each sols[i] is calculated by counting the (either syntactic or causal) sub- and supermodel relations sols[i] has to the other elements of sols. Setting scoretype to "supermodel" or "submodel" forces the scoring to be based on, respectively, supermodel and submodel relations only. Whether causal or syntactic submodel relations are counted depends on the value of comp.method: "causal_submodel" (default) counts causal submodel relations using causal_submodel(), "is.submodel" counts syntactic submodel relations using cna::is.submodel(). In future versions of frscore, fit-robustness scores will always be calculated as with scoretype = "full", and changing this will not be possible. If additional information about the numbers of sub- vs. supermodel relations a particular model has to other models is needed, this can be acquired by inspecting the "verbout" element of the output of frscore().
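
The difference between the three scoretype settings can be illustrated with a toy sketch in base R. The models and the relation matrix below are hand-coded assumptions for illustration; the sketch does not call causal_submodel() or cna::is.submodel(), uses unique models only, and is not the package's internal scoring code.

```r
# Four hypothetical models; is_sub[i, j] is TRUE when model i is taken to be
# a submodel of model j (relations entered by hand, not computed).
models <- c("A <-> E", "A*b <-> E", "A*b + C <-> E", "D <-> E")
is_sub <- matrix(FALSE, nrow = 4, ncol = 4, dimnames = list(models, models))
is_sub["A <-> E", "A*b <-> E"] <- TRUE
is_sub["A <-> E", "A*b + C <-> E"] <- TRUE
is_sub["A*b <-> E", "A*b + C <-> E"] <- TRUE

# Row sums count the supermodels of each model, column sums its submodels.
n_supermodels <- unname(rowSums(is_sub))
n_submodels <- unname(colSums(is_sub))

counts <- data.frame(
  model = models,
  supermodels = n_supermodels,            # analogue of scoretype = "supermodel"
  submodels = n_submodels,                # analogue of scoretype = "submodel"
  full = n_supermodels + n_submodels      # analogue of scoretype = "full"
)
print(counts)
```

In this toy case the three nested models tie under the "full" count, while the "supermodel" and "submodel" counts rank them in opposite orders; the unrelated model receives no points under any setting.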

The fit-robustness scores can be normalized in two ways. In the default setting normalize = "truemax", the score of each sols[i] is divided by the maximum score obtained by an element of sols. In case of normalize = "idealmax", the score is normalized not by an actually obtained maximum but by an idealized maximum, which is calculated by assuming that all solutions of equal complexity in sols are identical and that for every sols[i] of a given complexity, all less complex elements of sols are its submodels and all more complex elements of sols are its supermodels. When normalization is applied, the normalized score is shown in its own column norm.score in the results. The raw scores are shown in the column score.
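
The "truemax" arithmetic can be sketched in base R. The raw scores and model names below are invented for illustration and are not output of frscore(); the sketch only reproduces the division by the observed maximum described above and does not attempt the "idealmax" calculation.

```r
# Hypothetical raw fit-robustness scores for three solution types
# (values invented for illustration).
raw <- c("A*b <-> E" = 12, "A <-> E" = 9, "A*b + C <-> E" = 3)

# normalize = "truemax": divide every raw score by the highest raw score,
# so that the top-scoring solution type gets a normalized score of 1.
norm_truemax <- raw / max(raw)

# Lay the result out with the column names used in the results table.
res <- data.frame(
  model = names(raw),
  score = unname(raw),
  norm.score = unname(norm_truemax)
)
print(res)
```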

If the size of the consistency and coverage interval scanned in the reanalysis series generating sols is large or there are many model ambiguities, sols may contain so many different types of solutions/models that robustness cannot be calculated for all of them in reasonable time. In that case, the argument maxsols allows for capping the number of solution types to be included in the scoring (defaults to 50). frscore() then selects the most frequent solutions/models in sols of each complexity level until maxsols is reached and only scores the thus selected elements of sols.
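
Before choosing a value for maxsols, it can help to check how many unique solution types sols contains and how often each occurs. The following base R sketch uses an invented sols vector; the cap it applies is a deliberate simplification (keep the most frequent types overall) and does not reproduce the per-complexity-level selection that frscore() performs.

```r
# Hypothetical pooled solutions from a reanalysis series; repeated entries
# are the same solution type recovered under different thresholds.
sols <- c("A <-> E", "A*b <-> E", "A <-> E", "A + C <-> E",
          "A*b <-> E", "A <-> E", "C*d <-> E")

# Frequency of each unique solution type, most frequent first.
type_freq <- sort(table(sols), decreasing = TRUE)
n_types <- length(type_freq)

# Simplified cap (assumption): keep the `cap` most frequent types overall.
# `cap` is a hypothetical stand-in for maxsols.
cap <- 2
kept <- names(type_freq)[seq_len(min(cap, n_types))]

print(type_freq)
print(kept)
```

If n_types is well below the intended maxsols, the cap has no effect and all solution types are scored.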

If the argument verbose is set to TRUE, frscore() also prints a list indicating, for each sols[i], how many raw score points it receives from which elements of sols. The verbose list is ordered by decreasing fit-robustness score.

Value

A named list whose first element is a data frame containing the unique solution/model types and their scores. The remaining elements contain additional information about the submodel relations among the unique solution types and about how the function was called.

References

Parkkinen, V.P., and M. Baumgartner. 2021. “Robustness and Model Selection in Configurational Causal Modeling.” Sociological Methods and Research. doi:10.1177/0049124120986200.

Basurto, Xavier. 2013. “Linking Multi-Level Governance to Local Common-Pool Resource Theory using Fuzzy-Set Qualitative Comparative Analysis: Insights from Twenty Years of Biodiversity Conservation in Costa Rica.” Global Environmental Change 23 (3):573-87.

Examples

# Artificial data from Parkkinen and Baumgartner (2021)
sols1 <- rean_cna(d.error, attempt = seq(1, 0.8, -0.1))
sols1 <- do.call(rbind, sols1)
frscore(sols1$condition)


# Real fuzzy-set data from Basurto (2013)
sols2 <- rean_cna(d.autonomy, type = "fs", ordering = list("EM", "SP"),
                  strict = TRUE, maxstep = c(3, 3, 9))
sols2 <- do.call(rbind, sols2)$condition  # there are 217 solutions
# At the default maxsols, at most 50 solution types are scored.
frscore(sols2)
# Raising maxsols allows more solution types to be scored.
frscore(sols2, maxsols = 100)


# Multi-valued data/models (data from Hartmann and Kemmerzell (2010))
# Short reanalysis series; change the `attempt` value to mimic a more realistic use case
sols3 <- rean_cna(d.pban, outcome = "PB=1", attempt = seq(0.8, 0.7, -0.1), type = "mv")
sols3 <- do.call(rbind, sols3)$condition
# For mv data, frscore() needs the data to determine admissible factor values
frscore(sols3, dat = d.pban)

# Changing the normalization
frscore(sols2, normalize = "none")
frscore(sols2, normalize = "truemax")
frscore(sols2, normalize = "idealmax")

# Verbose output: also print how the solution types relate to one another
frscore(sols2, maxsols = 20, verbose = TRUE)

[Package frscore version 0.4.1 Index]