R: Specifying analytical decisions in a specification setup

setup {specr}

R Documentation

Specifying analytical decisions in a specification setup

Description

Creates all possible specifications as combinations of different dependent and independent variables, model types, control variables, potential subset analyses, and potentially other analytic choices. This function represents the first step in the analytic framework implemented in the package specr. The resulting object of class specr.setup then needs to be passed to the core function of the package, specr(), which fits the specified models across all specifications.

Usage

setup(
  data,
  x,
  y,
  model,
  controls = NULL,
  subsets = NULL,
  add_to_formula = NULL,
  fun1 = function(x) broom::tidy(x, conf.int = TRUE),
  fun2 = function(x) broom::glance(x),
  simplify = FALSE
)

Arguments

`data`	The data set that should be used for the analysis
`x`	A vector denoting independent variables
`y`	A vector denoting the dependent variables
`model`	A vector denoting the model(s) that should be estimated.
`controls`	A vector of the control variables that should be included. Defaults to NULL.
`subsets`	Specification of potential subsets/groups as a list. There are two ways to specify them, both of which assume that the grouping variable is in the data set. The simplest is to provide a named vector within the list, whose name is the variable used for subsetting and whose values define the subsets (e.g., `list(group2 = c("female", "male"))`). In this case, the specifications will include "all", "only female", and "only male". Alternatively, you can use the `unique` function to extract that vector directly from the data set (e.g., `list(group2 = unique(example_data$group2))`). Both approaches lead to the same result. The former, however, has the advantage that some subgroups can be left out (e.g., `list(group2 = c("female"))`). In this case, the specifications will include "all" (no subset) and "only female". See the examples for more details.
`add_to_formula`	A string specifying aspects that should always be included in the formula (e.g. a constant covariate, random effect structures...)
`fun1`	A function that extracts the parameters of interest from the fitted models. Defaults to broom::tidy() with conf.int = TRUE, which works with a large range of different models.
`fun2`	A function that extracts fit indices of interest from the models. Defaults to broom::glance(), which works with a large range of different models. Note: Different models result in different fit indices. Thus, if you use different models within one specification curve analysis, the default may not work. In this case, you can set `fun2 = NULL` to not extract any fit indices.
`simplify`	Logical value indicating which combinations of control variables should be included in the specifications. If FALSE (default), all combinations of the provided variables are created (none, each individually, every combination of two or more, and all variables together). If TRUE, only three kinds of specification are included: no covariates, each covariate individually, and all covariates together (akin to the default in specr version 0.2.1).
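The effect of `simplify` on the control variables can be illustrated with a small base R sketch. The helpers `control_sets()` and `label_sets()` are hypothetical and not part of specr; they only enumerate the sets described above, and the labels are illustrative rather than the package's exact output.

```r
# Hypothetical helper (not part of specr): enumerate control-variable sets.
control_sets <- function(controls, simplify = FALSE) {
  k <- length(controls)
  if (k == 0) return(list(character(0)))
  if (simplify) {
    # none, each control on its own, and (if more than one) all together
    sets <- c(list(character(0)), as.list(controls))
    if (k > 1) sets <- c(sets, list(controls))
    return(sets)
  }
  # full power set: every subset of size 0, 1, ..., k
  sets <- list(character(0))
  for (m in seq_len(k)) {
    sets <- c(sets, combn(controls, m, simplify = FALSE))
  }
  sets
}

# Hypothetical helper: turn each set into a readable label.
label_sets <- function(sets) {
  vapply(sets, function(s) {
    if (length(s) == 0) "no covariates" else paste(s, collapse = " + ")
  }, character(1))
}

full    <- label_sets(control_sets(c("c1", "c2", "c3")))
reduced <- label_sets(control_sets(c("c1", "c2", "c3"), simplify = TRUE))

print(full)     # 8 sets for three controls
print(reduced)  # 5 sets for three controls
```

With three controls, the full set grows as 2^3 = 8, while the simplified set has only 1 + 3 + 1 = 5 entries, which is why `simplify = TRUE` can keep the number of specifications manageable.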

Details

Empirical results are often contingent on analytical decisions that are equally defensible, often arbitrary, and motivated by different reasons. These decisions may introduce bias or at least variability. To address this, specification curve analyses (Simonsohn et al., 2020) or multiverse analyses (Steegen et al., 2016) identify the set of theoretically justified, statistically valid (and potentially also non-redundant) specifications, fit the "multiverse" of models represented by these specifications, and extract the relevant parameters, often to display the results graphically as a so-called specification curve. This allows readers to identify consequential specification decisions and to see how they affect the results or the parameter of interest.

Use of this function

A general overview is provided in the vignette vignette("specr"). It is assumed that you want to estimate the relationship between two variables (x and y). What may vary is which variables are used for x and y, which model is used to estimate the relationship, whether the relationship is estimated for certain subsets, and whether different combinations of control variables are included. This makes it possible to (re)produce almost any analytical decision imaginable. See the examples below for how a number of typical analytical decisions can be implemented. Afterwards, you pass the resulting object of class specr.setup to the function specr() to run the specification curve analysis.

Note that the class specr.setup supports generic functions. Use methods(class = "specr.setup") for an overview of the available methods and, e.g., ?summary.specr.setup to view the dedicated help page.
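To get a feel for how quickly the number of specifications grows, the choices can be crossed by hand in base R. This is only a rough sketch: it assumes that all choices are fully crossed and that each subset variable contributes an extra "all" level; the column names and labels are made up for illustration and do not reproduce the tibble that setup() returns.

```r
# Choices mirroring Example 1 (three controls with simplify = TRUE
# give five control sets: none, each individually, all together)
x        <- c("x1", "x2")
y        <- c("y1", "y2")
model    <- "lm"
controls <- c("no covariates", "c1", "c2", "c3", "c1 + c2 + c3")

# Assumption: each subset variable adds an "all" level (no filtering)
group1 <- c("all", "young", "middle", "old")
group2 <- c("all", "female", "male")

# Cross every choice with every other choice
grid <- expand.grid(x = x, y = y, model = model, controls = controls,
                    group1 = group1, group2 = group2,
                    stringsAsFactors = FALSE)

# Build an illustrative model formula for each row
grid$formula <- ifelse(grid$controls == "no covariates",
                       paste(grid$y, "~", grid$x),
                       paste(grid$y, "~", grid$x, "+", grid$controls))

print(nrow(grid))   # 2 * 2 * 1 * 5 * 4 * 3 rows
print(head(grid$formula))
```

Even this modest set of choices yields 240 rows in the hand-built grid, so it is worth checking the summary of a specr.setup object before fitting all models.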

Value

An object of class specr.setup which includes all possible specifications based on combinations of the analytic choices. The resulting list includes a specification tibble, the data set, and additional information about the universe of specifications. Use methods(class = "specr.setup") for an overview of the available methods.
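A minimal sketch of inspecting the returned object and handing it to specr() might look as follows. It assumes that the specr package and its bundled example_data are installed; the code is skipped otherwise. The element names of the list are not spelled out here, so names() is used to look them up rather than guessing.

```r
# Only run if specr is installed (assumption: example_data ships with it)
has_specr <- requireNamespace("specr", quietly = TRUE)

if (has_specr) {
  data("example_data", package = "specr")

  # A deliberately small setup: one x, one y, two controls
  specs <- specr::setup(data = example_data,
                        x = "x1",
                        y = "y1",
                        model = "lm",
                        controls = c("c1", "c2"))

  print(class(specs))                      # should include "specr.setup"
  print(names(specs))                      # elements stored in the list
  print(methods(class = "specr.setup"))    # available generic methods

  # Second step of the framework: fit all specified models
  results <- specr::specr(specs)
  print(summary(results))
}
```

Keeping the setup small at first makes it easier to verify that the specifications look as intended before scaling up to the full set of choices.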

References

Simonsohn, U., Simmons, J.P. & Nelson, L.D. (2020). Specification curve analysis. Nature Human Behaviour, 4, 1208–1214. https://doi.org/10.1038/s41562-020-0912-z
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science, 11(5), 702-712. https://doi.org/10.1177/1745691616658637

Examples

## Example 1 ----
# Setting up typical specifications
specs <- setup(data = example_data,
   x = c("x1", "x2"),
   y = c("y1", "y2"),
   model = "lm",
   controls = c("c1", "c2", "c3"),
   subsets = list(group1 = c("young", "middle", "old"),
                  group2 = c("female", "male")),
   simplify = TRUE)

# Check specifications
summary(specs, rows = 18)


## Example 2 ----
# Setting up specifications for multilevel models
# (model = "lmer" assumes the lme4 package is attached, e.g. via library(lme4))
specs <- setup(data = example_data,
   x = c("x1", "x2"),
   y = c("y1", "y2"),
   model = c("lmer"),                                   # multilevel model
   subsets = list(group1 = c("young", "old"),           # only young and old!
                  group2 = unique(example_data$group2)),# alternative specification
   controls = c("c1", "c2"),
   add_to_formula = "(1|group2)")                       # random effect in all models

# Check specifications
summary(specs)


## Example 3 ----
# Setting up specifications with a different parameter extraction function

# Create a custom extraction function that returns 99% confidence intervals
# and stores the fitted model object
tidy_99 <- function(x) {
  fit <- broom::tidy(x,
     conf.int = TRUE,
     conf.level = .99)         # 99% instead of the default 95% intervals
  fit$full_model <- list(x)    # include entire model fit object as list
  return(fit)
}

# Setup specs
specs <- setup(data = example_data,
   x = c("x1", "x2"),
   y = c("y1", "y2"),
   model = "lm",
   fun1 = tidy_99,             # pass new function to setup
   add_to_formula = "c1 + c2") # set of covariates in all models

# Check specifications
summary(specs)

[Package specr version 1.0.0 Index]