R: Perform specification curve analysis

sca {speccurvieR}

R Documentation

Perform specification curve analysis

Description

sca() is the workhorse function of the package. It estimates a model for every possible combination of the controls supplied and, by default, returns a data frame in which each row contains the pertinent information and parameters for one model. This data frame can then be passed to plotCurve() or any other plotting function in the package. Alternatively, if 'returnFormulae = TRUE', it returns a list of formula objects, one for each possible combination of controls, without estimating the models.
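To make "every possible combination of the controls" concrete, the following base R sketch enumerates the subsets of a control vector and builds one formula per subset. The helper build_formulae() is hypothetical and not part of speccurvieR. Including the no-controls baseline is an assumption of this sketch, not something stated on this page.

```r
# Hypothetical helper for illustration only; not part of speccurvieR.
build_formulae <- function(y, x, controls) {
  # All non-empty subsets of the controls, of size 1 up to length(controls)
  subsets <- unlist(
    lapply(seq_along(controls),
           function(k) combn(controls, k, simplify = FALSE)),
    recursive = FALSE
  )
  # Assumption: also include the baseline model with no controls
  subsets <- c(list(character(0)), subsets)
  # One formula per subset: y ~ x + <controls in the subset>
  lapply(subsets, function(s) reformulate(c(x, s), response = y))
}

formulae <- build_formulae("Salnty", "T_degC", c("ChlorA", "O2Sat"))
length(formulae)  # 4, i.e. 2^2 subsets of two controls
```

With k controls the number of models grows as 2^k under this enumeration, which is why the number of controls largely determines the run time.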

Usage

sca(
  y,
  x,
  controls,
  data,
  family = "linear",
  link = NULL,
  fixedEffects = NULL,
  returnFormulae = FALSE,
  progressBar = TRUE,
  parallel = FALSE,
  workers = 2
)

Arguments

`y`	A string containing the column name of the dependent variable in data.
`x`	A string containing the column name of the independent variable in data.
`controls`	A vector of strings containing the column names of the control variables in data.
`data`	A data frame containing y, x, controls, and (optionally) the variables to be used for fixed effects or clustering.
`family`	A string indicating the family of models to be used. Defaults to "linear" for OLS regression but supports all families supported by 'glm()'.
`link`	A string specifying the link function to be used for the model. Defaults to 'NULL' for OLS regression using 'lm()' or 'fixest::feols()' depending on whether fixed effects are supplied. Supports all link functions supported by the family parameter of 'glm()'.
`fixedEffects`	A string containing the column name of the variable in data to be used for fixed effects. Defaults to NULL, in which case no fixed effects are included.
`returnFormulae`	A boolean. When 'TRUE', a list of model formula objects is returned and the models are not estimated. Defaults to 'FALSE', in which case a data frame of model results is returned.
`progressBar`	A boolean indicating whether the user wants a progress bar for model estimation. Defaults to 'TRUE'.
`parallel`	A boolean indicating whether to parallelize model estimation. Parallelization only offers a speed advantage when a large (> 1000) number of models is being estimated. Defaults to 'FALSE'.
`workers`	An integer indicating the number of workers to use for parallelization. Defaults to 2.
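As a rough illustration of how a family string and a link string can be combined for 'glm()', the base R sketch below resolves both into a family object. The helper make_family() is hypothetical, and how sca() handles these arguments internally is not documented here. The sketch covers only genuine 'glm()' family names, so the package-specific value "linear" is out of scope for it.

```r
# Hypothetical helper for illustration only; not part of speccurvieR.
make_family <- function(family, link = NULL) {
  fam_fun <- match.fun(family)          # e.g. "binomial" -> stats::binomial
  if (is.null(link)) fam_fun() else fam_fun(link = link)
}

fam <- make_family("binomial", link = "probit")

# Fit a probit model on the built-in mtcars data with the resolved family
fit <- glm(am ~ wt, data = mtcars, family = fam)
coef(fit)
```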

Value

When 'returnFormulae' is 'FALSE', a data frame where each row contains the independent variable coefficient estimate, standard error, test statistic, p-value, model specification, and measures of model fit. When 'returnFormulae' is 'TRUE', a list of formula objects, one for each combination of controls.
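The sketch below shows, in base R, the kind of quantities one row of such a data frame summarizes for a single specification. It uses the built-in mtcars data. The column names are assumptions chosen for illustration and are not the names sca() actually returns.

```r
# Illustration only: column names here are assumptions, not sca() output.
fit <- lm(mpg ~ wt + hp, data = mtcars)   # x = wt, one control = hp
coefs <- summary(fit)$coefficients        # Estimate, Std. Error, t value, Pr(>|t|)

one_row <- data.frame(
  term      = "wt",
  estimate  = coefs["wt", "Estimate"],
  std_error = coefs["wt", "Std. Error"],
  statistic = coefs["wt", "t value"],
  p_value   = coefs["wt", "Pr(>|t|)"],
  controls  = "hp",                         # the model specification
  adj_r2    = summary(fit)$adj.r.squared    # one measure of model fit
)
one_row
```

Stacking one such row per combination of controls gives a table of the shape described above.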

Examples

# Two controls, estimated sequentially
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"),
    data = bottles, progressBar = TRUE, parallel = FALSE)
# Controls with interaction terms, estimated in parallel on two workers
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2)
# Return the list of formula objects without estimating the models
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = FALSE,
    returnFormulae = TRUE)

[Package speccurvieR version 0.3.0 Index]