sca {speccurvieR} R Documentation

Perform specification curve analysis

Description

sca() is the workhorse function of the package. It estimates a model for every possible combination of the supplied controls and, by default, returns a data frame in which each row contains the pertinent information and parameters for one model. This data frame can then be passed to plotCurve() or any other plotting function in the package. Alternatively, if 'returnFormulae = TRUE', no models are estimated and sca() returns a list of formula objects, one for every possible combination of controls.
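
To make "every possible combination of the controls" concrete, the following is a minimal base R sketch of enumerating control subsets and turning each into a formula. The helper build_formulae() is hypothetical and is not part of speccurvieR. Whether sca() itself includes the specification with no controls is not stated on this page, so the sketch simply assumes it does.

```r
# Hypothetical helper, not part of speccurvieR: build one formula per subset
# of the controls. The empty subset (x only) is included by assumption.
build_formulae <- function(y, x, controls) {
  k <- length(controls)
  # Non-empty subsets of size 1..k; combn(..., simplify = FALSE) returns a list.
  non_empty <- unlist(
    lapply(seq_len(k), function(m) combn(controls, m, simplify = FALSE)),
    recursive = FALSE
  )
  # Prepend the empty subset explicitly, since combn() with m = 0 returns list().
  subsets <- c(list(character(0)), non_empty)
  # reformulate() pastes the term labels together and adds the response.
  lapply(subsets, function(s) reformulate(c(x, s), response = y))
}

formulae <- build_formulae("Salnty", "T_degC", c("ChlorA", "O2Sat"))
length(formulae)  # 4 here, i.e. 2^2 subsets of two controls
formulae[[4]]     # the specification with both controls
```

Under this assumption the number of specifications doubles with each added control, which is why long control vectors quickly become expensive.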

Usage

sca(
  y,
  x,
  controls,
  data,
  family = "linear",
  link = NULL,
  fixedEffects = NULL,
  returnFormulae = FALSE,
  progressBar = TRUE,
  parallel = FALSE,
  workers = 2
)

Arguments

y

A string containing the column name of the dependent variable in data.

x

A string containing the column name of the independent variable in data.

controls

A vector of strings containing the column names of the control variables in data.

data

A data frame containing y, x, controls, and (optionally) the variables to be used for fixed effects or clustering.

family

A string indicating the family of models to be used. Defaults to "linear" for OLS regression; any family supported by 'glm()' can be used instead.

link

A string specifying the link function to be used for the model. Defaults to 'NULL' for OLS regression using 'lm()' or 'fixest::feols()' depending on whether fixed effects are supplied. Supports all link functions supported by the family parameter of 'glm()'.
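
For readers unfamiliar with how a family and a link combine in 'glm()', here is a small base R illustration. It does not call sca(), and how sca() translates its family and link strings internally is not shown on this page. The mtcars data set ships with R and is used only as a stand-in.

```r
# Base R only: a binary outcome (am) modelled with a probit link
# rather than the binomial family's default logit link.
fit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)

fit$family$family  # name of the family used for the fit
fit$family$link    # name of the link used for the fit

# Leaving the link unspecified falls back to the family's default link.
binomial()$link
```

By analogy, a call such as sca(..., family = "binomial", link = "probit") would presumably request the same kind of model for every specification, although that mapping is an assumption here.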

fixedEffects

A string containing the column name of the variable in data to be used for fixed effects. Defaults to NULL, in which case no fixed effects are included.

returnFormulae

A boolean. When 'TRUE', a list of model formula objects is returned and the models are not estimated. Defaults to 'FALSE', in which case a data frame of model results is returned.

progressBar

A boolean indicating whether the user wants a progress bar for model estimation. Defaults to 'TRUE'.

parallel

A boolean indicating whether to parallelize model estimation. Parallelization only offers a speed advantage when a large (> 1000) number of models is being estimated. Defaults to 'FALSE'.
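
As a rough guide to when the threshold above is reached, the sketch below counts specifications as a function of the number of controls. It assumes one model per subset of controls, including the empty subset. The exact count sca() produces may differ, and n_models() is a hypothetical helper.

```r
# Hypothetical helper: number of control subsets for k controls,
# counting the empty subset (an assumption about how models are enumerated).
n_models <- function(k) 2^k

k <- c(2, 5, 10, 12)
data.frame(controls = k, models = n_models(k))
```

Under that assumption, roughly ten or more controls are needed before the number of models exceeds 1000.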

workers

An integer indicating the number of workers to use for parallelization. Defaults to 2.

Value

When 'returnFormulae' is 'FALSE', a data frame where each row contains the independent variable coefficient estimate, standard error, test statistic, p-value, model specification, and measures of model fit. When 'returnFormulae' is 'TRUE', a list of formula objects, one for each combination of controls.

Examples

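library(speccurvieR)

# Estimate every combination of two controls sequentially.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"),
    data = bottles, progressBar = TRUE, parallel = FALSE)
# Controls may contain interaction terms; estimate on two parallel workers.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2)
# Return the list of formulae without estimating any model.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = FALSE,
    returnFormulae = TRUE)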

[Package speccurvieR version 0.3.0 Index]