sca {speccurvieR} | R Documentation |
Perform specification curve analysis
Description
sca() is the workhorse function of the package. It estimates a model for every possible combination of the supplied controls and, by default, returns a data frame in which each row contains the pertinent information and parameters for one model. This data frame can then be passed to plotCurve() or any other plotting function in the package. Alternatively, if 'returnFormulae = TRUE', it returns a list of formula objects, one for every possible combination of controls, without estimating the models.
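To make "every possible combination of the controls" concrete, the following base R sketch enumerates all subsets of a control vector and builds one formula per subset. It is an illustration only, not the package's internal code. The object names (`include`, `subsets`, `formulae`) are hypothetical, and including the model with no controls is an assumption of this sketch.

```r
# Illustrative sketch only; not how sca() is implemented internally.
y <- "Salnty"
x <- "T_degC"
controls <- c("ChlorA", "O2Sat")

# Each row of `include` is one TRUE/FALSE inclusion pattern over the controls.
include <- expand.grid(rep(list(c(FALSE, TRUE)), length(controls)))

# Turn each pattern into the character vector of controls it selects.
subsets <- lapply(seq_len(nrow(include)), function(i) {
  controls[unlist(include[i, ])]
})

# One formula per subset; x always stays on the right-hand side.
# Assumption: the empty subset (no controls) is kept as its own model.
formulae <- lapply(subsets, function(s) reformulate(c(x, s), response = y))

length(formulae)  # 2^2 = 4 formulas for two controls
```

With k controls this yields 2^k formulas, which is why the number of models grows quickly as controls are added.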
Usage
sca(
y,
x,
controls,
data,
family = "linear",
link = NULL,
fixedEffects = NULL,
returnFormulae = FALSE,
progressBar = TRUE,
parallel = FALSE,
workers = 2
)
Arguments
y |
A string containing the column name of the dependent variable in data. |
x |
A string containing the column name of the independent variable in data. |
controls |
A vector of strings containing the column names of the control variables in data. |
data |
A dataframe containing y, x, controls, and (optionally) the variables to be used for fixed effects or clustering. |
family |
A string indicating the family of models to be used. Defaults to "linear" for OLS regression but supports all families supported by 'glm()'. |
link |
A string specifying the link function to be used for the model. Defaults to 'NULL' for OLS regression using 'lm()' or 'fixest::feols()' depending on whether fixed effects are supplied. Supports all link functions supported by the family parameter of 'glm()'. |
fixedEffects |
A string containing the column name of the variable in data to be used for fixed effects. Defaults to 'NULL', in which case no fixed effects are included. |
returnFormulae |
A boolean. When 'TRUE', a list of model formula objects is returned and the models are not estimated. Defaults to 'FALSE', in which case a dataframe of model results is returned. |
progressBar |
A boolean indicating whether the user wants a progress bar for model estimation. Defaults to 'TRUE'. |
parallel |
A boolean indicating whether to parallelize model estimation. Parallelization only offers a speed advantage when a large (> 1000) number of models is being estimated. Defaults to 'FALSE'. |
workers |
An integer indicating the number of workers to use for parallelization. Defaults to 2. |
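Because 'family' and 'link' are described in terms of what 'glm()' supports, it may help to see how such strings correspond to a 'glm()' family object in base R. The sketch below is an assumption about usage, not the package's code: `makeFamily` is a hypothetical helper, and the model on `mtcars` is only a stand-in dataset.

```r
# Hypothetical helper, not part of speccurvieR: build a glm() family
# object from a family name and an optional link name.
makeFamily <- function(family, link = NULL) {
  famFun <- match.fun(family)  # e.g. "binomial" resolves to stats::binomial
  if (is.null(link)) {
    famFun()                   # use the family's default link
  } else {
    do.call(famFun, list(link = link))
  }
}

fam <- makeFamily("binomial", link = "probit")

# Fit a probit model on a built-in dataset with the constructed family.
fit <- glm(am ~ mpg, data = mtcars, family = fam)

fam$family  # "binomial"
fam$link    # "probit"
```

Any family and link pair that 'glm()' rejects in this form would presumably also be invalid when passed to sca(), although that is not stated on this page.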
Value
When 'returnFormulae' is 'FALSE', a dataframe where each row contains the independent variable coefficient estimate, standard error, test statistic, p-value, model specification, and measures of model fit. When 'returnFormulae' is 'TRUE', a list of formula objects, one for each combination of controls.
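As a rough picture of what one row of such a result holds, the base R sketch below pulls the same kinds of quantities out of a single 'lm()' fit. The column names (`formula`, `coef`, `se`, `statistic`, `p`, `adjR2`) are hypothetical and need not match the names sca() actually returns. `mtcars` is used only as a stand-in dataset.

```r
# Illustrative sketch: one model, one summary row. Column names are
# hypothetical and may differ from those produced by sca().
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

oneRow <- data.frame(
  formula   = deparse(formula(fit)),       # model specification
  coef      = coefs["wt", "Estimate"],     # estimate for the variable of interest
  se        = coefs["wt", "Std. Error"],
  statistic = coefs["wt", "t value"],
  p         = coefs["wt", "Pr(>|t|)"],
  adjR2     = summary(fit)$adj.r.squared   # one possible measure of fit
)

oneRow
```

Stacking one such row per specification (for example with `do.call(rbind, ...)`) gives a data frame with one row per model, which is the general shape described above.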
Examples
# Estimate every combination of two controls sequentially
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"),
    data = bottles, progressBar = TRUE, parallel = FALSE)
# Controls may contain interaction terms; estimate in parallel with 2 workers
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2)
# Return the list of formulae without estimating any models
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = FALSE,
    returnFormulae = TRUE)