gapclosing {gapclosing} | R Documentation |
Gap closing estimator
Description
A function to estimate gap-closing estimands: means and disparities across categories of units that would persist under some counterfactual assignment of a treatment. To use this function, the user provides a data frame data, a rule counterfactual_assignments for counterfactually assigning treatment, a treatment and/or an outcome model for learning statistically about the counterfactuals, and the category_name of the variable in data over which categories are defined. The returned object summarizes factual and counterfactual means and disparities. Supported estimation algorithms include generalized linear models, ridge regression, generalized additive models, and random forests. Standard errors are supported by bootstrapping.
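To make the idea of a gap-closing estimand concrete, the following base-R sketch estimates factual and counterfactual category means by outcome modeling alone. It is an illustration under stated assumptions, not the package's internal implementation: the simulated data, the variable names, and the two-category setup are all hypothetical.

```r
# Base-R sketch of gap-closing estimation by outcome modeling.
# The data-generating process and all names here are hypothetical.
set.seed(1)
n <- 500
category   <- sample(c("A", "B"), n, replace = TRUE)
confounder <- rnorm(n)
# Treatment is more likely in category A and at high confounder values
treatment  <- rbinom(n, 1, plogis(0.5 * (category == "A") + confounder))
outcome    <- 1 * (category == "A") + 2 * treatment + confounder + rnorm(n)
d <- data.frame(category, confounder, treatment, outcome)

# Factual means and the factual disparity (A minus B)
factual_means <- tapply(d$outcome, d$category, mean)
factual_gap   <- unname(factual_means["A"] - factual_means["B"])

# Fit an outcome model, then predict each unit's outcome under treatment = 1
fit <- lm(outcome ~ treatment * category + confounder, data = d)
predicted <- predict(fit, newdata = transform(d, treatment = 1))

# Counterfactual means and the disparity that would persist
counterfactual_means <- tapply(predicted, d$category, mean)
counterfactual_gap <- unname(counterfactual_means["A"] - counterfactual_means["B"])

print(round(c(factual = factual_gap, counterfactual = counterfactual_gap), 2))
```

The counterfactual gap is the disparity that would remain if every unit were assigned to treatment condition 1, which corresponds to counterfactual_assignments = 1 in the function call.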
Usage
gapclosing(
  data,
  counterfactual_assignments,
  outcome_formula = NULL,
  treatment_formula = NULL,
  category_name,
  outcome_name = NULL,
  treatment_name = NULL,
  treatment_algorithm = "glm",
  outcome_algorithm = "lm",
  sample_split = "single_sample",
  se = FALSE,
  bootstrap_samples = 1000,
  bootstrap_method = "simple",
  parallel_cores = NULL,
  weight_name = NULL,
  n_folds = 2,
  folds_name = NULL
)
Arguments
data |
Data frame containing the observed data |
counterfactual_assignments |
Numeric scalar or vector of length nrow(data), each element of which is on the [0,1] interval. If a scalar, the counterfactual probability by which all units are assigned to treatment condition 1. If a vector, each element i corresponds to the counterfactual probability by which each unit i is assigned to treatment condition 1. |
outcome_formula |
Outcome formula, in the style |
treatment_formula |
Treatment formula, in the style |
category_name |
Character name of the variable indicating the categories over which the gap is defined. Must be the name of a column in |
outcome_name |
Character name of the outcome variable. Only required when there is no outcome_formula; otherwise extracted automatically. Must be a name of a column in |
treatment_name |
Character name of the treatment variable. Only required when there is no treatment_formula; otherwise extracted automatically. Must be a name of a column in |
treatment_algorithm |
Character name of the algorithm for the treatment model. One of "glm", "ridge", "gam", or "ranger". Defaults to "glm", which is a logit model. Option "ridge" is ridge regression. Option "gam" is a generalized additive model fit (see package |
outcome_algorithm |
Character name of the algorithm for the outcome model. One of "lm", "ridge", "gam", or "ranger". Defaults to "lm", which is an OLS model. Option "ridge" is ridge regression. Option "gam" is a generalized additive model fit (see package |
sample_split |
Character for the type of sample splitting to be conducted. One of "single_sample" or "cross_fit". Defaults to "single_sample", in which case |
se |
Logical indicating whether standard errors should be calculated. Default is FALSE. Standard errors assume a simple random sample by default; to stratify by (category x treatment), see the |
bootstrap_samples |
Only used if |
bootstrap_method |
Only used if |
parallel_cores |
Integer number of cores for parallel processing of the bootstrap. Defaults to sequential processing. |
weight_name |
Character name of a sampling weight variable, if any, which captures the inverse probability of inclusion in the sample. The default assumes a simple random sample (all weights equal). |
n_folds |
Only used if |
folds_name |
Only used if |
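As described for counterfactual_assignments, the rule may be a single probability or one probability per row. The base-R sketch below builds both forms for a small hypothetical data frame; the data, the 0.75 probability, and the object names are assumptions chosen for illustration only.

```r
# Hypothetical data frame standing in for `data`
d <- data.frame(
  category  = c("A", "A", "B", "B", "B"),
  treatment = c(1, 0, 0, 1, 0)
)

# Scalar form: every unit is assigned to treatment condition 1 with probability 1
assign_all <- 1

# Vector form: one probability per row. Here units in category "B" are treated
# with probability 0.75, while units in category "A" keep their observed
# treatment (probability 1 if treated, 0 if untreated).
assign_targeted <- ifelse(d$category == "B", 0.75, d$treatment)

# Checks mirroring the documented requirements for a vector rule
stopifnot(
  length(assign_targeted) == nrow(d),
  all(assign_targeted >= 0 & assign_targeted <= 1)
)
print(assign_targeted)
```

Either object could then be supplied as the counterfactual_assignments argument, with the vector form aligned row by row with data.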
Value
An object of S3 class gapclosing, which supports summary(), print(), and plot() functions. The returned object can be coerced to a data frame of estimates with as.data.frame().
The object returned by a call to gapclosing contains several elements.
- factual_means: A tibble containing the factual mean outcome in each category
- factual_disparities: A tibble containing the disparities in factual mean outcomes across categories
- counterfactual_means: A tibble containing the counterfactual mean outcome (post-intervention mean) in each category
- counterfactual_disparities: A tibble containing the counterfactual disparities (gap-closing estimands) across categories
- change_means: A tibble containing the additive and proportional change from factual to counterfactual values for mean outcomes
- change_disparities: A tibble containing the additive and proportional change from factual to counterfactual values for disparities in mean outcomes (e.g. the proportion of the factual gap that is closed by the intervention)
- all_estimators: A list containing estimates by treatment modeling, outcome modeling, and doubly-robust estimation. If any of these are not applicable, estimates are NA.
- primary_estimator_name: The name of the primary estimator (treatment_modeling, outcome_modeling, or doubly_robust). The estimates reported in the first six slots of the returned object come from this estimator.
- treatment_model: The fitted treatment model (or models on each fold in the case of cross-fitting). Note that this model object is a point estimate with standard errors derived from the algorithm used to fit it; any standard errors within treatment_model do not come from bootstrapping by the package.
- outcome_model: The fitted outcome model (or models on each fold in the case of cross-fitting). Note that this model object is a point estimate with standard errors derived from the algorithm used to fit it; any standard errors within outcome_model do not come from bootstrapping by the package.
- call: The call that produced this gapclosing object
- arguments: A list of all arguments from the call to gapclosing
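The three estimator names listed above (treatment_modeling, outcome_modeling, doubly_robust) can be illustrated with a base-R sketch. It uses common textbook forms (normalized inverse probability weighting, regression prediction, and an augmented combination of the two) for the mean outcome under treatment = 1 within each category. Whether the package uses exactly these forms internally is an assumption; the simulated data and all names are hypothetical.

```r
# Hypothetical simulated data
set.seed(2)
n <- 1000
category   <- sample(c("A", "B"), n, replace = TRUE)
confounder <- rnorm(n)
treatment  <- rbinom(n, 1, plogis(0.5 * (category == "A") + confounder))
outcome    <- 1 * (category == "A") + 2 * treatment + confounder + rnorm(n)
d <- data.frame(category, confounder, treatment, outcome)

# Treatment model (logit) and outcome model (OLS)
treatment_fit <- glm(treatment ~ category + confounder, family = binomial, data = d)
outcome_fit   <- lm(outcome ~ treatment * category + confounder, data = d)

# Estimated probability of treatment and predicted outcome under treatment = 1
d$p_treated <- predict(treatment_fit, type = "response")
d$yhat_1    <- predict(outcome_fit, newdata = transform(d, treatment = 1))

# Three estimates of the mean outcome under treatment = 1 for one category
estimate_category <- function(rows) {
  w <- rows$treatment / rows$p_treated  # inverse probability weights
  c(
    treatment_modeling = sum(w * rows$outcome) / sum(w),
    outcome_modeling   = mean(rows$yhat_1),
    # Outcome-model prediction plus a weighted correction from the residuals
    doubly_robust      = mean(rows$yhat_1) +
      sum(w * (rows$outcome - rows$yhat_1)) / sum(w)
  )
}

estimates <- sapply(split(d, d$category), estimate_category)
print(round(estimates, 2))
```

The result is a matrix with one row per estimator and one column per category; differencing the columns gives the corresponding counterfactual disparity under each estimation strategy.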
References
Lundberg I (2021). "The gap-closing estimand: A causal approach to study interventions that close disparities across social categories." Sociological Methods and Research. Available at https://osf.io/gx4y3/.
Friedman J, Hastie T, Tibshirani R (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, 33(1), 1–22. https://www.jstatsoft.org/htaccess.php?volume=33&type=i&issue=01.
Wood S (2017). Generalized Additive Models: An Introduction with R, 2nd edition. Chapman and Hall/CRC.
Wright MN, Ziegler A (2017). "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R." Journal of Statistical Software, 77(1), 1–17. doi: 10.18637/jss.v077.i01.
Examples
# Simulate example data
simulated_data <- generate_simulated_data(n = 100)
# Fit by outcome modeling
# You can add standard errors with se = TRUE
estimate <- gapclosing(
  data = simulated_data,
  outcome_formula = outcome ~ treatment * category + confounder,
  treatment_name = "treatment",
  category_name = "category",
  counterfactual_assignments = 1
)
summary(estimate)
# Fit by treatment modeling
# You can add standard errors with se = TRUE
estimate <- gapclosing(
  data = simulated_data,
  treatment_formula = treatment ~ category + confounder,
  outcome_name = "outcome",
  category_name = "category",
  counterfactual_assignments = 1
)
summary(estimate)
# Fit by doubly-robust estimation
# You can add standard errors with se = TRUE
estimate <- gapclosing(
  data = simulated_data,
  outcome_formula = outcome ~ treatment * category + confounder,
  treatment_formula = treatment ~ category + confounder,
  category_name = "category",
  counterfactual_assignments = 1
)
summary(estimate)
# Fit by doubly-robust cross-fitting estimation with random forests
# You can add standard errors with se = TRUE
estimate <- gapclosing(
  data = simulated_data,
  outcome_formula = outcome ~ category + confounder,
  treatment_formula = treatment ~ category + confounder,
  category_name = "category",
  counterfactual_assignments = 1,
  outcome_algorithm = "ranger",
  treatment_algorithm = "ranger",
  sample_split = "cross_fit"
)
summary(estimate)
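Building on the examples above, the following sketch requests bootstrap standard errors and then inspects a few of the documented elements of the returned object. It assumes the gapclosing package is installed and is skipped otherwise; the value bootstrap_samples = 50 is an arbitrary choice to keep the run short, not a recommendation.

```r
# Runs only if the gapclosing package is available
if (requireNamespace("gapclosing", quietly = TRUE)) {
  library(gapclosing)
  simulated_data <- generate_simulated_data(n = 100)

  # Doubly-robust estimation with bootstrap standard errors
  estimate <- gapclosing(
    data = simulated_data,
    outcome_formula = outcome ~ treatment * category + confounder,
    treatment_formula = treatment ~ category + confounder,
    category_name = "category",
    counterfactual_assignments = 1,
    se = TRUE,
    bootstrap_samples = 50  # small value chosen only to keep the sketch fast
  )

  # Elements documented in the Value section
  print(estimate$primary_estimator_name)
  print(estimate$counterfactual_disparities)

  # Coerce all estimates to a data frame for further processing
  estimates_df <- as.data.frame(estimate)
  print(head(estimates_df))
}
```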