staggered_cs {staggered} | R Documentation |
Calculate the Callaway & Sant'Anna (2020) estimator for staggered rollouts
Description
This function calculates the Callaway & Sant'Anna (2020) estimator for staggered rollout designs using not-yet-treated units (including never-treated, if available) as controls.
Usage
staggered_cs(
df,
i = "i",
t = "t",
g = "g",
y = "y",
estimand = NULL,
A_theta_list = NULL,
A_0_list = NULL,
eventTime = 0,
return_full_vcv = FALSE,
return_matrix_list = FALSE,
compute_fisher = FALSE,
num_fisher_permutations = 500,
skip_data_check = FALSE
)
Arguments
df |
A data frame containing panel data with the variables y (an outcome), i (an individual identifier), t (the period in which the outcome is observed), and g (the period in which i is first treated, with Inf denoting never treated). |
i |
The name of the column containing the individual (cross-sectional unit) identifier. Default is "i". |
t |
The name of the column containing the time periods. Default is "t". |
g |
The name of the column containing the first period when a particular observation is treated, with Inf denoting never treated. Default is "g". |
y |
The name of the column containing the outcome variable. Default is "y". |
estimand |
The estimand to be calculated: "simple" averages all treated (t,g) combinations with weights proportional to N_g; "cohort" averages the ATEs for each cohort g, and then takes an N_g-weighted average across g; "calendar" averages ATEs for each time period, weighted by N_g for treated units, and then averages across time; "eventstudy" returns the average effect at the "event-time" given in the parameter eventTime. The parameter can be left blank if a custom parameter is provided in A_theta_list. The argument is not case-sensitive. |
A_theta_list |
This parameter allows for specifying a custom estimand, and should be left as NULL if estimand is specified. It is a list of matrices A_theta_g so that the parameter of interest is sum_g A_theta_g Ybar_g, where Ybar_g = 1/N sum_i Y_i(g) |
A_0_list |
This parameter allows for specifying the matrices used to construct the Xhat vector of pre-treatment differences. If left NULL, the default is to use the scalar set of controls used in Callaway and Sant'Anna. If use_DiD_A0 = FALSE, then it uses the full vector of possible comparisons of (g,g') in periods t<g,g'. |
eventTime |
If using estimand = "eventstudy", specify what eventTime you want the event-study parameter for. The default is 0, the period in which treatment occurs. If a vector is provided, estimates are returned for all the event-times in the vector. |
return_full_vcv |
If TRUE and estimand = "eventstudy", the function returns a list containing the full variance-covariance matrix for the event-plot estimates in addition to the usual data frame with the estimates. Default is FALSE. |
return_matrix_list |
If TRUE, the function returns a list of the A_0_list and A_theta_list matrices along with betastar. This is used for internal recursive calls to calculate the variance-covariance matrix, and will generally not be needed by the end user. Default is FALSE. |
compute_fisher |
If true, computes a Fisher Randomization Test using the studentized estimator. |
num_fisher_permutations |
The number of permutations to use in the Fisher Randomization Test (if compute_fisher = TRUE). Default is 500. |
skip_data_check |
If TRUE, skips the checks that the data are balanced and contain the columns i, t, g, y. Used in internal recursive calls to increase speed, but not recommended for the end user. |
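As a rough illustration of the data layout the arguments above describe, the following base-R sketch builds a small balanced panel with the default column names and runs a simple balance check. The toy values and the helper is_balanced_panel are assumptions made for illustration; they are not part of the package, and the package's own data check may work differently.

```r
# Toy balanced panel: 4 units observed in periods 1..5 (all values are made up).
# g is the first treated period; Inf marks a never-treated unit.
first_treated <- c(2, 3, 4, Inf)
toy_df <- expand.grid(i = 1:4, t = 1:5)
toy_df$g <- first_treated[toy_df$i]
set.seed(1)
toy_df$y <- rnorm(nrow(toy_df))

# Hypothetical helper (not part of staggered): TRUE if every unit is observed
# exactly once in every period and g is constant within each unit.
is_balanced_panel <- function(df, i = "i", t = "t", g = "g") {
  counts <- table(df[[i]], df[[t]])
  one_g_per_unit <- tapply(df[[g]], df[[i]], function(x) length(unique(x)) == 1)
  all(counts == 1) && all(one_g_per_unit)
}

is_balanced_panel(toy_df)

# Event time relative to first treatment; this is -Inf for never-treated units.
toy_df$event_time <- toy_df$t - toy_df$g
```

A data frame shaped like toy_df matches the default values of i, t, g and y, so those arguments would not need to be set explicitly.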
Value
resultsDF A data.frame containing: estimate (the point estimate), se (the standard error), and se_neyman (the Neyman standard error). If a vector-valued eventTime is provided, the data.frame contains multiple rows for each eventTime and an eventTime column. If return_full_vcv = TRUE and estimand = "eventstudy", the function returns a list containing resultsDF and the full variance-covariance matrix for the event-study estimates (vcv) as well as the Neyman version of the covariance matrix (vcv_neyman). (If return_matrix_list = TRUE, it likewise returns a list containing lists of matrices used in the vcv calculation.)
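One common use of the estimate and se columns is to form confidence intervals. The sketch below assumes a normal approximation and uses a hypothetical helper, add_ci, applied to a made-up data frame that only mimics the column names described above; it does not call the package and the numbers are not real results.

```r
# Hypothetical helper (not part of staggered): append normal-approximation
# confidence bounds to a data frame that has `estimate` and `se` columns.
add_ci <- function(results, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  results$ci_lower <- results$estimate - z * results$se
  results$ci_upper <- results$estimate + z * results$se
  results
}

# Made-up values that only mimic the shape of an event-study result.
toy_results <- data.frame(
  estimate  = c(-0.10, -0.05),
  se        = c(0.02, 0.03),
  eventTime = 0:1
)

toy_with_ci <- add_ci(toy_results)
toy_with_ci
```

The same helper could be pointed at se_neyman instead of se by renaming the column first, if intervals based on the Neyman standard error are preferred.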
References
Callaway, Brantly, and Sant'Anna, Pedro H. C. (2020), 'Difference-in-Differences with Multiple Time Periods', Forthcoming at the Journal of Econometrics, doi: 10.1016/j.jeconom.2020.12.001.
Examples
# Load some libraries
library(dplyr)
library(purrr)
library(MASS)
set.seed(1234)
# load the officer data and subset it
df <- pj_officer_level_balanced
group_random <- sample(unique(df$assigned), 3)
df <- df[df$assigned %in% group_random,]
# We modify the data so that the time dimension is named t,
# the period of treatment is named g,
# the outcome is named y,
# and the individual identifiers are named i
# (this allows us to use the default arguments of staggered_cs).
df <- df %>% rename(t = period, y = complaints, g = first_trained, i = uid)
# Calculate Callaway and Sant'Anna estimator for the simple weighted average
staggered_cs(df = df, estimand = "simple")
# Calculate Callaway and Sant'Anna estimator for the cohort weighted average
staggered_cs(df = df, estimand = "cohort")
# Calculate Callaway and Sant'Anna estimator for the calendar weighted average
staggered_cs(df = df, estimand = "calendar")
# Calculate Callaway and Sant'Anna event-study coefficients for the first 24 months
# (month 0 is instantaneous effect)
eventPlotResults <- staggered_cs(df = df, estimand = "eventstudy", eventTime = 0:23)
eventPlotResults %>% head()