R: grab_significance

grab_significance {tidysynth}

R Documentation

grab_significance

Description

Generate inferential statistics that compare the rarity of the unit that actually received the intervention to the placebo units in the donor pool.

Usage

grab_significance(data, time_window = NULL)

Arguments

`data`	nested data of type `tbl_df`
`time_window`	time window over which the significance values should be computed.

Details

Inferential statistics are generated by comparing the observed difference between the actual treated unit and its synthetic control to the difference between each placebo unit and its synthetic control. The rarity of the treated unit's difference relative to the placebo differences is used to infer the likelihood of observing the effect.

Inference in this framework relies on the ratio of the mean squared predictive error (MSPE) of the fit in the post-period to the MSPE of the fit in the pre-period.

\frac{MSPE_{Post}}{MSPE_{Pre}}

The ratio captures the difference between the pre-intervention fit and the post-intervention divergence of the trend (i.e. the causal quantity). A good fit in the pre-period indicates that the observed and synthetic cases tracked each other closely. Divergence in the post-period captures the difference between the two trends brought about by the intervention. Thus, when the ratio is high, we observe more of a difference between the two trends. If, however, the pre-period fit is poor, or there is no substantial divergence in the post-period, the ratio will be smaller.
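
As a rough illustration of the ratio described above, the following base R sketch computes the pre-period and post-period MSPE for a single unit. The series values are made up, and treating `i_time` as the last pre-intervention period is an assumption of this sketch rather than a statement about the package internals.

```r
# Hypothetical yearly outcomes for one unit and its synthetic control
# (made-up numbers, for illustration only).
year      <- 1985:1992
observed  <- c(100, 98, 97, 95, 88, 84, 80, 77)
synthetic <- c(101, 98, 96, 95, 93, 92, 91, 90)
i_time    <- 1988  # assumption: last period counted as pre-intervention

# Gap between the observed outcome and the synthetic control.
gap <- observed - synthetic

# Mean squared predictive error in each period, and their ratio.
pre_mspe   <- mean(gap[year <= i_time]^2)
post_mspe  <- mean(gap[year >  i_time]^2)
mspe_ratio <- post_mspe / pre_mspe

mspe_ratio
```

A tight pre-period fit combined with a large post-period gap, as in these made-up numbers, yields a large ratio.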

The Fisher's Exact P-Value is generated by ranking the MSPE ratios of the treated and placebo units. The p-value is then calculated by dividing the rank of the case by the total number of cases (rank/total). The case with the highest ratio is rare given the distribution of cases generated by the placebos. A more detailed outline of inference within the synthetic control framework can be found in Abadie et al. 2010.
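
The rank/total calculation can be sketched in base R as follows. The ratios and unit names are hypothetical, and the sketch assumes there are no ties; how ties are handled is not covered here.

```r
# Hypothetical ratios: one treated unit and five placebo units (A to E).
ratios <- c(treated = 120.0, A = 4.2, B = 15.8, C = 1.1, D = 33.0, E = 7.5)

# Rank in descending order so that the largest ratio receives rank 1.
ranks <- rank(-ratios)

# p-value: rank of each case divided by the total number of cases.
p_values <- ranks / length(ratios)

p_values[["treated"]]
```

With six cases in total, the smallest p-value any unit can obtain in this sketch is 1/6.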

Note that conventional significance levels are not achievable if there is an insufficient number of control cases. One needs at least 20 control cases to use the conventional .05 level. With fewer cases, significance levels need to be adjusted to accommodate the low total rank. This is a limitation of rank-based significance metrics.
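
To make the arithmetic behind this threshold concrete, the sketch below computes the smallest p-value attainable for a few hypothetical donor pool sizes, assuming the treated unit is ranked first among the controls plus itself.

```r
# Hypothetical donor pool sizes.
n_controls <- c(5, 10, 19, 20, 38)

# Best case: the treated unit has rank 1 out of n_controls + 1 cases.
min_p <- 1 / (n_controls + 1)

achievable <- data.frame(n_controls, min_p, below_05 = min_p < 0.05)
achievable
```

With 19 controls the best attainable value is exactly 1/20, so 20 controls are the first pool size at which a p-value strictly below .05 becomes possible.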

In addition to the Fisher's Exact P-Value, a Z-score is also included, which is simply the standardized MSPE ratio of each case. The Z-score captures the degree to which a particular case's ratio deviates from the distribution of the placebo cases.
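
A minimal base R sketch of this standardization, using the formula listed for `z_score` in the Value section and hypothetical ratios:

```r
# Hypothetical ratios: one treated unit and five placebo units (A to E).
ratios <- c(treated = 120.0, A = 4.2, B = 15.8, C = 1.1, D = 33.0, E = 7.5)

# Standardize: subtract the mean and divide by the standard deviation.
z_scores <- (ratios - mean(ratios)) / sd(ratios)

round(z_scores, 2)
```

In this sketch the mean and standard deviation are taken over all cases, treated unit included, which is how the formula in the Value section reads.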

Value

tibble data frame containing the following fields:

unit_name: name of the unit
type: treated or donor unit (placebo)
pre_mspe: pre-intervention period mean squared predictive error
post_mspe: post-intervention period mean squared predictive error
mspe_ratio: post_mspe/pre_mspe; captures the difference in fit in the pre and post period. A good fit in the pre-period and a poor fit in the post-period reflects a meaningful effect when comparing the difference between the observed outcome and the synthetic control.
rank: rank order of the mspe_ratio.
fishers_exact_pvalue: rank/total, used as a p-value. Conventional levels aren't achievable unless there are enough controls to generate a large enough ranking; at least 20 control units are needed to use the conventional .05 level.
z_score: (mspe_ratio-mean(mspe_ratio))/sd(mspe_ratio); captures the degree to which the mspe_ratio of the treated unit deviates from the mean of the placebo units, providing an alternative significance determination.

Examples




# Load the package
library(tidysynth)

# Smoking example data
data(smoking)

smoking_out <-
  smoking %>%

  # Initialize the synthetic control object; placebos are generated because
  # grab_significance() compares the treated unit against the placebo units
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = TRUE) %>%

  # Generate the aggregate predictors used to generate the weights
  generate_predictor(time_window=1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE)) %>%

  generate_predictor(time_window=1984:1988,
                     beer = mean(beer, na.rm = TRUE)) %>%

  generate_predictor(time_window=1975,
                     cigsale_1975 = cigsale) %>%

  generate_predictor(time_window=1980,
                     cigsale_1980 = cigsale) %>%

  generate_predictor(time_window=1988,
                     cigsale_1988 = cigsale) %>%


  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window =1970:1988,
                   Margin.ipop=.02,Sigf.ipop=7,Bound.ipop=6) %>%

  # Generate the synthetic control
  generate_control()

# Extract the significance statistics for the treated and placebo units
smoking_out %>% grab_significance(time_window = 1970:2000)

[Package tidysynth version 0.2.0 Index]