microsynth {microsynth}    R Documentation
Synthetic control methods for disaggregated, micro-level data.
Description
Implements the synthetic control method for micro-level data as outlined in Robbins, Saunders, and Kilmer (2017). microsynth is designed for use in assessing the effect of an intervention using longitudinal data. However, it may also be used to calculate propensity score-type weights in cross-sectional data. microsynth is a generalization of Synth (see Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2011, 2015)) that is designed for data at a more granular level (e.g., micro-level). For more details, see the help vignette: vignette('microsynth', package = 'microsynth').
microsynth develops a synthetic control group by searching for weights that exactly match a treatment group to a synthetic control group across a number of variables while also minimizing the discrepancy between the synthetic control group and the treatment group across a second set of variables. microsynth works in two primary steps: 1) calculation of weights and 2) calculation of results. Time series plots of treatment vs. synthetic control for pertinent outcomes may be produced using the function plot_microsynth().
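As an illustration of the exact-matching step, the base-R sketch below computes linear calibration weights in closed form, which is the kind of problem solved by survey::calibrate() with calfun = "linear" (the default). It is a simplified sketch, not microsynth's internal code: the helper linear_calibrate() and the toy data are hypothetical, every control case starts from a weight of 1, the bounds argument is ignored (so weights can be negative), and no match.out.min or match.covar.min variables are minimized.

```r
# Hypothetical helper: linear (chi-square distance) calibration in closed form.
# X: matrix of control-case values (rows = cases, columns = matching variables)
# target: treatment-group totals that the weighted control totals must equal
# d: starting weights (all 1 here)
linear_calibrate <- function(X, target, d = rep(1, nrow(X))) {
  X <- as.matrix(X)
  # Solve (X' D X) lambda = target - X' d for the Lagrange multipliers
  XtDX <- crossprod(X, d * X)
  lambda <- solve(XtDX, target - crossprod(X, d))
  # Calibrated weights: w_i = d_i * (1 + x_i' lambda)
  as.vector(d * (1 + X %*% lambda))
}

# Toy data: 6 control cases, an intercept column and two covariates
X <- cbind(Intercept = 1,
           pop   = c(120, 80, 95, 150, 60, 110),
           crime = c(4, 2, 3, 6, 1, 5))
target <- c(3, 300, 10)   # treatment: 3 cases, total pop 300, total crime 10
w <- linear_calibrate(X, target)
colSums(X * w)            # reproduces target up to floating-point error
```

By construction, the weighted control totals equal target whenever crossprod(X, d * X) is invertible; imposing bounds such as c(0, Inf) turns this into a constrained problem that needs an iterative solver.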
The time range over which data are observed is segmented into pre- and post-intervention periods. Treatment is matched to synthetic control across the pre-intervention period, and the effect of the intervention is assessed across the post-intervention (or evaluation) period. The input end.pre (which gives the last pre-intervention time period) is used to delineate between pre- and post-intervention. Note that if the intervention is not believed to have an instantaneous effect, end.pre should indicate the time of the intervention.
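The period split can be made concrete with a small base-R sketch. It assumes integer time points, a pre-intervention window that includes both start.pre and end.pre, and an evaluation window running from end.pre + 1 through end.post; label_periods() is a hypothetical helper, not part of microsynth.

```r
# Hypothetical helper: label each time point as "pre", "post", or NA (unused)
# using the same cut-points as start.pre, end.pre, and end.post.
label_periods <- function(time, start.pre, end.pre, end.post) {
  ifelse(time >= start.pre & time <= end.pre, "pre",
         ifelse(time > end.pre & time <= end.post, "post", NA_character_))
}

# The seattledmi examples use start.pre = 1, end.pre = 12, end.post = 16
periods <- label_periods(1:16, start.pre = 1, end.pre = 12, end.post = 16)
table(periods)
```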
Variables are categorized as outcomes (which are time-variant) and covariates (which are time-invariant). Using the respective inputs match.covar and match.out, the user specifies across which covariates and outcomes (and which pre-intervention time points of the outcomes) treatment is to be exactly matched to synthetic control. The inputs match.covar.min and match.out.min are similar but instead specify variables across which treatment is to be matched to synthetic control as closely as possible. If no variables are specified in match.covar.min and match.out.min, the function calibrate() from the survey package is used to calculate weights. Otherwise, the function LowRankQP() from the package of the same name is used, if it is available on the user's machine (LowRankQP has been archived on CRAN, so it must be installed by other means). If the LowRankQP package is unavailable, ipop() from the kernlab package is used instead. In the event that the model specified by match.covar and match.out is not feasible (i.e., weights do not exist that exactly match treatment and synthetic control subject to the given constraints), a less restrictive backup model can be used (see use.backup).
microsynth can perform statistical inference using Taylor series linearization, a jackknife, and permutation methods. Several sets of weights are calculated. A set of main weights is used to determine a point estimate of the intervention effect. The main weights can also be used to perform inference on the point estimator via Taylor series linearization. If a jackknife is to be used, one set of weights is calculated for each jackknife replication group, and if permutation methods are to be used, one set of weights is calculated for each permutation group. If treatment and synthetic control are not easily matched under the model specified by match.covar and match.out (i.e., an exact solution is infeasible or nearly infeasible), it is recommended that the jackknife not be used for inference.
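A jackknife of the kind described above combines the estimates from the replication groups into a variance. The sketch below uses the textbook delete-a-group jackknife formula, (G - 1)/G times the sum of squared deviations of the replicate estimates from their mean; microsynth's exact replicate variance may differ (for example, in how it centers the replicates), and jackknife_var() and the toy estimates are hypothetical.

```r
# Hypothetical helper: delete-a-group jackknife variance.
# theta_rep: G estimates, each computed with one replication group left out
jackknife_var <- function(theta_rep) {
  G <- length(theta_rep)
  (G - 1) / G * sum((theta_rep - mean(theta_rep))^2)
}

# Toy replicate estimates of an intervention effect (not real results)
theta_rep <- c(-0.12, -0.08, -0.15, -0.10, -0.05)
se_jack <- sqrt(jackknife_var(theta_rep))
se_jack
```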
The software gives the user the option to output overall findings to an Excel file. For each outcome variable, the results list the estimated treatment effect, as well as confidence intervals for the effect and p-values of a hypothesis test of whether the effect is zero. These results are produced as needed for each of the three methods of statistical inference noted above. microsynth can also apply an omnibus test that examines the presence of a treatment effect jointly across several outcomes.
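The permutation p-values can be pictured as the share of placebo (permutation-group) estimates that are at least as extreme as the observed estimate. The helper perm_pvalue() below is a hypothetical sketch using a plain proportion, with no adjustment such as counting the observed group in the reference set; the test labels follow the test argument, where "twosided" and "lower" appear on this page and "upper" is assumed.

```r
# Hypothetical helper: permutation p-value for an observed effect estimate,
# compared against estimates from placebo (permutation) groups.
perm_pvalue <- function(obs, perm_est,
                        test = c("twosided", "lower", "upper")) {
  test <- match.arg(test)
  switch(test,
         lower    = mean(perm_est <= obs),               # alternative: effect < 0
         upper    = mean(perm_est >= obs),               # alternative: effect > 0
         twosided = mean(abs(perm_est) >= abs(obs)))     # either direction
}

perm_est <- c(-1, 0, 0.5, 1)
perm_pvalue(-0.5, perm_est, "lower")     # 0.25
perm_pvalue(-0.5, perm_est, "twosided")  # 0.75
```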
Usage
microsynth(
data,
idvar,
intvar,
timevar = NULL,
start.pre = NULL,
end.pre = NULL,
end.post = NULL,
match.out = TRUE,
match.covar = TRUE,
match.out.min = NULL,
match.covar.min = NULL,
result.var = TRUE,
omnibus.var = result.var,
period = 1,
scale.var = "Intercept",
confidence = 0.9,
test = "twosided",
perm = 0,
jack = 0,
use.survey = TRUE,
cut.mse = Inf,
check.feas = FALSE,
use.backup = FALSE,
w = NULL,
max.mse = 0.01,
maxit = 250,
cal.epsilon = 1e-04,
calfun = "linear",
bounds = c(0, Inf),
result.file = NULL,
printFlag = TRUE,
n.cores = TRUE,
ret.stats = FALSE
)
Arguments
data |
A data frame. If longitudinal, the data must be entered in tall format (e.g., at the case/time-level with one row for each time period for each case). Missingness is not allowed. All individuals must have non-NA values of all variables at all time points. |
idvar |
A character string that gives the variable in |
intvar |
A character string that gives the variable in |
timevar |
A character string that gives the variable in
|
start.pre |
An integer indicating the time point that corresponds to the
beginning of the pre-intervention period used for
matching. When |
end.pre |
An integer that gives the final time point of the
pre-intervention period. That is, |
end.post |
An integer that gives the maximum post-intervention time that
is taken into account when compiling results. That is, the treatment and synthetic
control groups are compared across the outcomes listed in |
match.out |
Either A) logical, B) a vector of variable names that
indicates across which time-varying variables treatment is to be exactly matched
to synthetic control pre-intervention, or C) a
list consisting of variable names and timespans over which variables should
be aggregated before matching. Note that outcome variables and time-varying
covariates should be included in If The following examples show the proper formatting of |
match.covar |
Either a logical or a vector of variable names that
indicates which time-invariant covariates
are to be used for weighting. Weights are
calculated so that treatment and synthetic control exactly match across
these variables. If |
match.out.min |
A vector or list of the same format as |
match.covar.min |
A vector of variable names that indicates supplemental time-invariant variables that are to be used for weighting, for which exact matches are not required. Weights are calculated so that the distance between treatment and synthetic control across these variables is minimized. |
result.var |
A vector of variable names giving the outcome
variables for which results will be reported. Time-varying covariates
should be excluded from |
omnibus.var |
A vector of variable names that indicates the outcome
variables that are to be used within the calculation of the omnibus
statistic. Can also be a logical indicator. When |
period |
An integer that gives the granularity of the data that will be
used for plotting and compiling results. If Note that plotting is performed with
|
scale.var |
A variable name. When comparing the treatment group to all
cases, the latter is scaled to the size of the former with respect to the
variable indicated by |
confidence |
The level of confidence for confidence intervals. |
test |
The type of hypothesis test (one-sided lower, one-sided upper, or
two-sided) that is used when calculating p-values. Entries of
|
perm |
An integer giving the number of permutation groups that are used.
If |
jack |
An integer giving the number of replication groups that are used
for the jackknife. |
use.survey |
If |
cut.mse |
The maximum error (given as mean-squared error) permissible for permutation groups. Permutation groups with a larger than permissible error are dropped when calculating results. The mean-squared error is only calculated over constraints that are to be exactly satisfied. |
check.feas |
A logical indicator of whether or not the feasibility of
the model specified by |
use.backup |
A logical variable that, when true, indicates whether a
backup model should be used whenever the model specified by
|
w |
A |
max.mse |
The maximum error (given as mean-squared error) permissible
for constraints that are to be exactly satisfied. If |
maxit |
The maximum number of iterations used within the calibration
routine ( |
cal.epsilon |
The tolerance used within the calibration routine
( |
calfun |
The calibration function used within the calibration routine
( |
bounds |
Bounds for calibration weighting (fed into the
|
result.file |
A character string giving the name of a file that will be
created in the home directory containing results. If |
printFlag |
If TRUE, |
n.cores |
The number of CPU cores to use for parallelization. If
|
ret.stats |
If set to |
Details
microsynth requires specification of the following inputs: data, idvar, and intvar. data is a longitudinal data frame; idvar and intvar are character strings that specify pertinent columns of data. In longitudinal data, timevar should be specified. Furthermore, specification of match.out and match.covar is recommended.
microsynth can also be used to calculate propensity score-type weights in cross-sectional data (in which case timevar does not need to be specified), as proposed by Hainmueller (2012).
microsynth calculates weights using survey::calibrate() from the survey package in circumstances where a feasible solution exists for all constraints, whereas LowRankQP::LowRankQP() is used to assess feasibility and to calculate weights in the event that a feasible solution to all constraints does not exist. The LowRankQP routine is memory-intensive and can run quite slowly on data with a large number of cases. To prevent LowRankQP from being used, set match.out.min = NULL, match.covar.min = NULL, check.feas = FALSE, and use.backup = FALSE.
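The routing rules described in this section and in the Description can be summarized as a small decision function. weighting_routine() is hypothetical and only restates the documented conditions: it does not call any solver, and it simplifies by treating check.feas = TRUE or use.backup = TRUE as always requiring a quadratic-programming routine, whereas microsynth may still rely on calibration when the primary model turns out to be feasible.

```r
# Hypothetical helper: restates which weighting routine the documented rules
# point to. It does not call any solver.
weighting_routine <- function(match.out.min = NULL, match.covar.min = NULL,
                              check.feas = FALSE, use.backup = FALSE,
                              lowrank_ok = requireNamespace("LowRankQP",
                                                            quietly = TRUE)) {
  # Any of these settings can bring the quadratic-programming step into play
  needs_qp <- !is.null(match.out.min) || !is.null(match.covar.min) ||
    isTRUE(check.feas) || isTRUE(use.backup)
  if (!needs_qp) {
    return("survey::calibrate")
  }
  # LowRankQP is preferred when installed; kernlab::ipop is the fallback.
  # lowrank_ok is a lazily evaluated default, so requireNamespace() only
  # runs when a QP routine is actually needed.
  if (lowrank_ok) "LowRankQP::LowRankQP" else "kernlab::ipop"
}

weighting_routine()                                        # "survey::calibrate"
weighting_routine(check.feas = TRUE, lowrank_ok = FALSE)   # "kernlab::ipop"
```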
Value
microsynth returns a list with up to five elements: a) w, b) Results, c) svyglm.stats, d) Plot.Stats, and e) info.
w is a list with six elements: a) Weights, b) Intervention, c) MSE, d) Model, e) Summary, and f) keep.groups.
Assume there are C total sets of weights calculated, where C = 1 + jack + perm, and there are N total cases across the treatment and control groups. w$Weights is an N x C matrix, where each column provides a set of weights. w$Intervention is an N x C matrix of logical indicators that indicate whether or not the case in the respective row is considered treated (at any point in time) for the respective column. Entries of NA are to be dropped for the respective jackknife replication group (NAs only appear in jackknife weights).
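To make the N x C layout concrete, the sketch below builds toy stand-ins for w$Weights and w$Intervention (hypothetical values, with C = 1 + 0 + 2 = 3) and forms weighted treatment and synthetic control totals for one outcome at one time point, column by column. It assumes that a column's totals are simple weighted sums over the cases flagged TRUE and FALSE in the matching column of w$Intervention; it is not microsynth's results code.

```r
# Toy stand-ins for w$Weights and w$Intervention (hypothetical values)
jack <- 0; perm <- 2
C <- 1 + jack + perm
N <- 5
Weights <- matrix(c(1, 1, 0.5, 1.2, 0.3,     # column 1: main weights
                    1, 0.7, 1, 0.9, 0.4,     # column 2: permutation group 1
                    0.6, 1, 1, 0.4, 1),      # column 3: permutation group 2
                  nrow = N, ncol = C)
Intervention <- matrix(c(TRUE, TRUE, FALSE, FALSE, FALSE,
                         FALSE, FALSE, TRUE, TRUE, FALSE,
                         FALSE, FALSE, FALSE, FALSE, TRUE),
                       nrow = N, ncol = C)
y <- c(3, 5, 2, 4, 6)   # one outcome at one time point, one value per case

# Weighted totals per set of weights; na.rm = TRUE skips NA entries, which
# would mark cases dropped from a jackknife replication group
treat_tot <- colSums(Weights * y * Intervention, na.rm = TRUE)
synth_tot <- colSums(Weights * y * !Intervention, na.rm = TRUE)
rbind(treatment = treat_tot, synthetic = synth_tot)
```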
w$MSE is a 6 x C matrix that gives the MSEs for each set of weights. MSEs are listed for the primary and secondary constraints for the first, second, and third models. Note that the primary constraints differ for each model (see Robbins and Davenport, 2021). w$Model is a length-C vector that indicates whether backup models were used in the calculation of each set of weights. w$keep.groups is a logical vector indicating which groups are to be used in analysis (groups that are not used have pre-intervention MSE greater than cut.mse). w$Summary is a three-column matrix that shows, for treatment, synthetic control, and the full dataset, aggregate values of the variables across which treatment and synthetic control are matched. The summary, which is tabulated only for the primary weights, is also printed by microsynth while weights are being calculated.
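The keep.groups rule amounts to a threshold on the pre-intervention MSE. In this hypothetical sketch, the MSE values are invented and cut.mse = 0.1 is chosen only for illustration; with the default cut.mse = Inf every group is kept.

```r
# Toy per-group pre-intervention MSEs (hypothetical values); the first entry
# stands for the main weights, the rest for permutation groups
mse <- c(0.000, 0.004, 0.250, 0.010, 1.700)
cut.mse <- 0.1

# Groups whose MSE exceeds cut.mse are excluded from the analysis
keep.groups <- mse <= cut.mse
which(!keep.groups)   # groups 3 and 5 would be dropped
```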
Further, Results is a list where each element gives the final results for each value of end.post. Each element of Results is itself a matrix with each row corresponding to an outcome variable (and a row for the omnibus test, if used) and each column giving estimates of the intervention effects and p-values, upper, and lower bounds of confidence intervals as found using Taylor series linearization (Linear), jackknife (jack), and permutation (perm) methods where needed.
In addition, svyglm.stats is a list where each element is a matrix that includes the output from the regression models run using the svyglm() function to estimate the treatment effect. The list has one element for each value of end.post, and the matrices each have one row per variable in result.var.
Next, Plot.Stats contains the data that are displayed in the plots which may be generated using plot_microsynth(). Plot.Stats is a list with four elements (Treatment, Control, All, Difference). The first three elements are matrices with one row per outcome variable and one column per time point. The last element (which gives the treatment minus control values) is an array that contains data for each permutation group in addition to the true treatment area. Specifically, Plot.Stats$Difference[,,1] contains the time series of treatment minus control for the true intervention group; Plot.Stats$Difference[,,i+1] contains the time series of treatment minus control for the i-th permutation group.
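The indexing convention for Plot.Stats$Difference can be checked on a toy array with the same layout (outcomes x time points x groups). All values below are random placeholders, the object names only mimic the elements of Plot.Stats, and the slice for the true group is filled as Treatment minus Control to mirror the description above.

```r
# Toy stand-ins for Plot.Stats elements (hypothetical numbers): 2 outcomes,
# 16 time points, the true treatment group plus 3 permutation groups
outcomes <- c("i_felony", "any_crime")
n_time <- 16; perm <- 3
set.seed(42)
Treatment <- matrix(rpois(2 * n_time, 20), nrow = 2,
                    dimnames = list(outcomes, NULL))
Control   <- matrix(rpois(2 * n_time, 20), nrow = 2,
                    dimnames = list(outcomes, NULL))
Difference <- array(rnorm(2 * n_time * (1 + perm)),
                    dim = c(2, n_time, 1 + perm),
                    dimnames = list(outcomes, NULL, NULL))
Difference[, , 1] <- Treatment - Control   # slice 1: true intervention group

# Time series for the true group and for permutation group i = 2 (slice i + 1)
true_diff  <- Difference["any_crime", , 1]
perm2_diff <- Difference["any_crime", , 2 + 1]
```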
Next, info documents some input parameters for display by print(). A summary of weighted matching variables and of results can be viewed using summary().
Lastly, if ret.stats is set to TRUE, four additional elements are returned: stats, stats1, stats2, and delta.out.
stats contains the basic statistics that also appear in the main microsynth output: outcomes in treatment, outcomes in control, and percentage change. stats1 contains the svyglm() estimates adjusted by their standard errors. stats2 is the percent change of the observed value of each outcome relative to the hypothetical outcome absent the intervention. delta.out is a Taylor series linearization used to approximate the variance of the estimator.
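For a single outcome, the quantity described for stats2 can be sketched as the observed value minus the counterfactual, divided by the counterfactual, times 100. The helper pct_change() is hypothetical and assumes the synthetic control value serves as the counterfactual; how microsynth aggregates over post-intervention periods before this step is not shown here.

```r
# Hypothetical helper: percent change of the observed treatment outcome
# relative to the counterfactual (synthetic control) outcome
pct_change <- function(observed, counterfactual) {
  100 * (observed - counterfactual) / counterfactual
}

pct_change(observed = 45, counterfactual = 60)   # -25
```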
References
Abadie A, Diamond A, Hainmueller J (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.
Abadie A, Diamond A, Hainmueller J (2011). Synth: An R package for synthetic control methods in comparative case studies. Journal of Statistical Software, 42(13), 1-17.
Abadie A, Diamond A, Hainmueller J (2015). Comparative politics and the synthetic control method. American Journal of Political Science, 59(2), 495-510.
Abadie A, Gardeazabal J (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1), 113-132.
Hainmueller J (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25-46.
Robbins MW, Saunders J, Kilmer B (2017). A framework for synthetic control methods with high-dimensional, micro-level data: Evaluating a neighborhood-specific crime intervention. Journal of the American Statistical Association, 112(517), 109-126.
Robbins MW, Davenport S (2021). microsynth: Synthetic control methods for disaggregated and micro-level data in R. Journal of Statistical Software, 97(2), doi:10.18637/jss.v097.i02.
Examples
# Use seattledmi, block-level panel data, to evaluate a crime intervention.
library(microsynth)
# Declare time-variant (outcome) and time-invariant variables for matching
cov.var <- c('TotalPop', 'BLACK', 'HISPANIC', 'Males_1521',
'HOUSEHOLDS', 'FAMILYHOUS', 'FEMALE_HOU', 'RENTER_HOU', 'VACANT_HOU')
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
set.seed(99199) # for reproducibility
# Perform matching and estimation, without permutations or jackknife
# runtime: < 1 min
sea1 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
start.pre=1, end.pre=12, end.post=16,
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test='lower',
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea1)
plot_microsynth(sea1)
## Not run:
# Repeat matching and estimation, with permutations and jackknife
# (250 permutation groups and jack = TRUE, so this run is slow)
# runtime: ~30 min
sea2 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
start.pre=1, end.pre=12, end.post=c(14, 16),
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test='lower',
perm=250, jack=TRUE,
result.file=file.path(tempdir(), 'ExResults2.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea2)
plot_microsynth(sea2)
# Specify additional outcome variables for matching, which makes
# matching harder.
match.out <- c('i_robbery','i_aggassau','i_burglary','i_larceny',
'i_felony','i_misdemea','i_drugsale','i_drugposs','any_crime')
# Perform matching, setting check.feas = TRUE and use.backup = TRUE
# to ensure model feasibility
# runtime: ~40 minutes
sea3 <- microsynth(seattledmi,
idvar='ID', timevar='time', intvar='Intervention',
end.pre=12,
match.out=match.out, match.covar=cov.var,
result.var=match.out, perm=250, jack=0,
test='lower', check.feas=TRUE, use.backup = TRUE,
result.file=file.path(tempdir(), 'ExResults3.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# Aggregate outcome variables before matching, to boost model feasibility
match.out <- list( 'i_robbery'=rep(2, 6), 'i_aggassau'=rep(2, 6),
'i_burglary'=rep(1, 12), 'i_larceny'=rep(1, 12),
'i_felony'=rep(2, 6), 'i_misdemea'=rep(2, 6),
'i_drugsale'=rep(4, 3), 'i_drugposs'=rep(4, 3),
'any_crime'=rep(1, 12))
# After aggregation, use.backup and check.feas are no longer needed
# runtime: ~40 minutes
sea4 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention', match.out=match.out, match.covar=cov.var,
start.pre=1, end.pre=12, end.post=16,
result.var=names(match.out), omnibus.var=names(match.out),
perm=250, jack = TRUE, test='lower',
result.file=file.path(tempdir(), 'ExResults4.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea4)
plot_microsynth(sea4)
# Generate weights only (for four variables)
match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
# runtime: ~ 20 minutes
sea5 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention', match.out=match.out, match.covar=cov.var,
start.pre=1, end.pre=12, end.post=16,
result.var=FALSE, perm=250, jack=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View weights
summary(sea5)
# Generate results only
sea6 <- microsynth(seattledmi, idvar='ID', timevar='time',
intvar='Intervention',
start.pre=1, end.pre=12, end.post=c(14, 16),
result.var=match.out, test='lower',
w=sea5, result.file=file.path(tempdir(), 'ExResults6.xlsx'),
n.cores = min(parallel::detectCores(), 2))
# View results (including previously-found weights)
summary(sea6)
# Generate plots only
plot_microsynth(sea6, plot.var=match.out[1:2])
# Apply microsynth in the traditional setting of Synth
# Create macro-level (small n) data, with 1 treatment unit
set.seed(86879)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- setdiff(names(table(seattledmi$ID)), ids.t)
ids.synth <- c(base::sample(ids.t, 1), base::sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID,
as.numeric(ids.synth)), ]
# Apply microsynth to the new macro-level data
# runtime: < 5 minutes
sea8 <- microsynth(seattledmi.one, idvar='ID', timevar='time',
intvar='Intervention',
start.pre=1, end.pre=12, end.post=16,
match.out=match.out[4],
match.covar=cov.var, result.var=match.out[4],
test='lower', perm=250, jack=FALSE,
check.feas=TRUE, use.backup=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea8)
plot_microsynth(sea8)
# Use microsynth to calculate propensity score-type weights
# Prepare cross-sectional data from a single time point (time 16)
seattledmi.cross <- seattledmi[seattledmi$time==16, colnames(seattledmi)!='time']
# Apply microsynth to find propensity score-type weights
# runtime: ~5 minutes
sea9 <- microsynth(seattledmi.cross, idvar='ID', intvar='Intervention',
match.out=FALSE, match.covar=cov.var, result.var=match.out,
test='lower', perm=250, jack=TRUE,
n.cores = min(parallel::detectCores(), 2))
# View results
summary(sea9)
## End(Not run)
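In the sea4 example above, each element of the match.out list appears to give the lengths of consecutive blocks of pre-intervention time points: rep(2, 6) splits the 12 pre-intervention periods into six two-period blocks, rep(4, 3) into three four-period blocks, and rep(1, 12) leaves them unaggregated. The sketch below assumes the outcome is summed within each block (this page does not say whether microsynth sums or averages); aggregate_blocks() is a hypothetical helper.

```r
# Hypothetical helper: collapse a pre-intervention series into consecutive
# blocks whose lengths are given by 'spans' (e.g. rep(2, 6) for 12 periods)
aggregate_blocks <- function(x, spans) {
  stopifnot(length(x) == sum(spans))
  block <- rep(seq_along(spans), times = spans)   # block id for each period
  as.numeric(tapply(x, block, sum))               # sum within each block
}

x <- 1:12                          # one case's outcome, times 1 to 12
aggregate_blocks(x, rep(2, 6))     # 3 7 11 15 19 23
aggregate_blocks(x, rep(4, 3))     # 10 26 42
```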