scdata {scpi} | R Documentation |
Data Preparation for scest or scpi for Point Estimation and Inference Procedures Using Synthetic Control Methods.
Description
The command prepares the data to be used by scest or scpi to implement estimation and inference procedures for Synthetic Control (SC) methods.
It allows the user to specify the outcome variable, the features of the treated unit to be
matched, and covariate-adjustment feature by feature. The names of the output matrices
follow the terminology proposed in Cattaneo, Feng, and Titiunik (2021).
Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).
Companion commands are: scdataMulti for data preparation in the multiple treated units case with staggered adoption, scest for point estimation, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.
Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:
https://nppackages.github.io/scpi/
For an introduction to synthetic control methods, see Abadie (2021) and references therein.
Usage
scdata(
  df,
  id.var,
  time.var,
  outcome.var,
  period.pre,
  period.post,
  unit.tr,
  unit.co,
  features = NULL,
  cov.adj = NULL,
  cointegrated.data = FALSE,
  anticipation = 0,
  constant = FALSE,
  verbose = TRUE
)
Arguments
df |
a dataframe object. |
id.var |
a character or numeric scalar with the name of the variable containing units' IDs. The ID variable can be numeric or character. |
time.var |
a character with the name of the time variable. The time variable has to be numeric, integer, or Date. In
case |
outcome.var |
a character with the name of the outcome variable. The outcome variable has to be numeric. |
period.pre |
a numeric vector that identifies the pre-treatment period in time.var. |
period.post |
a numeric vector that identifies the post-treatment period in time.var. |
unit.tr |
a character or numeric scalar that identifies the treated unit in id.var. |
unit.co |
a character or numeric vector that identifies the donor pool in id.var. |
features |
a character vector containing the name of the feature variables used for estimation.
If this option is not specified the default is |
cov.adj |
a list specifying the names of the covariates to be used for adjustment for each feature. If |
cointegrated.data |
a logical that indicates if there is a belief that the data is cointegrated or not.
The default value is FALSE. |
anticipation |
a scalar that indicates the number of periods of potential anticipation effects. Default is 0. |
constant |
a logical which controls the inclusion of a constant term across features. The default value is FALSE. |
verbose |
if |
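As a rough illustration of the inputs described in this section, the base R sketch below builds a toy long-format panel (one row per unit and period) and derives candidate values for unit.tr, unit.co, period.pre, and period.post. The data frame toy, its column names, and all values are made up for illustration; the long-format layout is an assumption suggested by the id.var and time.var arguments, and the sketch does not call scdata itself.

```r
# Toy long-format panel: one row per unit-period (all values are made up).
toy <- expand.grid(year = 2000:2005, unit = c("A", "B", "C"),
                   stringsAsFactors = FALSE)
toy$y <- seq_len(nrow(toy))  # numeric outcome, as outcome.var requires

treated      <- "A"                                  # candidate for unit.tr
donors       <- setdiff(unique(toy$unit), treated)   # candidate for unit.co
pre_periods  <- 2000:2003                            # candidate for period.pre
post_periods <- 2004:2005                            # candidate for period.post

# Basic sanity checks before handing the pieces to scdata()
stopifnot(is.numeric(toy$y),
          all(c(pre_periods, post_periods) %in% toy$year),
          !(treated %in% donors))
```

Building the donor pool with setdiff mirrors the call in the Examples section and keeps the treated unit out of unit.co by construction.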
Details
cov.adj can be used in two ways. First, if only one feature is specified through the option features, cov.adj has to be a list with one (even unnamed) element (e.g., cov.adj = list(c("constant","trend"))). Alternatively, if multiple features are specified, then the user has two possibilities:
- provide a list with one element, then the same covariates are used for adjustment for each feature. For example, if there are two features specified and the user inputs cov.adj = list(c("constant","trend")), then a constant term and a linear trend are used for adjustment for both features.
- provide a list with as many elements as the number of features specified, then feature-specific covariate adjustment is implemented. For example, cov.adj = list('f1' = c("constant","trend"), 'f2' = c("trend")). In this case the name of each element of the list should be one (and only one) of the features specified. Note that if two (or more) features are specified and covariate adjustment has to be specified just for one of them, the user must still provide a list of the same length as the number of features, e.g., cov.adj = list('f1' = c("constant","trend"), 'f2' = NULL).
This option allows the user to include feature-specific constant terms or time trends by simply including "constant" or "trend" in the corresponding element of the list.
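The two ways of supplying cov.adj with multiple features can be sketched in base R. The helper expand_cov_adj below is hypothetical and is not part of scpi; it only mimics the rule stated in the text (a one-element list is shared by all features, otherwise one named element per feature is expected) and makes no claim about how the package implements it.

```r
# Hypothetical helper (NOT part of scpi): expands cov.adj to one entry per feature.
expand_cov_adj <- function(features, cov.adj) {
  if (length(cov.adj) == 1) {
    # One element: reuse the same covariates for every feature
    out <- rep(list(cov.adj[[1]]), length(features))
    names(out) <- features
    return(out)
  }
  # Otherwise: exactly one named element per feature is expected
  stopifnot(length(cov.adj) == length(features),
            setequal(names(cov.adj), features))
  cov.adj[features]
}

# Same adjustment for both features
shared <- expand_cov_adj(c("f1", "f2"), list(c("constant", "trend")))

# Feature-specific adjustment
specific <- expand_cov_adj(c("f1", "f2"),
                           list(f1 = c("constant", "trend"), f2 = c("trend")))
```

In this sketch, shared holds c("constant", "trend") for both f1 and f2, while specific keeps only "trend" for f2.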
When outcome.var is not included in features, we automatically set $\mathcal{R}=\emptyset$, that is, we do not perform covariate adjustment. This is because, in this setting, it is natural to create the out-of-sample prediction matrix $\mathbf{P}$ using the post-treatment outcomes of the donor units only.
cointegrated.data allows the user to model the belief that $\mathbf{A}$ and $\mathbf{B}$ form a cointegrated system. In practice, this implies that when dealing with the pseudo-true residuals $\mathbf{u}$, the first differences of $\mathbf{B}$ are used rather than the levels.
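To make "first differences rather than levels" concrete, the base R sketch below differences a toy donor matrix along the time dimension. The matrix B here is made up, contains a single feature, and assumes rows are pre-treatment periods and columns are donors; the sketch shows only the differencing step, not how scpi uses the result when handling the residuals.

```r
# Toy donor matrix B: rows are pre-treatment periods, columns are donors.
# Values are made up for illustration.
B <- cbind(donor1 = c(1, 3, 6, 10),
           donor2 = c(2, 2, 5, 9))

# First differences along time: row t minus row t-1 (the first period is lost)
B_diff <- diff(B)
```

Differencing removes one period, so B_diff has one row fewer than B.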
Value
The command returns an object of class 'scdata' containing the following elements:
A |
a matrix containing pre-treatment features of the treated unit. |
B |
a matrix containing pre-treatment features of the control units. |
C |
a matrix containing covariates for adjustment. |
P |
a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit. |
Y.pre |
a matrix containing the pre-treatment outcome of the treated unit. |
Y.post |
a matrix containing the post-treatment outcome of the treated unit. |
Y.donors |
a matrix containing the pre-treatment outcome of the control units. |
specs |
a list containing some specifics of the data:
|
Author(s)
Matias Cattaneo, Princeton University. cattaneo@princeton.edu.
Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.
Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.
Rocio Titiunik, Princeton University. titiunik@princeton.edu.
References
Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
Cattaneo, M. D., Feng, Y., and Titiunik, R. (2021). Prediction intervals for synthetic control methods. Journal of the American Statistical Association, 116(536), 1865-1880.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). scpi: Uncertainty Quantification for Synthetic Control Methods, arXiv:2202.05984.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption, arXiv:2210.05026.
See Also
scdataMulti, scest, scpi, scplot, scplotMulti
Examples
library(scpi)

# Load the example data shipped with the package
data <- scpi_germany

# Prepare the data: West Germany is treated, all other countries are donors
df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)