scest {scpi} | R Documentation |
Prediction of Synthetic Control
Description
The command implements estimation procedures for Synthetic Control (SC) methods using least squares, lasso, ridge, or simplex-type constraints. For more information see Cattaneo, Feng, and Titiunik (2021).
Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).
Companion commands are: scdata and scdataMulti for data preparation in the single and multiple treated unit(s) cases, respectively, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.
Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:
https://nppackages.github.io/scpi/
For an introduction to synthetic control methods, see Abadie (2021) and references therein.
Usage
scest(
data,
w.constr = NULL,
V = "separate",
V.mat = NULL,
solver = "ECOS",
plot = FALSE,
plot.name = NULL,
plot.path = NULL,
save.data = NULL
)
Arguments
data |
a class 'scdata' object, obtained by calling |
w.constr |
a list specifying the constraint set the estimated weights of the donors must belong to.
|
V |
specifies the type of weighting matrix to be used when minimizing the sum of squared residuals
The default is the identity matrix, so equal weight is given to all observations. In the case of multiple treated observations
(you used |
V.mat |
A conformable weighting matrix
See the Details section for more information on how to prepare this matrix. |
solver |
a string containing the name of the solver used by |
plot |
a logical specifying whether |
plot.name |
a string containing the name of the plot (the format is by default .png). For more options see |
plot.path |
a string containing the path at which the plot should be saved (default is output of |
save.data |
a character specifying the name and the path of the saved dataframe containing the processed data used to produce the plot. |
Details
Unless otherwise specified, the information below refers to the simple case in which N_1 = 1.
Estimation of Weights.
w.constr specifies the constraint set on the weights. First, the element p allows the user to choose between imposing a constraint on either the L1 (p = "L1") or the L2 (p = "L2") norm of the weights and imposing no constraint on the norm (p = "no norm"). Second, Q specifies the value of the constraint on the norm of the weights. Third, lb sets the lower bound of each component of the vector of weights. Fourth, dir sets the direction of the constraint on the norm in case p = "L1" or p = "L2". If dir = "==", then
||\mathbf{w}||_p = Q,\:\:\: w_j \geq lb,\:\: j =1,\ldots,J
If instead dir = "<=", then
||\mathbf{w}||_p \leq Q,\:\:\: w_j \geq lb,\:\: j =1,\ldots,J
If instead dir = "NULL", no constraint on the norm of the weights is imposed.
An alternative to specifying an ad hoc constraint set on the weights is to choose among some popular types of constraints. This can be done by including the element 'name' in the list w.constr. The following options are available:
- If name == "simplex" (the default), then
||\mathbf{w}||_1 = 1,\:\:\: w_j \geq 0,\:\: j =1,\ldots,J.
- If name == "lasso", then
||\mathbf{w}||_1 \leq Q,
where Q is by default equal to 1 but can be provided as an element of the list (e.g., w.constr = list(name = "lasso", Q = 2)).
- If name == "ridge", then
||\mathbf{w}||_2 \leq Q,
where Q is a tuning parameter that is by default computed as
(J+KM) \widehat{\sigma}_u^{2}/||\widehat{\mathbf{w}}_{OLS}||_{2}^{2}
where J is the number of donors and KM is the total number of covariates used for adjustment. The user can provide Q as an element of the list (e.g., w.constr = list(name = "ridge", Q = 1)).
- If name == "ols", then the problem is unconstrained and the vector of weights is estimated via ordinary least squares.
- If name == "L1-L2", then
||\mathbf{w}||_1 = 1,\:\:\: ||\mathbf{w}||_2 \leq Q,
where Q is a tuning parameter computed as in the "ridge" case.
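The constraint sets above can be made concrete with a small base R sketch. The helper in_constraint_set is hypothetical and is not part of scpi; it only checks whether a given weight vector satisfies a constraint described by the elements p, dir, Q, and lb. Using lb = -Inf to mean "no lower bound" is an assumption made for this illustration.

```r
# Hypothetical helper (not part of scpi): checks whether a weight vector w
# lies in the constraint set described by a list with elements p, dir, Q, lb.
in_constraint_set <- function(w, constr, tol = 1e-8) {
  # Lower bound on each component (lb = -Inf is assumed to mean "no bound").
  lb_ok <- all(w >= constr$lb - tol)

  # With no norm constraint, only the lower bound matters.
  if (constr$p == "no norm") {
    return(lb_ok)
  }

  # Norm of the weights selected by p.
  norm_w <- switch(constr$p,
    "L1" = sum(abs(w)),
    "L2" = sqrt(sum(w^2)),
    stop("p must be 'L1', 'L2', or 'no norm'")
  )

  # Direction of the norm constraint selected by dir.
  norm_ok <- switch(constr$dir,
    "==" = abs(norm_w - constr$Q) <= tol,
    "<=" = norm_w <= constr$Q + tol,
    stop("dir must be '==' or '<='")
  )

  lb_ok && norm_ok
}

# Simplex: L1 norm equal to 1 and non-negative weights.
simplex <- list(p = "L1", dir = "==", Q = 1, lb = 0)
# Lasso-type: L1 norm at most Q, no lower bound (assumed lb = -Inf).
lasso <- list(p = "L1", dir = "<=", Q = 1, lb = -Inf)

w1 <- c(0.5, 0.3, 0.2)
w2 <- c(0.6, -0.3, 0.1)

in_constraint_set(w1, simplex)  # TRUE
in_constraint_set(w2, simplex)  # FALSE: one weight is negative
in_constraint_set(w2, lasso)    # TRUE
```

The vector w2 has L1 norm equal to 1 but a negative component, so it is feasible under the lasso-type constraint and infeasible under the simplex.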
Weighting Matrix.
If V = "separate", then \mathbf{V} = \mathbf{I} and the minimized objective function is
\sum_{i=1}^{N_1} \sum_{l=1}^{M} \sum_{t=1}^{T_{0}}\left(a_{t, l}^{i}-\mathbf{b}_{t, l}^{{i \prime }} \mathbf{w}^{i}-\mathbf{c}_{t, l}^{{i \prime}} \mathbf{r}_{l}^{i}\right)^{2},
which optimizes the separate fit for each treated unit.
If V = "pooled", then \mathbf{V} = \frac{1}{I}\mathbf{1}\mathbf{1}'\otimes \mathbf{I} and the minimized objective function is
\sum_{l=1}^{M} \sum_{t=1}^{T_{0}}\left(\frac{1}{N_1^2} \sum_{i=1}^{N_1}\left(a_{t, l}^{i}-\mathbf{b}_{t, l}^{i \prime} \mathbf{w}^{i}-\mathbf{c}_{t, l}^{i\prime} \mathbf{r}_{l}^{i}\right)\right)^{2},
which optimizes the pooled fit for the average of the treated units.
If the user wants to provide their own weighting matrix, they must use the option V.mat to input a v\times v positive-definite matrix, where v is the number of rows of \mathbf{B} (or \mathbf{C}) after potential missing values have been removed. To find the appropriate dimension v, we suggest inspecting the output of either scdata or scdataMulti and checking the dimensions of \mathbf{B} (and \mathbf{C}). Note that the weighting matrix could cause problems for the optimizer if not properly scaled. For example, if \mathbf{V} is diagonal, we suggest dividing each of its entries by \|\mathrm{diag}(\mathbf{V})\|_1.
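As a rough illustration of the three weighting options, the base R sketch below builds small matrices by hand. The dimensions N1 and m, the residual vector r, and all object names are made up for the example. The pooled matrix assumes that I in the expression for \mathbf{V} denotes the number of treated units, and that residuals are stacked unit by unit; neither assumption is confirmed by the text above.

```r
# Illustrative dimensions (assumptions, not taken from any dataset).
N1 <- 2   # number of treated units
m  <- 3   # rows per treated unit (T0 * M in the notation above)

# "separate": identity weighting matrix of size v x v, with v = N1 * m.
V_separate <- diag(N1 * m)

# "pooled": (1/N1) * 1 1' Kronecker I_m, so residuals are summed across
# treated units before being squared (assumes unit-by-unit stacking).
V_pooled <- kronecker(matrix(1, N1, N1) / N1, diag(m))

# Made-up stacked residuals: first the m residuals of unit 1, then unit 2.
r <- c(1, -2, 0.5, 3, 1, -1)

# Quadratic forms r' V r under the two weighting schemes.
obj_separate <- drop(t(r) %*% V_separate %*% r)  # sum of squared residuals
obj_pooled   <- drop(t(r) %*% V_pooled %*% r)    # squared cross-unit sums / N1

# Custom diagonal matrix, rescaled by the L1 norm of its diagonal so that
# the diagonal entries sum to one.
V_custom <- diag(c(10, 20, 30, 40, 50, 60))
V_scaled <- V_custom / sum(abs(diag(V_custom)))
```

A matrix such as V_scaled has the dimension v x v required by V.mat, provided v matches the number of rows of \mathbf{B} in the prepared data.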
Value
The function returns an object of class 'scest' containing two lists. The first list is labeled 'data' and contains the data used, as returned by scdata, together with some other values.
A |
a matrix containing pre-treatment features of the treated unit(s). |
B |
a matrix containing pre-treatment features of the control units. |
C |
a matrix containing covariates for adjustment. |
P |
a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit(s). |
P.diff |
for internal use only. |
Y.pre |
a matrix containing the (raw) pre-treatment outcome of the treated unit(s). |
Y.post |
a matrix containing the (raw) post-treatment outcome of the treated unit(s). |
Y.pre.agg |
a matrix containing the aggregate pre-treatment outcome of the treated unit(s). This differs from
Y.pre only in the case 'effect' in |
Y.post.agg |
a matrix containing the aggregate post-treatment outcome of the treated unit(s). This differs from
Y.post only in the case 'effect' in |
Y.donors |
a matrix containing the pre-treatment outcome of the control units. |
specs |
a list containing some specifics of the data:
|
The second list is labeled 'est.results' and contains estimation results.
w |
a matrix containing the estimated weights of the donors. |
r |
a matrix containing the values of the covariates used for adjustment. |
b |
a matrix containing |
Y.pre.fit |
a matrix containing the estimated pre-treatment outcome of the SC unit(s). |
Y.post.fit |
a matrix containing the estimated post-treatment outcome of the SC unit(s). |
A.hat |
a matrix containing the predicted values of the features of the treated unit(s). |
res |
a matrix containing the residuals |
V |
a matrix containing the weighting matrix used in estimation. |
w.constr |
a list containing the specifics of the constraint set used on the weights. |
Author(s)
Matias Cattaneo, Princeton University. cattaneo@princeton.edu.
Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.
Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.
Rocio Titiunik, Princeton University. titiunik@princeton.edu.
References
Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
Cattaneo, M. D., Feng, Y., and Titiunik, R. (2021). Prediction intervals for synthetic control methods. Journal of the American Statistical Association, 116(536), 1865-1880.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). scpi: Uncertainty Quantification for Synthetic Control Methods, arXiv:2202.05984.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption, arXiv:2210.05026.
See Also
scdataMulti, scdata, scpi, scplot, scplotMulti
Examples
# Example dataset shipped with the package
data <- scpi_germany
# Prepare the data for a single treated unit (West Germany)
df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)
# Simplex constraint selected by name
result <- scest(df, w.constr = list(name = "simplex", Q = 1))
# The same constraint set specified element by element
result <- scest(df, w.constr = list(lb = 0, dir = "==", p = "L1", Q = 1))
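The returned object can then be inspected through the two lists described in the Value section. The sketch below is only an illustration: it assumes the scpi package is installed, and that Y.pre and Y.pre.fit are conformable matrices, which the documentation above does not state explicitly. The names w, pre_resid, and pre_rmse are introduced here for the example.

```r
library(scpi)

# Rebuild the objects from the example above so the block is self-contained.
data <- scpi_germany
df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)
result <- scest(df, w.constr = list(name = "simplex", Q = 1))

# Estimated donor weights, stored in the 'est.results' list.
w <- result$est.results$w

# Under the simplex constraint the weights should sum to (approximately) one.
sum(w)

# Pre-treatment fit: raw outcome minus fitted outcome of the SC unit
# (assumes the two matrices have the same dimensions).
pre_resid <- result$data$Y.pre - result$est.results$Y.pre.fit
pre_rmse <- sqrt(mean(pre_resid^2))
pre_rmse
```

A small pre-treatment root mean squared error relative to the scale of the outcome suggests that the synthetic unit tracks the treated unit closely before treatment.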