scpi {scpi}    R Documentation

Prediction Intervals for Synthetic Control Methods

Description

The command implements estimation and inference procedures for Synthetic Control (SC) methods using least squares, lasso, ridge, or simplex-type constraints. Uncertainty is quantified using prediction intervals according to Cattaneo, Feng, and Titiunik (2021). scpi returns the estimated post-treatment series for the synthetic unit through the command scest and quantifies in-sample and out-of-sample uncertainty to provide prediction intervals for each point estimate.

Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).

Companion commands are: scdata and scdataMulti for data preparation in the single and multiple treated unit(s) cases, respectively; scest for point estimation; and scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.

Related Stata, R, and Python packages useful for inference in SC designs are described at the following website:

https://nppackages.github.io/scpi/

For an introduction to synthetic control methods, see Abadie (2021) and references therein.

Usage

scpi(
  data,
  w.constr = NULL,
  V = "separate",
  V.mat = NULL,
  solver = "ECOS",
  P = NULL,
  u.missp = TRUE,
  u.sigma = "HC1",
  u.order = 1,
  u.lags = 0,
  u.design = NULL,
  u.alpha = 0.05,
  e.method = "all",
  e.order = 1,
  e.lags = 0,
  e.design = NULL,
  e.alpha = 0.05,
  sims = 200,
  rho = NULL,
  rho.max = 0.2,
  lgapp = "generalized",
  cores = 1,
  plot = FALSE,
  plot.name = NULL,
  w.bounds = NULL,
  e.bounds = NULL,
  save.data = NULL,
  verbose = TRUE
)

Arguments

data

a class 'scdata' object, obtained by calling scdata, or class 'scdataMulti' obtained via scdataMulti.

w.constr

a list specifying the constraint set the estimated weights of the donors must belong to. w.constr can contain up to five elements:

  • 'p', a string indicating the norm to be used (p should be one of "no norm", "L1", or "L2")

  • 'dir', a string indicating whether the constraint on the norm is an equality ("==") or inequality ("<=")

  • 'Q', a scalar defining the value of the constraint on the norm

  • 'lb', a scalar defining the lower bound on the weights. It can be either 0 or -Inf.

  • 'name', a character string selecting one of the default proposals. See the Details section for more.
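As an illustration of how the elements listed above fit together, the sketch below defines a hypothetical helper, check_w_constr() (not part of scpi), that only verifies that a w.constr list uses the documented elements and values. The simplex encoding matches the second call in the Examples section; the lasso-like, ridge-like, and unconstrained lists are assumptions for illustration, and their Q = 1 values are placeholders rather than the package's defaults.

```r
# Hypothetical helper (not exported by scpi): checks that a w.constr list
# only uses the elements and values described in this documentation.
check_w_constr <- function(w.constr) {
  allowed <- c("p", "dir", "Q", "lb", "name")
  stopifnot(is.list(w.constr), all(names(w.constr) %in% allowed))
  if (!is.null(w.constr$p))
    stopifnot(w.constr$p %in% c("no norm", "L1", "L2"))
  if (!is.null(w.constr$dir))
    stopifnot(w.constr$dir %in% c("==", "<="))
  if (!is.null(w.constr$Q))
    stopifnot(is.numeric(w.constr$Q), length(w.constr$Q) == 1)
  if (!is.null(w.constr$lb))
    stopifnot(w.constr$lb %in% c(0, -Inf))
  invisible(TRUE)
}

# Simplex: non-negative weights whose L1 norm equals Q = 1.
simplex <- list(p = "L1", dir = "==", Q = 1, lb = 0)

# Assumed encodings of other constraint types (Q values are placeholders).
lasso_like    <- list(p = "L1", dir = "<=", Q = 1, lb = -Inf)
ridge_like    <- list(p = "L2", dir = "<=", Q = 1, lb = -Inf)
unconstrained <- list(p = "no norm", lb = -Inf)

for (wc in list(simplex, lasso_like, ridge_like, unconstrained)) check_w_constr(wc)
```

A list that passes this check could then be supplied as scpi(df, w.constr = simplex).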

V

specifies the type of weighting matrix to be used when minimizing the sum of squared residuals

(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})

The default is the identity matrix, so equal weight is given to all observations. In the case of multiple treated units (i.e., when scdataMulti was used to prepare the data), V can be set to a string equal to either "separate" or "pooled". If scdata was used to prepare the data, V is automatically set to "separate", as the two options are equivalent. See the Details section for more.
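To make the objective above concrete, the following base-R sketch evaluates (\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r}) for given weights. The function name sc_objective() and all numbers are hypothetical; the sketch does not reproduce how scest minimizes the objective, only what is being minimized.

```r
# Weighted sum of squared residuals (A - Bw - Cr)' V (A - Bw - Cr).
# A: vector of treated-unit features, B: donor features (one column per donor),
# C: covariates for adjustment, w: donor weights, r: covariate coefficients,
# V: square weighting matrix (identity by default, as in scpi).
sc_objective <- function(A, B, C, w, r, V = diag(length(A))) {
  resid <- A - B %*% w - C %*% r
  drop(t(resid) %*% V %*% resid)
}

A <- c(1, 2, 3)
B <- cbind(c(1, 2, 2), c(1, 1, 4))   # two donors
C <- matrix(1, nrow = 3, ncol = 1)   # a constant term
w <- c(0.5, 0.5)
r <- 0.2

sc_objective(A, B, C, w, r)                        # identity V
sc_objective(A, B, C, w, r, V = diag(c(2, 1, 1)))  # first period weighted twice
```

Here the residuals are (-0.2, 0.3, -0.2), so the identity weighting gives 0.04 + 0.09 + 0.04 = 0.17, while doubling the weight of the first period gives 0.21.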

V.mat

A conformable weighting matrix \mathbf{V} to be used in the minimization of the sum of squared residuals

(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r}).

See the Details section for more information on how to prepare this matrix.
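A minimal sketch of preparing a V.mat follows, assuming that "conformable" means square with one row and column per row of \mathbf{A} (as the quadratic form requires). The helper is_conformable_V() and the down-weighting scheme are hypothetical choices for illustration, not recommendations from the package.

```r
# Hypothetical helper: V.mat must be square with one row/column per row of A
# for the quadratic form above to be defined; symmetry is also checked.
is_conformable_V <- function(V.mat, A) {
  n <- NROW(A)
  is.matrix(V.mat) && all(dim(V.mat) == c(n, n)) && isSymmetric(V.mat)
}

T0 <- 31                                   # e.g., one feature over 1960:1990
A <- matrix(seq_len(T0), ncol = 1)         # placeholder features

# Illustrative choice: halve the weight of the first 10 pre-treatment periods.
V.mat <- diag(c(rep(0.5, 10), rep(1, T0 - 10)))
is_conformable_V(V.mat, A)
```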

solver

a string containing the name of the solver used by CVXR when computing the weights. You can check which solvers are available on your machine by running CVXR::installed_solvers(). More information on what different solvers do can be found at the following link https://cvxr.rbind.io/cvxr_examples/cvxr_using-other-solvers/. "OSQP" is the default solver when 'lasso' is the constraint type, whilst "ECOS" is the default in all other cases.
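The default-solver rule described above can be written as a small function. In this sketch default_solver() is hypothetical and inspects only the 'name' element of w.constr; a lasso-type constraint spelled out through 'p' and 'dir' would not be detected. CVXR::installed_solvers() is queried only when CVXR is installed.

```r
# Hypothetical helper mirroring the documented default: "OSQP" for
# lasso-type constraints, "ECOS" otherwise (only 'name' is inspected).
default_solver <- function(w.constr) {
  if (identical(w.constr$name, "lasso")) "OSQP" else "ECOS"
}

solver <- default_solver(list(name = "lasso"))

# Check availability only if CVXR is present on this machine.
if (requireNamespace("CVXR", quietly = TRUE)) {
  if (!(solver %in% CVXR::installed_solvers()))
    message("Solver ", solver, " is not installed.")
}
```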

P

an I\cdot T_1\times I\cdot (J+KM) matrix containing the design matrix used to obtain the predicted post-intervention outcome of the synthetic control unit(s). I is the number of treated units, T_1 is the number of post-treatment periods, J is the size of the donor pool, and KM is the total number of covariates used for adjustment.
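For a single treated unit (I = 1), the sketch below builds a P of the stated dimensions and forms the predicted post-treatment series as P times the stacked vector (\mathbf{w}, \mathbf{r}). All sizes and values are hypothetical, and the last column is assumed to be a constant (as when constant = TRUE in scdata).

```r
# Hypothetical sizes: one treated unit, T1 post-treatment periods,
# J donors and KM = 1 adjustment covariate (a constant).
I <- 1; T1 <- 13; J <- 4; KM <- 1

# Donor columns hold placeholder post-treatment outcomes; the last column
# is the constant that multiplies r.
P <- cbind(matrix(seq_len(T1 * J), nrow = T1, ncol = J), rep(1, T1))
stopifnot(all(dim(P) == c(I * T1, I * (J + KM))))

w <- rep(1 / J, J)       # equal donor weights (illustrative)
r <- 0.3                 # coefficient on the constant (illustrative)
b <- c(w, r)             # stack w and r
Y.post.fit <- P %*% b    # one predicted value per post-treatment period
```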

u.missp

a logical indicating if misspecification should be taken into account when dealing with \mathbf{u}.

u.sigma

a string specifying the type of variance-covariance estimator to be used when estimating the conditional variance of \mathbf{u}.

u.order

a scalar that sets the order of the polynomial in \mathbf{B} when predicting moments of \mathbf{u}. The default is u.order = 1; however, if there is a risk of over-fitting, the command automatically sets it to u.order = 0. See the Details section for more information.

u.lags

a scalar that sets the number of lags of \mathbf{B} when predicting moments of \mathbf{u}. The default is u.lags = 0; if a positive value is supplied and there is a risk of over-fitting, the command automatically sets it back to u.lags = 0. See the Details section for more information.
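To see why u.order and u.lags can cause over-fitting, it helps to count the columns they generate. The sketch below is a hypothetical construction, not scpi's internal code: it builds an intercept, element-wise powers of \mathbf{B} up to a given order (no cross-products), and NA-padded lags of \mathbf{B}. The column count grows with both arguments, while the number of rows stays fixed at the number of pre-treatment observations.

```r
# Hypothetical design builder: intercept, powers of B up to `order`,
# and `lags` lags of B (first rows padded with NA).
poly_lag_design <- function(B, order = 1, lags = 0) {
  B <- as.matrix(B)
  D <- matrix(1, nrow = nrow(B), ncol = 1)           # intercept
  for (k in seq_len(order)) D <- cbind(D, B^k)       # element-wise powers
  for (l in seq_len(lags)) {
    # shift B down by l rows, padding the first l rows with NA
    lagged <- rbind(matrix(NA_real_, nrow = l, ncol = ncol(B)),
                    B[seq_len(nrow(B) - l), , drop = FALSE])
    D <- cbind(D, lagged)
  }
  D
}

B <- cbind(c(1, 2, 3, 4), c(2, 2, 5, 7))             # 4 periods, 2 donors
D0 <- poly_lag_design(B, order = 0)                  # intercept only: 1 column
D1 <- poly_lag_design(B, order = 1, lags = 1)        # 1 + 2 + 2 = 5 columns
```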

u.design

a matrix with the same number of rows as \mathbf{A} and \mathbf{B} and whose columns specify the design matrix to be used when modeling the estimated pseudo-true residuals \mathbf{u}.
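A minimal sketch of a user-supplied u.design follows: an intercept and a linear time trend, with one row per row of \mathbf{A} and \mathbf{B}. The matrices and the choice of regressors are placeholders; in practice \mathbf{A} would be taken from the object returned by scdata.

```r
# Placeholder pre-treatment features: 5 rows, treated unit and 2 donors.
A <- matrix(c(1.0, 1.4, 2.1, 2.3, 3.0), ncol = 1)
B <- matrix(c(1, 2, 2, 3, 3,
              1, 1, 2, 2, 4), ncol = 2)

# User-provided design: intercept plus linear trend, one row per row of A.
u.design <- cbind(intercept = 1, trend = seq_len(nrow(A)))

# The documented requirement: same number of rows as A and B.
stopifnot(nrow(u.design) == nrow(A), nrow(u.design) == nrow(B))
```

Such a matrix would then be supplied as scpi(df, u.design = u.design); the same construction applies to e.design.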

u.alpha

a scalar specifying the significance level for in-sample uncertainty, i.e., 1 - u.alpha is the confidence level.

e.method

a string selecting the method used to quantify out-of-sample uncertainty among: "gaussian", which uses conditional sub-Gaussian bounds; "ls", which specifies a location-scale model for \mathbf{e}; "qreg", which employs quantile regression to obtain the conditional bounds; and "all", which uses each of the previous methods.
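The first two methods can be illustrated with textbook formulas. The sketch below is not scpi's implementation, and both helper names are hypothetical. The sub-Gaussian bound inverts the tail inequality P(|e - mu| > t) <= 2 exp(-t^2 / (2 sigma2)). The location-scale bound assumes e = mu + sigma * eps and takes the quantiles of eps from standardized residuals. Quantile regression ("qreg") is omitted because it needs an add-on package.

```r
# (1) Sub-Gaussian bound: setting 2 exp(-t^2 / (2 sigma2)) = alpha gives
#     t = sqrt(2 sigma2 log(2 / alpha)).
subgaussian_bounds <- function(mu, sigma2, alpha = 0.05) {
  t <- sqrt(2 * sigma2 * log(2 / alpha))
  cbind(lb = mu - t, ub = mu + t)
}

# (2) Location-scale: e = mu + sigma * eps, with eps quantiles estimated
#     from standardized residuals.
location_scale_bounds <- function(mu, sigma, std_resid, alpha = 0.05) {
  q <- quantile(std_resid, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  cbind(lb = mu + sigma * q[1], ub = mu + sigma * q[2])
}

sg <- subgaussian_bounds(mu = 0, sigma2 = 1)
ls <- location_scale_bounds(mu = 0, sigma = 1, std_resid = seq(-2, 2, by = 0.1))
```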

e.order

a scalar that sets the order of the polynomial in \mathbf{B} when predicting moments of \mathbf{e}. The default is e.order = 1; however, if there is a risk of over-fitting, the command automatically sets it to e.order = 0. See the Details section for more information.

e.lags

a scalar that sets the number of lags of \mathbf{B} when predicting moments of \mathbf{e}. The default is e.lags = 0; if a positive value is supplied and there is a risk of over-fitting, the command automatically sets it back to e.lags = 0. See the Details section for more information.

e.design

a matrix with the same number of rows as \mathbf{A} and \mathbf{B} and whose columns specify the design matrix to be used when modeling the estimated out-of-sample residuals \mathbf{e}.

e.alpha

a scalar specifying the significance level for out-of-sample uncertainty, i.e., 1 - e.alpha is the confidence level.
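u.alpha and e.alpha govern two separate sources of uncertainty. Suppose the in-sample bounds hold with probability at least 1 - u.alpha and the out-of-sample bounds hold with probability at least 1 - e.alpha. A union (Bonferroni-type) bound then implies that both hold simultaneously with probability at least 1 - u.alpha - e.alpha. The hypothetical helper below only performs this arithmetic.

```r
# Lower bound on joint coverage implied by a union bound over the
# in-sample and out-of-sample components.
combined_level <- function(u.alpha = 0.05, e.alpha = 0.05) {
  1 - u.alpha - e.alpha
}

combined_level()              # default levels
combined_level(0.025, 0.025)  # halving both levels
```

With the defaults, this gives a guaranteed level of 0.90. Setting u.alpha = e.alpha = 0.025 brings the combined level up to 0.95.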

sims

a scalar providing the number of simulations to be used in quantifying in-sample uncertainty.

rho

a string or scalar specifying the regularizing parameter that imposes sparsity on the estimated vector of weights. If rho = 'type-1' (the rule applied by default, when rho = NULL), then the tuning parameter is computed based on optimization inequalities. Users can provide a scalar with their own value for rho. Other options are described in the Details section.

rho.max

a scalar indicating the maximum value attainable by the tuning parameter rho.

lgapp

selects the way local geometry is approximated in simulation. The options are "generalized" and "linear". The first accommodates possibly non-linear constraints, whilst the second is valid with linear constraints only.

cores

number of cores to be used by the command. The default is one.

plot

a logical specifying whether scplot should be called and a plot saved in the current working directory. For more options see scplot.

plot.name

a string containing the name of the plot (the format is by default .png). For more options see scplot.

w.bounds

an N_1\cdot T_1\times 2 matrix with user-provided bounds for in-sample uncertainty, i.e., the uncertainty stemming from the estimation of \beta=(\mathbf{w}^{\prime}, \mathbf{r}^{\prime})^{\prime}. If w.bounds is provided, then the quantification of in-sample uncertainty is skipped. It is possible to provide only the lower bound or the upper bound by filling the other column with NAs.

e.bounds

an N_1\cdot T_1\times 2 matrix with user-provided bounds for out-of-sample uncertainty, i.e., on the out-of-sample error \mathbf{e}. If e.bounds is provided, then the quantification of out-of-sample uncertainty is skipped. It is possible to provide only the lower bound or the upper bound by filling the other column with NAs.
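The sketch below builds a bounds matrix in the shape described above. It has one row per post-treatment period of each treated unit (N_1\cdot T_1 rows) and two columns (lower, upper). The lower column is left as NA so that only an upper bound is supplied. The value 0.5 is purely hypothetical, and the same layout applies to w.bounds.

```r
N1 <- 1                                       # one treated unit
T1 <- 13                                      # e.g., post-treatment years 1991:2003

e.bounds <- cbind(rep(NA_real_, N1 * T1),     # lower bound left unspecified
                  rep(0.5, N1 * T1))          # hypothetical upper bound
dim(e.bounds)
```

Such a matrix would be passed as scpi(df, e.bounds = e.bounds). According to the description above, this skips the package's own quantification of out-of-sample uncertainty.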

save.data

a character specifying the name and the path of the saved dataframe containing the processed data used to produce the plot.

verbose

if TRUE prints additional information in the console.

Details

Unless otherwise specified, information is provided for the simple case in which N_1=1.

Value

The function returns an object of class 'scpi' containing three lists. The first list is labeled 'data' and contains the data used, as returned by scdata, along with some other values.

A

a matrix containing pre-treatment features of the treated unit(s).

B

a matrix containing pre-treatment features of the control units.

C

a matrix containing covariates for adjustment.

P

a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit(s).

Y.pre

a matrix containing the pre-treatment outcome of the treated unit(s).

Y.post

a matrix containing the post-treatment outcome of the treated unit(s).

Y.pre.agg

a matrix containing the aggregate pre-treatment outcome of the treated unit(s). This differs from Y.pre only when 'effect' in scdataMulti is set to either 'unit' or 'time'.

Y.post.agg

a matrix containing the aggregate post-treatment outcome of the treated unit(s). This differs from Y.post only when 'effect' in scdataMulti is set to either 'unit' or 'time'.

Y.donors

a matrix containing the pre-treatment outcome of the control units.

specs

a list containing some specifics of the data:

  • J, the number of control units

  • K, a numeric vector with the number of covariates used for adjustment for each feature

  • M, the number of features

  • KM, the total number of covariates used for adjustment

  • KMI, the total number of covariates used for adjustment across all treated units

  • I, the number of treated units

  • period.pre, a numeric vector with the pre-treatment periods

  • period.post, a numeric vector with the post-treatment periods

  • T0.features, a numeric vector with the number of periods used in estimation for each feature

  • T1.outcome, the number of post-treatment periods

  • constant, for internal use only

  • effect, for internal use only

  • anticipation, number of periods of potential anticipation effects

  • out.in.features, for internal use only

  • treated.units, list containing the IDs of all treated units

  • donors.list, list containing the IDs of the donors of each treated unit

The second list is labeled 'est.results' and contains all the results from scest.

w

a matrix containing the estimated weights of the donors.

r

a matrix containing the estimated coefficients of the covariates used for adjustment.

b

a matrix containing \mathbf{w} and \mathbf{r}.

Y.pre.fit

a matrix containing the estimated pre-treatment outcome of the SC unit(s).

Y.post.fit

a matrix containing the estimated post-treatment outcome of the SC unit(s).

A.hat

a matrix containing the predicted values of the features of the treated unit(s).

res

a matrix containing the residuals \mathbf{A}-\widehat{\mathbf{A}}.
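The relationship between b, A.hat, and res can be written out in base R. In the sketch below, the fitted features are \mathbf{B}\mathbf{w}+\mathbf{C}\mathbf{r}, obtained by multiplying the column-bound (\mathbf{B}, \mathbf{C}) by b, and the residuals are \mathbf{A}-\widehat{\mathbf{A}}. All inputs are hypothetical placeholders, not values produced by scest.

```r
B <- cbind(c(1, 2, 2), c(1, 1, 4))   # donors' features, one column per donor
C <- matrix(1, nrow = 3, ncol = 1)   # constant used for adjustment
A <- c(1, 2, 3)                      # treated unit's features
w <- c(0.5, 0.5)                     # donor weights
r <- 0.2                             # covariate coefficient
b <- c(w, r)                         # b stacks w and r

A.hat <- cbind(B, C) %*% b           # equals B %*% w + C %*% r
res <- A - A.hat                     # residuals A - A.hat
```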

V

a matrix containing the weighting matrix used in estimation.

w.constr

a list containing the specifics of the constraint set used on the weights.

The third list is labeled 'inference.results' and contains all the inference-related results.

CI.in.sample

a matrix containing the prediction intervals taking only in-sample uncertainty into account.

CI.all.gaussian

a matrix containing the prediction intervals estimating out-of-sample uncertainty with sub-Gaussian bounds.

CI.all.ls

a matrix containing the prediction intervals estimating out-of-sample uncertainty with a location-scale model.

CI.all.qreg

a matrix containing the prediction intervals estimating out-of-sample uncertainty with quantile regressions.

bounds

a list containing the estimated bounds (in-sample and out-of-sample uncertainty).

Sigma

a matrix containing the estimated (conditional) variance-covariance matrix \boldsymbol{\Sigma}.

u.mean

a matrix containing the estimated (conditional) mean of the pseudo-residuals \mathbf{u}.

u.var

a matrix containing the estimated (conditional) variance-covariance of the pseudo-residuals \mathbf{u}.

e.mean

a matrix containing the estimated (conditional) mean of the out-of-sample error e.

e.var

a matrix containing the estimated (conditional) variance of the out-of-sample error e.

u.missp

a logical indicating whether the model has been treated as misspecified or not.

u.lags

an integer containing the number of lags in B used in predicting moments of the pseudo-residuals \mathbf{u}.

u.order

an integer containing the order of the polynomial in B used in predicting moments of the pseudo-residuals \mathbf{u}.

u.sigma

a string indicating the estimator used for Sigma.

u.user

a logical indicating whether the design matrix to predict moments of \mathbf{u} was user-provided.

u.T

a scalar indicating the number of observations used to predict moments of \mathbf{u}.

u.params

a scalar indicating the number of parameters used to predict moments of \mathbf{u}.

u.D

the design matrix used to predict moments of \mathbf{u}.

u.alpha

a scalar determining the significance level used for in-sample uncertainty, i.e., 1-u.alpha is the confidence level.

e.method

a string indicating the specification used to predict moments of the out-of-sample error e.

e.lags

an integer containing the number of lags in B used in predicting moments of the out-of-sample error e.

e.order

an integer containing the order of the polynomial in B used in predicting moments of the out-of-sample error e.

e.user

a logical indicating whether the design matrix to predict moments of e was user-provided.

e.T

a scalar indicating the number of observations used to predict moments of \mathbf{e}.

e.params

a scalar indicating the number of parameters used to predict moments of \mathbf{e}.

e.alpha

a scalar determining the significance level used for out-of-sample uncertainty, i.e., 1-e.alpha is the confidence level.

e.D

the design matrix used to predict moments of \mathbf{e}.

rho

a scalar containing the estimated regularizing parameter that imposes sparsity on the estimated vector of weights.

Q.star

a list containing the regularized constraint on the norm.

epskappa

a vector containing the estimates for \epsilon_{\kappa}.

sims

an integer indicating the number of simulations used in quantifying in-sample uncertainty.

failed.sims

a matrix containing the number of failed simulations per post-treatment period to estimate lower and upper bounds.

Author(s)

Matias Cattaneo, Princeton University. cattaneo@princeton.edu.

Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.

Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.

Rocio Titiunik, Princeton University. titiunik@princeton.edu.

References

See Also

scdata, scdataMulti, scest, scplot, scplotMulti

Examples


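library(scpi)

# Example data shipped with the package
data <- scpi_germany

# Prepare the data: West Germany is treated from 1991 onwards
df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)

# Simplex-constrained weights, selected through the default proposal name
result <- scpi(df, w.constr = list(name = "simplex", Q = 1), cores = 1, sims = 10)

# The same constraint spelled out element by element
result <- scpi(df, w.constr = list(lb = 0, dir = "==", p = "L1", Q = 1),
               cores = 1, sims = 10)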

[Package scpi version 2.2.5 Index]