scdata {scpi} R Documentation

Data Preparation for scest or scpi for Point Estimation and Inference Procedures Using Synthetic Control Methods.

Description

The command prepares the data to be used by scest or scpi to implement estimation and inference procedures for Synthetic Control (SC) methods. It allows the user to specify the outcome variable, the features of the treated unit to be matched, and the covariates used for adjustment, feature by feature. The names of the output matrices follow the terminology proposed in Cattaneo, Feng, and Titiunik (2021).

Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).

Companion commands are: scdataMulti for data preparation in the multiple treated units case with staggered adoption, scest for point estimation, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.

Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:

https://nppackages.github.io/scpi/

For an introduction to synthetic control methods, see Abadie (2021) and references therein.

Usage

scdata(
  df,
  id.var,
  time.var,
  outcome.var,
  period.pre,
  period.post,
  unit.tr,
  unit.co,
  features = NULL,
  cov.adj = NULL,
  cointegrated.data = FALSE,
  anticipation = 0,
  constant = FALSE,
  verbose = TRUE
)

Arguments

df

a dataframe object.

id.var

a character with the name of the variable containing units' IDs. The ID variable itself can be numeric or character.

time.var

a character with the name of the time variable. The time variable has to be numeric, integer, or Date. If the time variable is a Date, it should be the output of the as.Date() function. An integer or numeric time variable is suggested when working with yearly data, whereas for all other formats a Date time variable is preferred.
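As a minimal sketch of the Date case, the snippet below converts a character column of quarterly dates to class Date with as.Date() before it would be supplied as time.var. The toy data frame and its column names (unit, quarter, y) are hypothetical and are not part of the package.

```r
# Toy panel with quarterly observations stored as character strings
# (hypothetical data, for illustration only).
toy <- data.frame(
  unit    = rep(c("A", "B"), each = 4),
  quarter = rep(c("2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01"),
                times = 2),
  y       = c(1.0, 1.2, 1.1, 1.4, 0.9, 1.0, 1.3, 1.2),
  stringsAsFactors = FALSE
)

# Convert the time column to class Date, the type preferred for non-yearly data.
toy$quarter <- as.Date(toy$quarter, format = "%Y-%m-%d")

# Inspect the resulting class of the time variable.
class(toy$quarter)
```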

outcome.var

a character with the name of the outcome variable. The outcome variable has to be numeric.

period.pre

a numeric vector that identifies the pre-treatment period in time.var.

period.post

a numeric vector that identifies the post-treatment period in time.var.

unit.tr

a character or numeric scalar that identifies the treated unit in id.var.

unit.co

a character or numeric vector that identifies the donor pool in id.var.
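The four arguments above (period.pre, period.post, unit.tr, unit.co) can be assembled and sanity-checked before calling scdata. The sketch below uses a hypothetical yearly panel; the data frame, its column names, and the checks themselves are illustrative assumptions, not behavior of the package.

```r
# Hypothetical yearly panel: three units observed from 2000 to 2009.
panel <- expand.grid(year = 2000:2009, unit = c("A", "B", "C"),
                     stringsAsFactors = FALSE)
panel$y <- seq_len(nrow(panel))

unit.tr <- "A"                                   # treated unit
unit.co <- setdiff(unique(panel$unit), unit.tr)  # donor pool: all other units

period.pre  <- 2000:2006                         # pre-treatment years
period.post <- 2007:2009                         # post-treatment years

# Illustrative sanity checks on the inputs.
stopifnot(
  unit.tr %in% panel$unit,                          # treated unit exists
  all(unit.co %in% panel$unit),                     # every donor exists
  !(unit.tr %in% unit.co),                          # treated unit is not a donor
  length(intersect(period.pre, period.post)) == 0,  # periods do not overlap
  all(c(period.pre, period.post) %in% panel$year)   # periods are observed
)
```

Building unit.co with setdiff keeps the donor pool in sync with the data when units are added or removed.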

features

a character vector containing the names of the feature variables used for estimation. If this option is not specified, the default is features = outcome.var.

cov.adj

a list specifying the names of the covariates to be used for adjustment for each feature. If outcome.var is not in the variables specified in features, we force cov.adj<-NULL. See the Details section for more.
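Because cov.adj is specified feature by feature, it may help to see what such a list could look like. The sketch below shows two shapes under stated assumptions: one unnamed element applied to every feature, and one element per feature named after that feature. The feature names (gdp, trade) are hypothetical, and the use of the strings "constant" and "trend" as adjustment covariates is an assumption to be checked against the package's Details section.

```r
# Hypothetical feature names; "gdp" is assumed to be the outcome variable.
features <- c("gdp", "trade")

# Shape 1 (assumption): a single unnamed element, so the same covariates
# would be used for adjustment in every feature.
cov.adj.common <- list(c("constant", "trend"))

# Shape 2 (assumption): one element per feature, named after that feature.
cov.adj.by.feature <- list(gdp   = c("constant", "trend"),
                           trade = c("constant"))

# Check that every named element corresponds to a declared feature.
all(names(cov.adj.by.feature) %in% features)
```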

cointegrated.data

a logical that indicates if there is a belief that the data is cointegrated or not. The default value is FALSE. See the Details section for more.

anticipation

a scalar that indicates the number of periods of potential anticipation effects. Default is 0.

constant

a logical which controls the inclusion of a constant term across features. The default value is FALSE.

verbose

a logical; if TRUE, prints additional information to the console.

Details

Value

The command returns an object of class 'scdata' containing the following:

A

a matrix containing pre-treatment features of the treated unit.

B

a matrix containing pre-treatment features of the control units.

C

a matrix containing covariates for adjustment.

P

a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit.

Y.pre

a matrix containing the pre-treatment outcome of the treated unit.

Y.post

a matrix containing the post-treatment outcome of the treated unit.

Y.donors

a matrix containing the pre-treatment outcome of the control units.

specs

a list containing some specifics of the data:

  • J, the number of control units

  • K, a numeric vector with the number of covariates used for adjustment for each feature

  • KM, the total number of covariates used for adjustment

  • M, the number of features

  • period.pre, a numeric vector with the pre-treatment period

  • period.post, a numeric vector with the post-treatment period

  • T0.features, a numeric vector with the number of periods used in estimation for each feature

  • T1.outcome, the number of post-treatment periods

  • outcome.var, a character with the name of the outcome variable

  • features, a character vector with the names of the features

  • constant, for internal use only

  • out.in.features, for internal use only

  • effect, for internal use only

  • sparse.matrices, for internal use only

  • treated.units, a list containing the IDs of all treated units
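The components listed above can be inspected directly on the returned object. The following is a minimal sketch that reuses the call from the Examples section; it runs only if the scpi package is installed, and the object names germany and prep are chosen here for illustration.

```r
# Run only when the scpi package is available; otherwise do nothing.
if (requireNamespace("scpi", quietly = TRUE)) {
  germany <- scpi::scpi_germany

  prep <- scpi::scdata(df = germany, id.var = "country", time.var = "year",
                       outcome.var = "gdp", period.pre = 1960:1990,
                       period.post = 1991:2003, unit.tr = "West Germany",
                       unit.co = setdiff(unique(germany$country),
                                         "West Germany"),
                       constant = TRUE, cointegrated.data = TRUE)

  print(class(prep))          # class of the returned object
  print(dim(prep$A))          # pre-treatment features of the treated unit
  print(dim(prep$B))          # pre-treatment features of the control units
  print(prep$specs$J)         # number of control units
  print(prep$specs$features)  # names of the features used
}
```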

Author(s)

Matias Cattaneo, Princeton University. cattaneo@princeton.edu.

Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn.

Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu.

Rocio Titiunik, Princeton University. titiunik@princeton.edu.

References

See Also

scdataMulti, scest, scpi, scplot, scplotMulti

Examples


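library(scpi)

# Load the example dataset shipped with the package
data <- scpi_germany

# West Germany is the treated unit; all other countries form the donor pool
df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)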


[Package scpi version 2.2.5 Index]