R: Multivariate SCM Using Time Series

mscmt {MSCMT}

R Documentation

Multivariate SCM Using Time Series

Description

mscmt performs the Multivariate Synthetic Control Method Using Time Series.

Usage

mscmt(
  data,
  treatment.identifier = NULL,
  controls.identifier = NULL,
  times.dep = NULL,
  times.pred = NULL,
  agg.fns = NULL,
  placebo = FALSE,
  placebo.with.treated = FALSE,
  univariate = FALSE,
  univariate.with.dependent = FALSE,
  check.global = TRUE,
  inner.optim = "wnnlsOpt",
  inner.opar = list(),
  outer.optim = "DEoptC",
  outer.par = list(),
  outer.opar = list(),
  std.v = c("sum", "mean", "min", "max"),
  alpha = NULL,
  beta = NULL,
  gamma = NULL,
  return.ts = TRUE,
  single.v = FALSE,
  verbose = TRUE,
  debug = FALSE,
  seed = NULL,
  cl = NULL,
  times.pred.training = NULL,
  times.dep.validation = NULL,
  v.special = integer(),
  cv.alpha = 0,
  spec.search.treated = FALSE,
  spec.search.placebos = FALSE
)

Arguments

data

Typically, a list of matrices with rows corresponding to times and columns corresponding to units for all relevant features (dependent as well as predictor variables, identified by the list elements' names). This might be the result of converting from a data.frame by using function listFromLong.
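As a minimal sketch of this layout, the following base R code builds a toy list by hand. The feature names, unit names, and values are hypothetical and serve only to show the expected shape (times in rows, units in columns, one named list element per feature).

```r
# Hypothetical toy data: three years (rows) and three units (columns).
years <- as.character(2001:2003)
units <- c("A", "B", "C")

toy <- list(
  # matrix() fills column by column, so each group of three values is one unit
  gdp = matrix(c(1.0, 1.1, 1.2,
                 2.0, 2.1, 2.2,
                 3.0, 3.1, 3.2),
               nrow = 3, dimnames = list(years, units)),
  invest = matrix(c(10, 11, 12,
                    20, 21, 22,
                    30, 31, 32),
                  nrow = 3, dimnames = list(years, units))
)

str(toy)
```

The row names carry the dates and the column names carry the unit names, which is what the remaining arguments refer to.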

For convenience, data may alternatively be the result of function dataprep of package 'Synth'. In this case, the parameters treatment.identifier, controls.identifier, times.dep, times.pred, and agg.fns are ignored, as these input parameters are generated automatically from data. The parameters univariate, alpha, beta, and gamma are ignored by fixing them to their defaults. Using results of dataprep is experimental, because the automatic generation of input parameters may fail due to lack of information contained in results of dataprep.

treatment.identifier

A character scalar containing the name of the treated unit. Must be contained in the column names of the matrices in data.

controls.identifier

A character vector containing the names of at least two control units. Entries must be contained in the column names of the matrices in data.

times.dep

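A matrix with two rows (containing start times in the first and end times in the second row) and one column for each dependent variable, where the column names must exactly match the names of the corresponding dependent variables. A sequence of dates with the given start and end times of

annual dates, if the format of start/end time is "dddd", e.g. "2016",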
quarterly dates, if the format of start/end time is "ddddQd", e.g. "2016Q1",
monthly dates, if the format of start/end time is "dddd?dd" with "?" different from "W" (see below), e.g. "2016/03" or "2016-10",
weekly dates, if the format of start/end time is "ddddWdd", e.g. "2016W23",
daily dates, if the format of start/end time is "dddd-dd-dd", e.g. "2016-08-18",

will be constructed; these dates are looked for in the row names of the respective matrices in data. In applications with cross-validation, times.dep belongs to the main period.

times.pred

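A matrix with two rows (containing start times in the first and end times in the second row) and one column for each predictor variable, where the column names must exactly match the names of the corresponding predictor variables. A sequence of dates with the given start and end times of

annual dates, if the format of start/end time is "dddd", e.g. "2016",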
quarterly dates, if the format of start/end time is "ddddQd", e.g. "2016Q1",
monthly dates, if the format of start/end time is "dddd?dd" with "?" different from "W" (see below), e.g. "2016/03" or "2016-10",
weekly dates, if the format of start/end time is "ddddWdd", e.g. "2016W23",
daily dates, if the format of start/end time is "dddd-dd-dd", e.g. "2016-08-18",

will be constructed; these dates are looked for in the row names of the respective matrices in data. In applications with cross-validation, times.pred belongs to the main period.
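The following sketch shows one way to write such time specifications in base R. It assumes the same two-row layout that this page describes for times.pred.training and times.dep.validation (start times in the first row, end times in the second row, one named column per variable). The variable names and dates are hypothetical.

```r
# One column per dependent variable: start time on top, end time below.
times.dep <- cbind("gdp" = c("2001", "2003"))

# One column per predictor variable; periods may differ between predictors.
times.pred <- cbind("gdp"    = c("2001", "2002"),
                    "invest" = c("2002", "2003"))

# The same layout with quarterly dates in the "ddddQd" format.
times.dep.quarterly <- cbind("gdp" = c("2016Q1", "2016Q4"))

times.pred
```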

agg.fns

Either NULL (default) or a character vector containing one name of an aggregation function for each predictor variable (i.e., each column of times.pred). The character string "id" may be used as a "no-op" aggregation. Each aggregation function must accept a numeric vector and return either a numeric scalar ("classical" MSCM) or a numeric vector (leading to MSCM*T* if length of vector is at least two).
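To illustrate the difference between a scalar-valued and a vector-valued aggregation, the sketch below applies two aggregation names to a hypothetical predictor series. The helper apply_agg is not part of the package; it only mimics the documented contract, and treating "id" as the identity is an assumption about what "no-op" means here.

```r
# Hypothetical predictor values over three periods.
x <- c(4, 6, 8)

# One aggregation name per predictor column.
agg.fns <- c("mean", "id")

# Hypothetical helper: resolve a name to a function and apply it.
apply_agg <- function(name, x) {
  if (name == "id") return(x)   # assumed meaning of the "no-op" aggregation
  match.fun(name)(x)
}

results <- lapply(agg.fns, apply_agg, x = x)
names(results) <- agg.fns
results
```

Here "mean" returns a single number (the "classical" case), whereas "id" keeps the whole series, which corresponds to the time series case described above.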

placebo

A logical scalar. If TRUE, a placebo study is performed where, apart from the treated unit, each control unit is considered as treated unit in separate optimizations. Defaults to FALSE. Depending on the number of control units and the complexity of the problem, placebo studies may take a long time to finish.

placebo.with.treated

A logical scalar. If TRUE, the treated unit is included as control unit (for other treated units in placebo studies). Defaults to FALSE.

univariate

A logical scalar. If TRUE, a series of univariate SCMT optimizations is done (instead of one MSCMT optimization) even if there is more than one dependent variable. Defaults to FALSE.

univariate.with.dependent

A logical scalar. If TRUE (and if univariate is also TRUE), all dependent variables (contained in the column names of times.dep) apart from the current (real) dependent variable are included as predictors in the series of univariate SCMT optimizations. Defaults to FALSE.

check.global

A logical scalar. If TRUE (default), a check for the feasibility of the unrestricted outer optimum (where actually no restrictions are imposed by the predictor variables) is made before starting the actual optimization procedure.

inner.optim

A character scalar containing the name of the optimization method for the inner optimization. Defaults to "wnnlsOpt", which (currently) is the only supported implementation, because it outperforms all other inner optimizers we are aware of. "ipopOpt", which uses ipop, and "LowRankQPOpt", which uses LowRankQP as inner optimizer, have experimental support for benchmark purposes.

inner.opar

A list containing further parameters for the inner optimizer. Defaults to the empty list. (For "wnnlsOpt", there are no meaningful further parameters.)

outer.optim

A character vector containing the name(s) of the optimization method(s) for the outer optimization. Defaults to "DEoptC", which (currently) is the recommended global optimizer. The optimizers currently supported can be found in the documentation of parameter outer.opar, where the default control parameters for the various optimizers are listed. If outer.optim has length greater than 1, one optimization is invoked for each outer optimizer (and, potentially, each random seed, see below), and the best result is used.

outer.par

A list containing further parameters for the outer optimization procedure. Defaults to the empty list. Entries in this list may override the following hard-coded general defaults:

lb=1e-8, corresponding to the lower bound for the ratio of predictor weights,
opt.separate=TRUE, corresponding to an improved outer optimization where each predictor is treated as the (potentially) most important predictor (i.e. with maximal weight) in separate optimizations (one for each predictor), see [1].

outer.opar

A list (or a list of lists, if outer.optim has length greater than 1) containing further parameters for the outer optimizer(s). Defaults to the empty list. Entries in this list may override the following hard-coded defaults for the individual optimizers, which are quite modest concerning the computing time. dim is a variable holding the problem dimension, typically the number of predictors minus one.

Optimizer	Package	Default parameters
`DEoptC`	`MSCMT`	`nG=500`, `nP=20*dim`, `waitgen=100`, `minimpr=1e-14`, `F=0.5`, `CR=0.9`
`cma_es`	`cmaes`	`maxit=2500`
`crs`	`nloptr`	`maxeval=2.5e4`, `xtol_rel=1e-14`, `population=20*dim`, `algorithm="NLOPT_GN_CRS2_LM"`
`DEopt`	`NMOF`	`nG=100`, `nP=20*dim`
`DEoptim`	`DEoptim`	`nP=20*dim`
`ga`	`GA`	`maxiter=50`, `monitor=FALSE`, `popSize=20*dim`
`genoud`	`rgenoud`	`print.level=0`, `max.generations=70`, `solution.tolerance=1e-12`, `pop.size=20*dim`, `wait.generations=dim`, `boundary.enforcement=2`, `gradient.check=FALSE`, `MemoryMatrix=FALSE`
`GenSA`	`GenSA`	`max.call=1e7`, `max.time=25/dim`, `trace.mat=FALSE`
`isres`	`nloptr`	`maxeval=2e4`, `xtol_rel=1e-14`, `population=20*dim`, `algorithm="NLOPT_GN_ISRES"`
`malschains`	`Rmalschains`	`popsize=20*dim`, `maxEvals=25000`
`nlminbOpt`	`MSCMT/stats`	`nrandom=30`
`optimOpt`	`MSCMT/stats`	`nrandom=25`
`PSopt`	`NMOF`	`nG=100`, `nP=20*dim`
`psoptim`	`pso`	`maxit=700`
`soma`	`soma`	`nMigrations=100`

If outer.opar is a list of lists, its names must correspond to (a subset of) the outer optimizers chosen in outer.optim.
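The sketch below shows how these three arguments might be written together when two outer optimizers are requested. The parameter names are taken from the table above; the chosen values are arbitrary illustrations, not recommendations.

```r
# Two outer optimizers; one optimization is run for each.
outer.optim <- c("DEoptC", "genoud")

# Override one of the general defaults of the outer optimization.
outer.par <- list(lb = 1e-9)

# A list of lists, named after (a subset of) the optimizers in outer.optim.
outer.opar <- list(
  DEoptC = list(nG = 1000, waitgen = 200),
  genoud = list(max.generations = 100)
)

# Sanity check of the naming rule before passing the lists on.
stopifnot(all(names(outer.opar) %in% outer.optim))
```

Entries that are not overridden keep the defaults listed in the table.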

std.v

A character scalar containing one of the function names "sum", "mean", "min", or "max" for the standardization of the predictor weights (weights are divided by std.v(weights) before reporting). Defaults to "sum", partial matching allowed.
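The standardization described here can be sketched in a few lines of base R. The helper standardize_v is hypothetical and only reproduces the rule "divide the weights by std.v(weights)", including partial matching of the function name via match.arg.

```r
# Hypothetical helper illustrating the standardization rule.
standardize_v <- function(v, std.v = c("sum", "mean", "min", "max")) {
  std.v <- match.arg(std.v)       # first choice is the default; partial matching allowed
  v / match.fun(std.v)(v)
}

v <- c(2, 1, 1)                   # hypothetical raw predictor weights

standardize_v(v)                  # divided by sum(v): 0.50 0.25 0.25
standardize_v(v, "ma")            # "ma" matches "max": 1.0 0.5 0.5
```

An abbreviation such as "m" would be ambiguous between "mean", "min", and "max" and is rejected by match.arg.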

alpha

A numerical vector with weights for the dependent variables in an MSCMT optimization or NULL (default). If not NULL, the length of alpha must agree with the number of dependent variables. NULL is equivalent to weight 1 for all dependent variables.

beta

Either NULL (default), a numerical vector, or a list. If beta is a numerical vector or a list, its length must agree with the number of dependent variables.

If beta is a numerical vector, the ith dependent variable is discounted with discount factor beta[i] (the observations of the dependent variables must thus be in chronological order!).
If beta is a list, the components of beta must be numerical vectors with lengths corresponding to the numbers of observations for the individual dependent variables. These observations are then multiplied with the corresponding component of beta.
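The list form of beta can be pictured as an elementwise multiplication, as in the following sketch for two hypothetical dependent variables with three and two observations. How the weighted observations enter the loss function is described in [1] and is not reproduced here; the numbers are arbitrary.

```r
# One weight per dependent variable.
alpha <- c(1, 2)

# List form of beta: one weight vector per dependent variable,
# each as long as that variable's number of observations.
beta <- list(c(0.25, 0.5, 1), c(0.5, 1))

# Hypothetical observations in chronological order.
obs <- list(c(10, 20, 30), c(5, 6))

# The observations are multiplied with the corresponding component of beta.
weighted <- Map(`*`, obs, beta)
weighted
```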

gamma

Either NULL (default), a numerical vector, or a list. If gamma is a numerical vector or a list, its length must agree with the number of predictor variables.

If gamma is a numerical vector, the output of agg.fns[i] applied to the ith predictor variable is discounted with discount factor gamma[i] (the output of agg.fns[i] must therefore be in chronological order!).
If gamma is a list, the components of gamma must be numerical vectors with lengths corresponding to the lengths of the output of agg.fns for the individual predictor variables. The output of agg.fns is then multiplied with the corresponding component of gamma.

return.ts

A logical scalar. If TRUE (default), most results are converted to time series.

single.v

A logical scalar. If FALSE (default), a selection of feasible (optimal!) predictor weight vectors is generated. If TRUE, the one optimal weight vector which has maximal order statistics is generated to facilitate cross-validation studies.

verbose

A logical scalar. If TRUE (default), output is verbose.

debug

A logical scalar. If TRUE, output is very verbose. Defaults to FALSE.

seed

A numerical vector or NULL. If not NULL, the random number generator is initialized with the elements of seed via set.seed(seed) (see Random) before calling the optimizer, performing repeated optimizations (and staying with the best) if seed has length greater than 1. Defaults to NULL. If not NULL, the seeds int.seed (default: 53058) and unif.seed (default: 812821) for genoud are also initialized to the corresponding element of seed, but this can be overridden with the list elements int.seed and unif.seed of (the corresponding element of) outer.opar.
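The idea of "one run per seed, keep the best" can be sketched with a toy stochastic optimizer in base R. The function toy_optimize and its objective are hypothetical stand-ins for an outer optimizer and do not reflect the package's internals.

```r
# Hypothetical stochastic optimizer: random search on a toy objective.
toy_optimize <- function(seed) {
  set.seed(seed)                       # initialize the RNG before optimizing
  x <- runif(100, min = -2, max = 2)   # random candidate solutions
  loss <- (x - 1)^2                    # toy objective with its minimum at x = 1
  list(seed = seed, par = x[which.min(loss)], loss = min(loss))
}

seeds <- c(1, 2, 3)

# One optimization per seed, then stay with the best result.
runs <- lapply(seeds, toy_optimize)
losses <- vapply(runs, function(r) r$loss, numeric(1))
best <- runs[[which.min(losses)]]

best$seed
best$loss
```

Because every run starts from a fixed seed, repeating the whole procedure yields the same best result.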

cl

NULL (default) or an object of class cluster obtained by makeCluster of package parallel. Repeated estimations (see outer.optim and seed) and placebo studies will make use of the cluster cl (if not NULL).

times.pred.training

A matrix with two rows (containing start times in the first and end times in the second row) and one column for each predictor variable, where the column names must exactly match the names of the corresponding predictor variables (or NULL by default). If not NULL, times.pred.training defines training periods for cross-validation applications. For the format of the start and end times, see the documentation of parameter times.pred.

times.dep.validation

A matrix with two rows (containing start times in the first and end times in the second row) and one column for each dependent variable, where the column names must exactly match the names of the corresponding dependent variables (or NULL by default). If not NULL, times.dep.validation defines validation period(s) for cross-validation applications. For the format of the start and end times, see the documentation of parameter times.dep.
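A cross-validation setup therefore needs four time specifications of the same shape. The sketch below only shows how they might be laid out; the variable names and periods are hypothetical, and which periods are appropriate for a given application is discussed in [2].

```r
# Main period (see times.dep and times.pred).
times.dep  <- cbind("gdp"    = c("2006", "2010"))
times.pred <- cbind("invest" = c("2006", "2010"))

# Training period for the predictors.
times.pred.training <- cbind("invest" = c("2001", "2005"))

# Validation period for the dependent variable.
times.dep.validation <- cbind("gdp" = c("2006", "2010"))

# Column names must match between main and cross-validation specifications.
stopifnot(identical(colnames(times.pred.training), colnames(times.pred)),
          identical(colnames(times.dep.validation), colnames(times.dep)))
```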

v.special

An integer vector containing indices of important predictors with special treatment (see below). Defaults to the empty set.

cv.alpha

A numeric scalar containing the minimal proportion (of the maximal feasible weight) for the weights of the predictors selected by v.special. Defaults to 0.

spec.search.treated

A logical scalar. If TRUE, a specification search (for the optimal set of included predictors) is done for the treated unit. Defaults to FALSE.

spec.search.placebos

A logical scalar. If TRUE, a specification search (for the optimal set of included predictors) is done for the control units. Defaults to FALSE.

Details

mscmt combines, if necessary, the preparation of the raw data (which is expected to be in "list" format, possibly after conversion from a data.frame with function listFromLong) and the call to the appropriate MSCMT optimization procedures (depending on the input parameters). For details on the input parameters alpha, beta, and gamma, see [1]. For details on cross-validation, see [2].

Value

An object of class "mscmt", which is essentially a list containing the results of the estimation and, if applicable, the placebo study. The most important list elements are

the weight vector w for the control units,
a matrix v with weight vectors for the predictors in its columns,
scalars loss.v and rmspe with the dependent loss and its square root,
a vector loss.w with the predictor losses corresponding to the various weight vectors in the columns of v,
a matrix predictor.table containing aggregated statistics of predictor values (similar to list element tab.pred of function synth.tab of package 'Synth'),
a list combined containing, for each dependent and predictor variable, a multivariate time series with elements treated for the actual values of the treated unit, synth for the synthesized values, and gaps for the differences.

Placebo studies produce a list containing individual results for each unit (as treated unit), starting with the original treated unit, as well as a list element named placebo with aggregated results for each dependent and predictor variable.

If times.pred.training and times.dep.validation are not NULL, a cross-validation is done and a list with elements cv (the results of the cross-validation period) and main (the results of the main period) is returned.

References

[1] Becker M, Klößner S (2018). “Fast and Reliable Computation of Generalized Synthetic Controls.” Econometrics and Statistics, 5, 1–19. https://doi.org/10.1016/j.ecosta.2017.08.002.

[2] Becker M, Klößner S, Pfeifer G (2018). “Cross-Validating Synthetic Controls.” Economics Bulletin, 38, 603–609. Working Paper, http://www.accessecon.com/Pubs/EB/2018/Volume38/EB-18-V38-I1-P58.pdf.

Examples

## Not run: 
## for examples, see the package vignettes:
browseVignettes(package="MSCMT")

## End(Not run)
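For orientation, here is a hedged end-to-end sketch along the lines of the Basque example used in the package vignettes. It assumes that packages 'MSCMT' and 'Synth' are installed (the estimation is skipped otherwise) and that the basque data set of 'Synth' uses the column and unit names shown. The reduced set of predictors is chosen for brevity only; the vignettes remain the authoritative examples.

```r
# Input specification (assumed names from the 'basque' data of package 'Synth').
treatment.identifier <- "Basque Country (Pais Vasco)"
times.dep  <- cbind("gdpcap"  = c(1960, 1969))
times.pred <- cbind("invest"  = c(1964, 1969),
                    "gdpcap"  = c(1960, 1969),
                    "popdens" = c(1969, 1969))
agg.fns <- rep("mean", ncol(times.pred))   # one aggregation per predictor

# Run the estimation only if both packages are available.
if (requireNamespace("MSCMT", quietly = TRUE) &&
    requireNamespace("Synth", quietly = TRUE)) {
  library(MSCMT)
  data("basque", package = "Synth")
  # Convert the long data.frame to the list-of-matrices format.
  Basque <- listFromLong(basque, unit.variable = "regionno",
                         time.variable = "year",
                         unit.names.variable = "regionname")
  # All remaining regions except the national aggregate serve as controls.
  controls.identifier <- setdiff(colnames(Basque[[1]]),
                                 c(treatment.identifier, "Spain (Espana)"))
  res <- mscmt(Basque, treatment.identifier, controls.identifier,
               times.dep, times.pred, agg.fns, seed = 1, verbose = FALSE)
  print(res$w)       # weights of the control units
  print(res$rmspe)   # square root of the dependent loss
}
```

Setting seed makes the stochastic outer optimization reproducible, and verbose = FALSE suppresses the progress output.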

[Package MSCMT version 1.4.0 Index]