R: Causal effect estimation in a multivariate setting

CausalMBSTS {CausalMBSTS}

R Documentation

Causal effect estimation in a multivariate setting

Description

The function estimates the general effect of an intervention in a multivariate time series setting. It uses MCMC to sample from the joint posterior distribution of the parameters of an MBSTS model before the intervention/treatment occurred. Then, it uses the post-intervention covariate values to predict the counterfactual potential outcomes. The prediction is done by sampling from the posterior predictive distribution (PPD). Then the causal effect is computed by taking the difference between the observed outcome of each time series and the mean of the PPD (credible intervals are computed accordingly).

Usage

CausalMBSTS(
  y,
  components,
  seas.period = NULL,
  cycle.period = NULL,
  X = NULL,
  dates,
  int.date,
  alpha = 0.05,
  excl.dates = NULL,
  horizon = NULL,
  H = NULL,
  nu0.r = NULL,
  s0.r,
  nu0.eps = NULL,
  s0.eps,
  niter,
  burn = NULL,
  ping = NULL
)

Arguments

`y`	t x d data.frame (or matrix) of observations, where d is the number of time series in the multivariate model.
`components`	Character vector specifying the components of the multivariate structural time series model. Possible values are c("trend", "slope", "seasonal", "cycle").
`seas.period`	Length of the seasonal pattern, if present.
`cycle.period`	Length of the cycle pattern, if present.
`X`	Optional t x N data frame (or matrix) of N predictors.
`dates`	a vector of dates of length t (with elements of class `Date`) that correspond to observations in y.
`int.date`	Date of the intervention (must be of class `Date`).
`alpha`	Level of credible interval to report for the estimated causal effect. Default set to 0.05 (i.e., reporting a two-sided 95% credible interval).
`excl.dates`	Optional vector of length t, specifying the dates (if any) in the post period that should be excluded from the computation of the causal effect. The elements of the vector must be either 0 (the corresponding date is retained) or 1 (the corresponding date is excluded). The first part that corresponds to `dates < int.date` is ignored.
`horizon`	Optional, vector of dates (with elements of class `Date`). If provided, a causal effect is computed for the time horizon(s) between `int.date` and each specified date. Defaults to the date of the last observation.
`H`	P x P variance-covariance matrix of the regression coefficients. Set by default to H = c(X'X)^(-1) which is akin to the Zellner's g-prior. The value of the scaling factor is set to `c = 1`. Alternative priors could be H = cdiag((X'X)^(-1)) or H = cI. See also Smith & Kohn, 1995 that suggest setting `c` in the range [10,1000].
`nu0.r`	Degrees of freedom of the Inverse-Wishart prior for each element of Sigma.r, a vector of errors for state r. Set by default to d + 2 (must be greater than d - 1).
`s0.r`	Scale matrix of the Inverse-Wishart prior for each Sigma.r, a vector of errors for state r. Must be a (d x d) positive definite. Default set to the variance-covariance matrix of y multiplied by a scaling factor of 0.01.
`nu0.eps`	Degrees of freedom of the Inverse-Wishart prior for Sigma.eps, a vector of observation errors for each time series. Set by default to d + 2 (must be greater than d - 1).
`s0.eps`	Scale matrix of the Inverse-Wishart prior for Sigma.eps, a vector of observation errors for each time series. Must be a (d x d) positive definite. Default set to the variance-covariance matrix of y multiplied by a scaling factor of 0.01.
`niter`	Number of MCMC iterations.
`burn`	Desired burn-in, set by default to 0.1 * `niter`.
`ping`	A status message is printed every `ping` iteration. Default set to 0.1 * `niter`. Set to 0 to not track the status.

Details

The assumed model is based on Normally distributed disturbance terms. The argument components provides flexibility for model formulation, allowing to add simultaneously up to four components that encapsulate the characteristics of a time series.

The unknown parameters are the variance-covariance matrices of the error terms and, if covariates are provided, the matrix of regression coefficients. Because of conjugacy, the priors placed on the variance-covariance matrices of the error terms are Inverse-Wishart distributions and the arguments (nu0.eps, s0.eps) and (nu0.r, s0.r) regulate their hyperparameters.

The regression coefficients are assumed to follow a matrix-Normal prior and, to incorporate a sparsity assumption, the prior mean is set to zero and a vector selecting the relevant covariates is introduced with a data augmentation step.

Sampling from the joint posterior distribution of the states and model parameters is performed with a Gibbs sampler. The estimated model is then used to perform predictions of the counterfactual potential outcomes in the period following the intervention date. In a final step, the predictions are compared to the observed outcomes, thereby defining a causal effect at each time point (pointwise effect).

The output component general.effect summarizes the estimates of two causal effects: the average causal effect (temporal average of the pointwise effect) and the cumulative causal effect (cumulative sum of the pointwise effect).

Run vignette("CausalMBSTS") for a detailed example.

For further details see Menchetti & Bojinov (2020).

Value

A list with the following components:

`mcmc`	An object of class `mbsts`.
`predict`	A list with the same components as those produced by the function `predict.mbsts`
`y`	Observations in the analysis period (excluding `excl.dates` if provided).
`dates`	Dates in the analysis period (excluding `excl.dates` if provided).
`general`	General causal effect for all iterations.
`general.effect`	Estimated average causal effect, cumulative causal effect and (1-`alpha`)% credible intervals. Returns a list if `horizon` is specified.
`mean.general`	Pointwise effect. Returns a list if `horizon` is specified.
`lower.general`	Lower bound of a (1-`alpha`)% credible interval of the pointwise effect. Returns a list if `horizon` is specified.
`upper.general`	Upper bound of a (1-`alpha`)% credible interval of the pointwise effect. Returns a list if `horizon` is specified.

Examples

## Note: The following are toy examples, for a proper analysis we recommend to run
##       at least 1000 iterations and check the convergence of the Markov chain

## Example 1 (daily data, d = 3, local level + seasonal + covariates)
# Generating a panel of observations and a vector of dates
set.seed(1)
y <- cbind(seq(0.5,100,by=0.5)*0.1 + rnorm(200),
           seq(100.25,150,by=0.25)*0.05 + rnorm(200),
           seq(1,200,by=1)*(-0.01) + rnorm(200, 0, 0.5))
dates <- seq.Date(from = as.Date('2019-01-10'),by = "days", length.out = 200)

# Adding a fictional intervention and four covariates (they should be related
# to the outcome but unaffected by the intervention). To illustrate the
# functioning of Bayesian model selection, one covariate is assumed to be
# unrelated to y.
int.date <- as.Date('2019-04-01')
y.new <- y; y.new[dates >= int.date, ] <- y.new[dates >= int.date, ]*1.3
x1 <- y[,1]*0.5 + y[,2]*0.3 + y[,3]*0.1
x2 <- y[,2]*0.1 + rnorm(dim(y)[1],0,0.5)
x3 <- y[,3]*1.2 + rnorm(dim(y)[1],0,0.5)
x4 <- rnorm(dim(y)[1], 5, 10)
X <- cbind(x1, x2, x3, x4)

# Some plots
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(1,3))
for(i in 1:dim(y.new)[2]){
  plot(y.new[,i], x = dates, type='l', col='cadetblue', xlab='', ylab='', main= bquote(Y[.(i)]))
  lines(y[,i], x = dates, col='orange')
  }
par(mfrow=c(1,4))
for(i in 1:dim(X)[2]){
  plot(X[,i], type='l', col = 'darkgreen', x = dates, xlab='', ylab='', main = bquote(x[.(i)]))
  }
par(oldpar)

# Causal effect estimation
causal.1 <- CausalMBSTS(y.new, components = c("trend", "seasonal"), seas.period = 7,
                        X = X, dates = dates, int.date = int.date, s0.r = 0.1*diag(3),
                        s0.eps = 0.1*diag(3),niter = 50, burn = 5,
                        horizon = as.Date(c('2019-04-08','2019-07-28')))
summary(causal.1)

## Example 2 (weekly data, local level + cycle, d = 2)
set.seed(1)
t <- seq(from = 0,to = 4*pi, length.out=300)
y <- cbind(3*sin(2*t)+rnorm(300), 2*cos(2*t) + rnorm(300))
dates <- seq.Date(from = as.Date("2015-01-01"), by = "week", length.out=300)
int.date <- as.Date("2020-02-27")
y[dates >= int.date,] <- y[dates >= int.date,]+2

# Some plots
plot(y = y[,1], x=dates, type="l", col="cadetblue")
lines(y = y[,2], x = dates, col = "orange")
abline(v=int.date, col="red")

# Causal effect estimation
causal.2 <- CausalMBSTS(y, components = c("trend", "cycle"), cycle.period = 75,
                        dates = dates, int.date = int.date, s0.r = 0.01*diag(2),
                        s0.eps = 0.1*diag(2), niter = 100, burn = 10)
summary(causal.2)

[Package CausalMBSTS version 0.1.1 Index]