R: Clustered Covariance Matrix Estimation for Panel Data

vcovPL {sandwich}

R Documentation

Clustered Covariance Matrix Estimation for Panel Data

Description

Estimation of sandwich covariances à la Newey-West (1987) and Driscoll and Kraay (1998) for panel data.

Usage

vcovPL(x, cluster = NULL, order.by = NULL,
  kernel = "Bartlett", sandwich = TRUE, fix = FALSE, ...)

meatPL(x, cluster = NULL, order.by = NULL,
  kernel = "Bartlett", lag = "NW1987", bw = NULL,
  adjust = TRUE, aggregate = TRUE, ...)

Arguments

`x`	a fitted model object.
`cluster`	a single variable indicating the clustering of observations, or a `list` (or `data.frame`) of one or two variables, or a formula specifying which one or two variables from the fitted model should be used (see examples). If two variables are specified, the second variable is assumed to provide the time ordering (instead of using the argument `order.by`). By default (`cluster = NULL`), either `attr(x, "cluster")` is used (if any) or otherwise every observation is assumed to be its own cluster.
`order.by`	a variable, list/data.frame, or formula indicating the aggregation within time periods. By default, `attr(x, "order.by")` is used (if any); otherwise the time ordering is taken from the second variable in `cluster` (see above). If neither is available, observations within clusters are assumed to be ordered.
`kernel`	a character specifying the kernel used. All kernels described in Andrews (1991) are supported, see `kweights`.
`lag`	character or numeric, indicating the lag length used. Three rules of thumb (`"max"` or equivalently `"P2009"`, `"NW1987"`, or `"NW1994"`) can be specified, or the number of lags can be given directly as a numeric value. By default, `"NW1987"` is used.
`bw`	numeric. The bandwidth of the kernel which by default corresponds to `lag + 1`. Only one of `lag` and `bw` should be used.
`sandwich`	logical. Should the sandwich estimator be computed? If set to `FALSE` only the meat matrix is returned.
`fix`	logical. Should the covariance matrix be fixed to be positive semi-definite in case it is not?
`adjust`	logical. Should a finite sample adjustment be made? This amounts to multiplication with `n/(n - k)` where `n` is the number of observations and `k` is the number of estimated parameters.
`aggregate`	logical. Should the `estfun` be aggregated within each time period (yielding Driscoll and Kraay 1998) or not (restricting cross-sectional and cross-serial correlation to zero, yielding panel Newey-West)?
`...`	arguments passed to the `meatPL` or `estfun` function, respectively.

Details

vcovPL is a function for estimating the Newey-West (1987) and Driscoll and Kraay (1998) covariance matrices. Driscoll and Kraay (1998) apply a Newey-West type correction to the sequence of cross-sectional averages of the moment conditions (see Hoechle 2007). For large T (and regardless of the size of the cross-sectional dimension), the Driscoll and Kraay (1998) standard errors are robust to general forms of cross-sectional and serial correlation (Hoechle 2007). The Newey-West (1987) covariance matrix restricts the Driscoll and Kraay (1998) covariance matrix to no cross-sectional correlation.
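As a rough sketch of the two estimators described above, the meat matrices can be written as follows. The notation is an assumption made here for illustration and is not taken from the package: \psi_{it} denotes the estimating function (score) contribution of unit i in period t, h_t its sum over all units observed in period t, L the lag length, and w_j the kernel weights (for the Bartlett kernel, w_j = 1 - j/(L + 1)). Scaling constants and finite-sample adjustments are omitted.

```latex
% Driscoll and Kraay (1998): aggregate the scores within each period first
\[
h_t = \sum_{i} \psi_{it}, \qquad
\hat{S}_{\mathrm{DK}} = \sum_{t=1}^{T} h_t h_t^\top
  + \sum_{j=1}^{L} w_j \sum_{t=j+1}^{T}
    \left( h_t h_{t-j}^\top + h_{t-j} h_t^\top \right)
\]

% Panel Newey-West (1987): same correction, but applied unit by unit,
% so that all cross-products between different units are set to zero
\[
\hat{S}_{\mathrm{NW}} = \sum_{i} \left[ \sum_{t=1}^{T} \psi_{it} \psi_{it}^\top
  + \sum_{j=1}^{L} w_j \sum_{t=j+1}^{T}
    \left( \psi_{it} \psi_{i,t-j}^\top + \psi_{i,t-j} \psi_{it}^\top \right) \right]
\]
```

Expanding h_t h_{t-j}^\top shows that the Driscoll and Kraay form contains all cross-products between different units, which is exactly what the panel Newey-West form drops.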

The function meatPL is the workhorse for estimating the meat of the Newey-West (1987) and Driscoll and Kraay (1998) covariance matrix estimators. vcovPL is a wrapper calling sandwich and bread (Zeileis 2006).
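The relationship between the two functions can be illustrated with a minimal sketch, assuming the sandwich package is installed and that vcovPL passes the meat matrix on to sandwich() unchanged (the object names below are chosen for illustration only):

```r
library("sandwich")

data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)

## meat matrix computed directly by the workhorse function
meat_pl <- meatPL(m, cluster = ~ firm + year)

## sandwich = FALSE is documented to return only the meat matrix
meat_via_vcov <- vcovPL(m, cluster = ~ firm + year, sandwich = FALSE)

## full sandwich assembled by hand from the meat, compared with vcovPL
vc_manual <- sandwich(m, meat. = meat_pl)
vc_pl <- vcovPL(m, cluster = ~ firm + year)

## both comparisons are expected to agree under the assumption above
all.equal(meat_pl, meat_via_vcov)
all.equal(vc_manual, vc_pl)
```

Working with meatPL directly can be convenient when the same meat matrix is to be combined with a different bread, or inspected on its own.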

The default lag length rule is "NW1987", which uses the heuristic floor(T^(1/4)), where T is the number of time periods. For lag = "NW1994", the lag length is taken from the first step of Newey and West's (1994) plug-in procedure. For lag = "max" (or equivalently "P2009"), the maximum lag length T - 1 is used. See Hoechle (2007) for more details on lag length selection.
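To make the rules of thumb concrete, the following base R sketch evaluates them for a hypothetical panel with 10 time periods. The helper functions lag_rules and bartlett_weights are illustrative and not part of the package; the formulas are those quoted in the comments of the Examples section, and the Bartlett weights assume the default bandwidth bw = lag + 1.

```r
## Hypothetical helper (not part of sandwich): lag lengths implied by the
## rules of thumb for a panel with n_periods time periods
lag_rules <- function(n_periods) {
  c(
    max    = n_periods - 1,                       # "max" / "P2009"
    NW1987 = floor(n_periods^(1/4)),              # "NW1987"
    NW1994 = floor(4 * (n_periods / 100)^(2/9))   # "NW1994"
  )
}

## Hypothetical helper: Bartlett kernel weights for lags 0, ..., lag,
## assuming the bandwidth is bw = lag + 1
bartlett_weights <- function(lag) 1 - (0:lag) / (lag + 1)

lags <- lag_rules(10)
print(lags)

## weights applied to lags 0, 1, 2 under the "NW1994" rule
print(bartlett_weights(lags[["NW1994"]]))
```

For short panels the three rules differ considerably, so it can be worth checking how sensitive the standard errors are to the choice of lag.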

The cluster/order.by specification can be made in a number of ways: either both can be single variables, or cluster can be a list/data.frame of two variables. If expand.model.frame works for the model object x, cluster (and, optionally, order.by) can also be a formula. By default (cluster = NULL, order.by = NULL), attr(x, "cluster") and attr(x, "order.by") are checked and used if available. If not, every observation is assumed to be its own cluster, and observations within clusters are assumed to be ordered accordingly. If the number of observations in the model x is smaller than in the original data due to NA processing, then the same NA processing can be applied to cluster if necessary (provided x$na.action is available).

Value

A matrix containing the covariance matrix estimate.

References

Andrews DWK (1991). “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation”, Econometrica, 817–858.

Driscoll JC & Kraay AC (1998). “Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data”, The Review of Economics and Statistics, 80(4), 549–560.

Hoechle D (2007). “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence”, Stata Journal, 7(3), 281–312.

Newey WK & West KD (1987). “Hypothesis Testing with Efficient Method of Moments Estimation”, International Economic Review, 777–787.

Newey WK & West KD (1994). “Automatic Lag Selection in Covariance Matrix Estimation”, The Review of Economic Studies, 61(4), 631–653.

White H (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity”, Econometrica, 817–838. doi:10.2307/1912934

Zeileis A (2004). “Econometric Computing with HC and HAC Covariance Matrix Estimators”, Journal of Statistical Software, 11(10), 1–17. doi:10.18637/jss.v011.i10

Zeileis A (2006). “Object-Oriented Computation of Sandwich Estimators”, Journal of Statistical Software, 16(9), 1–16. doi:10.18637/jss.v016.i09

Zeileis A, Köll S, Graham N (2020). “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R”, Journal of Statistical Software, 95(1), 1–36. doi:10.18637/jss.v095.i01

Examples

## Petersen's data
data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)

## Driscoll and Kraay standard errors
## lag length set to: T - 1 (maximum lag length)
## as proposed by Petersen (2009)
sqrt(diag(vcovPL(m, cluster = ~ firm + year, lag = "max", adjust = FALSE)))

## lag length set to: floor(4 * (T / 100)^(2/9))
## rule of thumb proposed by Hoechle (2007) based on Newey & West (1994)
sqrt(diag(vcovPL(m, cluster = ~ firm + year, lag = "NW1994")))

## lag length set to: floor(T^(1/4))
## rule of thumb based on Newey & West (1987)
sqrt(diag(vcovPL(m, cluster = ~ firm + year, lag = "NW1987")))

## the following specifications of cluster/order.by are equivalent
vcovPL(m, cluster = ~ firm + year)
vcovPL(m, cluster = PetersenCL[, c("firm", "year")])
vcovPL(m, cluster = ~ firm, order.by = ~ year)
vcovPL(m, cluster = PetersenCL$firm, order.by = PetersenCL$year)

## these are also the same when observations within each
## cluster are already ordered
vcovPL(m, cluster = ~ firm)
vcovPL(m, cluster = PetersenCL$firm)

[Package sandwich version 3.1-0 Index]