R: Grouped Panel Data Model

grouped_plm {PAGFL}

R Documentation

Grouped Panel Data Model

Description

Estimate a grouped panel data model given an observed group structure. Slope parameters are homogeneous within groups but heterogeneous across groups. This function supports both static and dynamic panel data models, with or without endogenous regressors.

Usage

grouped_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  method = "PLS",
  Z = NULL,
  bias_correc = FALSE,
  rho = 0.07 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)

## S3 method for class 'gplm'
print(x, ...)

## S3 method for class 'gplm'
formula(x, ...)

## S3 method for class 'gplm'
df.residual(object, ...)

## S3 method for class 'gplm'
summary(object, ...)

## S3 method for class 'gplm'
coef(object, ...)

## S3 method for class 'gplm'
residuals(object, ...)

## S3 method for class 'gplm'
fitted(object, ...)

Arguments

`formula`	a formula object describing the model to be estimated.
`data`	a `data.frame` or `matrix` holding a panel data set. If no `index` variables are provided, the panel must be balanced and ordered in the long format `\bold{Y}=(Y_1^\prime, \dots, Y_N^\prime)^\prime`, `Y_i = (Y_{i1}, \dots, Y_{iT})^\prime` with `Y_{it} = (y_{it}, x_{it}^\prime)^\prime`. Conversely, if `data` is not ordered or not balanced, `data` must include two index variables that declare the cross-sectional unit `i` and the time period `t` of each observation.
`groups`	a numerical or character vector of length `N` that indicates the group membership of each cross-sectional unit `i`.
`index`	a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit `i`, and the second string represents the name of the variable declaring the time period `t`. In case of a balanced panel data set that is ordered in the long format, `index` can be left empty if the number of time periods `n_periods` is supplied.
`n_periods`	the number of observed time periods `T`. If an `index` is passed, this argument can be left empty.
`method`	the estimation method. Options are `"PLS"` for the penalized least squares (PLS) algorithm and `"PGMM"` for the penalized Generalized Method of Moments (PGMM). We recommend PLS in case of (weakly) exogenous regressors (Mehrabani, 2023, sec. 2.2). PGMM is required when instrumenting endogenous regressors, in which case a matrix `\bold{Z}` containing the necessary exogenous instruments must be supplied (Mehrabani, 2023, sec. 2.3). Default is `"PLS"`.
`Z`	a `NT \times q` `matrix` or `data.frame` of exogenous instruments, where `q \geq p`, `\bold{Z}=(z_1, \dots, z_N)^\prime`, `z_i = (z_{i1}, \dots, z_{iT})^\prime` and `z_{it}` is a `q \times 1` vector. `Z` is only required when `method = "PGMM"` is selected. When using `"PLS"`, the argument can be left empty; if supplied, it is disregarded. Default is `NULL`.
`bias_correc`	logical. If `TRUE`, a split-panel jackknife bias correction following Dhaene and Jochmans (2015) is applied to the slope parameters. We recommend using the correction when working with dynamic panels. Default is `FALSE`.
`rho`	a tuning parameter balancing the fitness and penalty terms in the information criterion (IC). If left unspecified, the heuristic `\rho = 0.07 \frac{\log(NT)}{\sqrt{NT}}` of Mehrabani (2023, sec. 6) is used. We recommend the default.
`verbose`	logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`.
`parallel`	logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`.
`...`	ellipsis
`x`	an object of class `gplm`.
`object`	an object of class `gplm`.
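
As a hedged illustration of the ordering that `data` must follow when no `index` is supplied, the base R sketch below rebuilds the unit and time indices implied by a balanced panel stacked unit by unit, and evaluates the default `rho` heuristic. The dimensions and the object names `i_index`, `t_index`, and `layout` are illustrative assumptions, not part of the package.

```r
# Illustrative dimensions (assumed values, not package defaults)
N <- 4          # number of cross-sectional units
n_periods <- 3  # number of time periods T

# Ordered long format: all T rows of unit 1 come first, then unit 2, and so on
i_index <- rep(seq_len(N), each = n_periods)
t_index <- rep(seq_len(n_periods), times = N)
layout <- data.frame(i_index, t_index)

# Default tuning parameter heuristic: rho = 0.07 * log(NT) / sqrt(NT)
rho <- 0.07 * log(N * n_periods) / sqrt(N * n_periods)
```

If the rows of a data set do not follow this pattern, or some unit-period pairs are missing, the two index variables have to be passed explicitly via `index`.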

Details

Consider the grouped panel data model

y_{it} = \gamma_i + \beta^\prime_{i} x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,

where y_{it} is the scalar dependent variable, \gamma_i is an individual fixed effect, x_{it} is a p \times 1 vector of explanatory variables, and \epsilon_{it} is a zero mean error. The coefficient vector \beta_i is subject to the observed group pattern

\beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k \},

with \cup_{k = 1}^K G_k = \{1, \dots, N\}, G_k \cap G_j = \emptyset and \| \alpha_k - \alpha_j \| \neq 0 for any k \neq j, where k, j = 1, \dots, K.
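
The group pattern can be read as a lookup from group-level to unit-level coefficients. The following base R sketch, using made-up values for `alpha` and the membership vector `g`, expands a `K \times p` coefficient matrix into the `N \times p` matrix of unit-specific slopes and checks that the labels form a partition.

```r
# Hypothetical group-specific coefficients: K = 2 groups, p = 2 regressors
alpha <- rbind(c(1.0, -0.5),
               c(0.2,  2.0))
# Hypothetical group memberships g_i for N = 5 units
g <- c(1, 2, 2, 1, 2)

# beta_i equals alpha_k for the group k that unit i belongs to
beta <- alpha[g, , drop = FALSE]

# Partition check: every unit carries one label in 1..K and no group is empty
K <- nrow(alpha)
is_partition <- all(g %in% seq_len(K)) && all(seq_len(K) %in% g)
```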

Using PLS, the group-specific coefficients for group k are obtained via OLS

\hat{\alpha}_k = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{x}_{it} \tilde{x}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{x}_{it} \tilde{y}_{it},

where \tilde{a}_{it} = a_{it} - T^{-1} \sum_{s=1}^T a_{is} for a \in \{y, x\}. This within-transformation concentrates out the individual fixed effects \gamma_i.
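
A minimal base R sketch of this estimator, assuming a simulated balanced panel with known memberships, is given below. It is meant only to illustrate the formula and is not the package's internal implementation; all object names and parameter values are illustrative.

```r
set.seed(1)
N <- 6; n_periods <- 50
i_index <- rep(seq_len(N), each = n_periods)
g <- c(1, 1, 1, 2, 2, 2)                # assumed known group memberships
alpha <- rbind(c(1, -1), c(-2, 0.5))    # assumed true coefficients (K x p)

# Simulate y_it = gamma_i + beta_i' x_it + eps_it
X <- matrix(rnorm(N * n_periods * 2), ncol = 2)
gamma <- rnorm(N)                       # individual fixed effects
y <- gamma[i_index] + rowSums(X * alpha[g[i_index], ]) +
  rnorm(N * n_periods, sd = 0.1)

# Within-transformation: subtract each unit's time average
demean <- function(a) a - ave(a, i_index)
y_tilde <- demean(y)
X_tilde <- apply(X, 2, demean)

# OLS on the pooled demeaned observations of each group
alpha_hat <- t(sapply(sort(unique(g)), function(k) {
  rows <- g[i_index] == k
  Xk <- X_tilde[rows, , drop = FALSE]
  solve(crossprod(Xk), crossprod(Xk, y_tilde[rows]))
}))
```

Each row of `alpha_hat` holds the estimate for one group, matching the `K \times p` layout of the `coefficients` element described under Value.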

In case of PGMM, the slope coefficients are derived as

\hat{\alpha}_k = \left( \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T z_{it} \Delta x_{it}^\prime \right]^\prime W_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T z_{it} \Delta x_{it}^\prime \right] \right)^{-1}

\quad \quad \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T z_{it} \Delta x_{it}^\prime \right]^\prime W_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T z_{it} \Delta y_{it} \right],

where W_k is a q \times q symmetric positive definite weight matrix and \Delta denotes the first difference operator \Delta x_{it} = x_{it} - x_{it-1} (first-difference transformation).
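
The sketch below illustrates the PGMM formula for a single group in base R. It assumes a simulated endogenous regressor (`p = 1`), two first-differenced exogenous drivers as instruments (`q = 2`), and an identity weight matrix. These are illustrative choices; the source does not state which `W_k` the package uses, and the code is not the package's implementation.

```r
set.seed(42)
N <- 50; n_periods <- 40; alpha_k <- 1.5   # one group, p = 1 (assumed values)
i_index <- rep(seq_len(N), each = n_periods)

w <- matrix(rnorm(N * n_periods * 2), ncol = 2)  # exogenous drivers
eps <- rnorm(N * n_periods)
x <- w[, 1] + w[, 2] + 0.5 * eps                 # regressor correlated with eps
y <- rnorm(N)[i_index] + alpha_k * x + eps       # includes a fixed effect

# First difference within each unit; the first period of each unit is lost
fd <- function(a) ave(a, i_index, FUN = function(v) c(NA, diff(v)))
keep <- !is.na(fd(y))
dy <- fd(y)[keep]
dx <- fd(x)[keep]
Z <- apply(w, 2, fd)[keep, , drop = FALSE]       # q = 2 instruments

# Moment terms: sum_i T^{-1} sum_t z_it dx_it' (q x p) and the analogue for dy
A <- crossprod(Z, dx) / n_periods
b <- crossprod(Z, dy) / n_periods
W <- diag(ncol(Z))                               # identity weights (assumption)

alpha_hat <- solve(t(A) %*% W %*% A, t(A) %*% W %*% b)
```

Because the first difference removes the fixed effect and the instruments are independent of the errors by construction, `alpha_hat` should land close to `alpha_k` in this simulation.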

Value

An object of class gplm holding

`model`	a `data.frame` containing the dependent and explanatory variables as well as cross-sectional and time indices,
`coefficients`	a `K \times p` matrix of the group-specific parameter estimates,
`groups`	a `list` containing (i) the total number of groups `K` and (ii) a vector of group memberships `(g_1, \dots, g_N)`, where `g_i = k` if `i` is assigned to group `k`,
`residuals`	a vector of residuals of the demeaned model,
`fitted`	a vector of fitted values of the demeaned model,
`args`	a `list` of additional arguments,
`IC`	a `list` containing (i) the value of the IC and (ii) the MSE,
`call`	the function call.

A gplm object has print, summary, fitted, residuals, formula, df.residual, and coef S3 methods.

Author(s)

Paul Haimerl

References

Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples

# Simulate a panel with a group structure
sim <- sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
y <- sim$y
X <- sim$X
groups <- sim$groups
df <- cbind(y = c(y), X)

# Estimate the grouped panel data model
estim <- grouped_plm(y ~ ., data = df, groups = groups, n_periods = 80, method = "PLS")
summary(estim)

# Let's pass a panel data set with explicit cross-sectional and time indicators
i_index <- rep(1:20, each = 80)
t_index <- rep(1:80, 20)
df <- data.frame(y = c(y), X, i_index = i_index, t_index = t_index)
estim <- grouped_plm(
  y ~ ., data = df, index = c("i_index", "t_index"), groups = groups, method = "PLS"
)
summary(estim)

[Package PAGFL version 1.1.1 Index]