sim_DGP {PAGFL}

R Documentation

Simulate a Panel With a Group Structure in the Slope Coefficients

Description

Construct a static or dynamic, exogenous or endogenous panel data set subject to a group structure in the slope coefficients with optional AR(1) or GARCH(1,1) innovations.

Usage

sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = "iid",
  dynamic = FALSE,
  dyn_panel = lifecycle::deprecated(),
  q = NULL,
  alpha_0 = NULL
)

Arguments

`N`	the number of cross-sectional units. Default is 50.
`n_periods`	the number of simulated time periods `T`. Default is 40.
`p`	the number of explanatory variables. Default is 2.
`n_groups`	the number of groups `K`. Default is 3.
`group_proportions`	a numeric vector of length `n_groups` indicating the size of each group as a fraction of `N`. If `NULL`, all groups are of size `N / K`. Default is `NULL`.
`error_spec`	the error process. Options are `"iid"` for `iid` errors, `"AR"` for an `AR(1)` error process with an autoregressive coefficient of 0.5, and `"GARCH"` for a `GARCH(1,1)` error process with a constant of 0.05, an ARCH coefficient of 0.05 and a GARCH coefficient of 0.9. Default is `"iid"`.
`dynamic`	Logical. If `TRUE`, the panel includes one stationary autoregressive lag of `y_{it}` as an explanatory variable (see sec. Details for more information on the `AR` coefficient). Default is `FALSE`.
`dyn_panel`	deprecated and replaced by `dynamic`.
`q`	the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass `NULL`. Default is `NULL`.
`alpha_0`	a `K \times p` matrix of group-specific coefficients. If `dynamic = TRUE`, the first column represents the stationary `AR` coefficient. If `NULL`, the coefficients are drawn randomly (see sec. Details). Default is `NULL`.
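The three `error_spec` options can be illustrated with a short base R sketch. This is not the package's internal code: the helper name `sim_errors`, the standard normal shocks, the burn-in length and the start values are all assumptions made for illustration. Only the coefficients (0.5 for the AR(1) process; 0.05, 0.05 and 0.9 for the GARCH(1,1) process) are taken from the argument description above.

```r
# Hypothetical helper, not part of PAGFL: simulate one error series of length n_periods.
# Assumptions: standard normal shocks, a burn-in of 100 periods, simple start values.
sim_errors <- function(n_periods, error_spec = "iid", burn_in = 100) {
  n <- n_periods + burn_in
  z <- rnorm(n)      # underlying i.i.d. shocks
  u <- numeric(n)
  if (error_spec == "iid") {
    u <- z
  } else if (error_spec == "AR") {
    # AR(1): u_t = 0.5 * u_{t-1} + z_t
    u[1] <- z[1]
    for (t in 2:n) u[t] <- 0.5 * u[t - 1] + z[t]
  } else if (error_spec == "GARCH") {
    # GARCH(1,1): sigma2_t = 0.05 + 0.05 * u_{t-1}^2 + 0.9 * sigma2_{t-1}
    sigma2 <- numeric(n)
    sigma2[1] <- 0.05 / (1 - 0.05 - 0.9)  # unconditional variance as start value
    u[1] <- sqrt(sigma2[1]) * z[1]
    for (t in 2:n) {
      sigma2[t] <- 0.05 + 0.05 * u[t - 1]^2 + 0.9 * sigma2[t - 1]
      u[t] <- sqrt(sigma2[t]) * z[t]
    }
  } else {
    stop("error_spec must be one of 'iid', 'AR' or 'GARCH'")
  }
  u[(burn_in + 1):n]  # drop the burn-in periods
}

set.seed(1)
u_iid   <- sim_errors(40, "iid")
u_ar    <- sim_errors(40, "AR")
u_garch <- sim_errors(40, "GARCH")
```

Since the ARCH and GARCH coefficients sum to 0.95, the GARCH(1,1) process in this sketch is covariance stationary but highly persistent in its conditional variance.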

Details

The scalar dependent variable y_{it} is generated according to the following grouped panel data model:

y_{it} = \gamma_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.

\gamma_i represents individual fixed effects and x_{it} a p \times 1 vector of regressors. The individual slope coefficient vectors \beta_i are subject to a group structure

\beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k\},

with \cup_{k = 1}^K G_k = \{1, \dots, N\}, G_k \cap G_j = \emptyset and \| \alpha_k - \alpha_j \| \neq 0 for any k \neq j, k = 1, \dots, K. The total number of groups K is determined by n_groups.
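The group structure can be made concrete with a small base R sketch. The membership vector `groups` below is a made-up example; the coefficient matrix reuses the values from the Examples section. The code checks the three conditions stated above (the groups cover all units, are disjoint, and have distinct coefficient vectors) and expands the K group-level vectors into N individual slope vectors.

```r
# Group-level coefficients alpha_k as rows of a K x p matrix (values from the Examples section)
alpha <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
# Illustrative group memberships g_1, ..., g_N for N = 6 units (an assumption)
groups <- c(1, 1, 2, 2, 3, 3)
N <- length(groups)

# The index sets G_1, ..., G_K
G <- split(seq_len(N), groups)

# Conditions on the group structure
covers_all <- setequal(unlist(G), seq_len(N))   # union of the G_k is {1, ..., N}
disjoint   <- anyDuplicated(unlist(G)) == 0     # no unit belongs to two groups
distinct   <- all(dist(alpha) > 0)              # ||alpha_k - alpha_j|| != 0 for k != j

# beta_i = alpha_k for i in G_k: index the rows of alpha by group membership
beta <- alpha[groups, , drop = FALSE]           # N x p matrix of individual slopes
```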

If a panel data set with exogenous regressors is generated (set q = NULL), the explanatory variables are simulated according to

x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},

where e_{it,j} denotes a series of innovations. \gamma_i and e_i are independent of each other.
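As a rough illustration of the static, exogenous case, the following base R sketch builds the regressors from the formula above and then the dependent variable from the grouped panel model. It is a simplified stand-in rather than the package's implementation: the group assignment, the small dimensions and the i.i.d. standard normal errors u_{it} are assumptions.

```r
set.seed(1)
N <- 6; n_periods <- 5; p <- 2
alpha  <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = p)  # K x p group-level slopes
groups <- rep(1:3, each = 2)                             # assumed group memberships

# Rows are stacked unit by unit: (i = 1, t = 1..T), (i = 2, t = 1..T), ...
unit <- rep(seq_len(N), each = n_periods)

gamma <- rnorm(N)                                  # individual fixed effects gamma_i
e <- matrix(rnorm(N * n_periods * p), ncol = p)    # innovations e_{it,j}
X <- 0.2 * gamma[unit] + e                         # x_{it,j} = 0.2 * gamma_i + e_{it,j}

u <- rnorm(N * n_periods)                          # assumed i.i.d. errors u_{it}
beta_long <- alpha[groups[unit], , drop = FALSE]   # beta_i repeated for every period
y <- gamma[unit] + rowSums(X * beta_long) + u      # y_it = gamma_i + beta_i' x_it + u_it
```

Because the fixed effect enters both x_{it,j} and y_{it}, the regressors are correlated with the unit-specific intercept, which is why a fixed-effects treatment is needed when estimating such a panel.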

In case alpha_0 = NULL, the group-level slope parameters \alpha_{k} are drawn from U[-2, 2].

If a dynamic panel is specified (dynamic = TRUE), the AR coefficients \beta^{\text{AR}}_i are drawn from a uniform distribution with support (-1, 1) and x_{it,j} = e_{it,j}. Moreover, the individual fixed effects enter the dependent variable via (1 - \beta^{\text{AR}}_i) \gamma_i to account for the autoregressive dependency. We refer to Mehrabani (2023, sec. 6) for details.
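The dynamic case can be sketched for a single unit in base R. The recursion below is one reading of the description above, not the package's code: the start value, the burn-in and the i.i.d. standard normal errors are assumptions. Scaling the fixed effect by (1 - \beta^{\text{AR}}_i) keeps the unconditional mean of y_{it} equal to \gamma_i, since E(y) = (1 - b) \gamma + b E(y) implies E(y) = \gamma.

```r
set.seed(42)
n_periods <- 50
burn_in   <- 100                 # assumption: discard initial periods
n <- n_periods + burn_in

beta_ar <- runif(1, -1, 1)       # stationary AR coefficient, support (-1, 1)
beta_x  <- runif(1, -2, 2)       # slope on one exogenous regressor, U[-2, 2]
gamma_i <- rnorm(1)              # individual fixed effect

x <- rnorm(n)                    # x_it = e_it in the dynamic case
u <- rnorm(n)                    # assumed i.i.d. errors

y <- numeric(n)
y_prev <- gamma_i                # assumed start value: the unconditional mean
for (t in seq_len(n)) {
  # y_it = (1 - beta_ar) * gamma_i + beta_ar * y_{i,t-1} + beta_x * x_it + u_it
  y[t] <- (1 - beta_ar) * gamma_i + beta_ar * y_prev + beta_x * x[t] + u[t]
  y_prev <- y[t]
}

# Keep the last n_periods observations
y <- y[(burn_in + 1):n]
x <- x[(burn_in + 1):n]
```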

When specifying an endogenous panel (set q to q \geq p), the e_{it,j} correlate with the cross-sectional innovations u_{it} by a magnitude of 0.5 to produce endogenous regressors (\text{E}(u|X) \neq 0). However, the endogenous regressors can be accounted for by exploiting the q instruments in \bold{Z}, for which \text{E}(u|Z) = 0 holds. The instruments and the first stage coefficients are generated in the same fashion as \bold{X} and \bold{\alpha} when q = NULL.
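To illustrate why instruments are needed in the endogenous case, the sketch below uses one regressor and one instrument (p = q = 1) without fixed effects or groups. It assumes that "a magnitude of 0.5" means a correlation of 0.5 between e and u, and it fixes the first stage coefficient at 1; both are illustrative assumptions, and the package's exact construction may differ.

```r
set.seed(123)
n <- 1e5                 # pooled number of observations, large for a stable illustration
beta <- 1.5              # true slope (illustrative value)
pi_1 <- 1                # first stage coefficient (fixed here as an assumption)

z <- rnorm(n)            # instrument, independent of u
e <- rnorm(n)            # regressor innovation
# Build u so that cor(e, u) = 0.5 and var(u) = 1
u <- 0.5 * e + sqrt(1 - 0.5^2) * rnorm(n)

x <- pi_1 * z + e        # first stage: regressor driven by the instrument and e
y <- beta * x + u        # outcome equation with an endogenous regressor

ols <- cov(x, y) / var(x)      # biased, since cov(x, u) != 0
iv  <- cov(z, y) / cov(z, x)   # simple IV estimator, consistent since cov(z, u) = 0
```

In this setup the OLS estimate overshoots the true slope by roughly cov(x, u) / var(x) = 0.5 / 2 = 0.25, while the IV estimate stays close to 1.5.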

The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).

Value

A list holding

`alpha`	the `K \times p` matrix of group-specific slope parameters. If `dynamic = TRUE`, the first column holds the `AR` coefficient.
`groups`	a vector indicating the group memberships `(g_1, \dots, g_N)`, where `g_i = k` if `i \in` group `k`.
`y`	a `NT \times 1` vector of the dependent variable, with `\bold{y}=(y_1, \dots, y_N)^\prime`, `y_i = (y_{i1}, \dots, y_{iT})^\prime` and the scalar `y_{it}`.
`X`	a `NT \times p` matrix of explanatory variables, with `\bold{X}=(x_1, \dots, x_N)^\prime`, `x_i = (x_{i1}, \dots, x_{iT})^\prime` and the `p \times 1` vector `x_{it}`.
`Z`	a `NT \times q` matrix of instruments, where `q \geq p`, `\bold{Z}=(z_1, \dots, z_N)^\prime`, `z_i = (z_{i1}, \dots, z_{iT})^\prime` and `z_{it}` is a `q \times 1` vector. In case a panel with exogenous regressors is generated (`q = NULL`), `\bold{Z}` equals `NULL`.
`data`	a `NT \times (p + 1)` data.frame of the outcome and the explanatory variables.
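The stacking order described above (all T observations of unit 1, then all of unit 2, and so on) can be unpacked as follows. The vector `y` here is a placeholder filled with row numbers, not simulated output, so that the mapping between the stacked and the wide layout is easy to follow.

```r
N <- 3; n_periods <- 4
# Placeholder for an NT x 1 stacked vector; entry (i - 1) * T + t belongs to unit i in period t
y <- as.numeric(seq_len(N * n_periods))

# Unit and time indices matching the stacked rows
i_index <- rep(seq_len(N), each = n_periods)
t_index <- rep(seq_len(n_periods), times = N)

# T x N layout: column i holds y_i = (y_i1, ..., y_iT)'
Y_wide <- matrix(y, nrow = n_periods, ncol = N)

# Observation of unit 3 in period 2, retrieved in both layouts
y[i_index == 3 & t_index == 2]
Y_wide[2, 3]
```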

Author(s)

Paul Haimerl

References

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples

# Simulate DGP 1 from Mehrabani (2023, sec. 6)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)
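A further call, using only the arguments documented above, might look as follows. The chosen dimensions are arbitrary, and the guard merely skips the example when the package is not installed.

```r
# Simulate a static panel with two equally sized groups and AR(1) errors
if (requireNamespace("PAGFL", quietly = TRUE)) {
  set.seed(1)
  sim_ar <- PAGFL::sim_DGP(
    N = 20, n_periods = 30, p = 2, n_groups = 2,
    error_spec = "AR"
  )
  sim_ar$alpha          # group-specific slope coefficients (drawn randomly here)
  table(sim_ar$groups)  # number of units per group
  head(sim_ar$data)     # outcome and explanatory variables
}
```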

[Package PAGFL version 1.1.1 Index]