sim_DGP {PAGFL} R Documentation

Simulate a Panel With a Latent Group Structure

Description

Construct a static or dynamic, exogenous or endogenous panel data set subject to a latent group structure with optional AR(1) or GARCH(1,1) innovations.

Usage

sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = NULL,
  dyn_panel = FALSE,
  q = NULL,
  alpha_0 = NULL
)

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods T. Default is 40.

p

the number of explanatory variables. Default is 2.

n_groups

the number of latent groups K. Default is 3.

group_proportions

a numeric vector of length n_groups indicating the fraction of N each group will contain. If NULL, all groups are of size \frac{N}{K}. Default is NULL.

error_spec

the error specification used. Options are

NULL

for iid errors.

'AR'

for an AR(1) error process with an autoregressive coefficient of 0.5.

'GARCH'

for a GARCH(1,1) error process with a 0.05 constant, a 0.05 ARCH and a 0.9 GARCH coefficient.

Default is NULL.

dyn_panel

Logical. If TRUE, the panel includes one stationary autoregressive lag of the dependent variable (see sec. Details for information on the AR coefficient). Default is FALSE.

q

the number of exogenous instruments when a panel with endogenous regressors is to be simulated. To generate a panel data set with exogenous regressors, pass NULL. Default is NULL.

alpha_0

an optional pre-specified K \times p parameter matrix. If NULL, the coefficients are drawn randomly (see sec. Details). If dyn_panel = TRUE, the first column represents the stationary AR coefficient. Default is NULL.
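The two non-iid options of error_spec can be illustrated in base R. The following is a minimal sketch, not the package's internal code: it uses the coefficients documented above, but the Gaussian innovations, the burn-in length (burn_in) and the starting values are assumptions made for illustration.

```r
set.seed(1)
n_periods <- 40
burn_in <- 100                      # hypothetical burn-in, not taken from the package
n_total <- n_periods + burn_in

# AR(1) errors: u_t = 0.5 * u_{t-1} + v_t
v <- rnorm(n_total)                 # assumption: standard normal innovations
u_ar_full <- as.numeric(stats::filter(v, filter = 0.5, method = "recursive"))
u_ar <- tail(u_ar_full, n_periods)

# GARCH(1,1) errors: sigma2_t = 0.05 + 0.05 * u_{t-1}^2 + 0.9 * sigma2_{t-1}
omega <- 0.05
arch <- 0.05
garch <- 0.9
z <- rnorm(n_total)
sigma2 <- numeric(n_total)
u_garch_full <- numeric(n_total)
sigma2[1] <- omega / (1 - arch - garch)  # start at the unconditional variance
u_garch_full[1] <- sqrt(sigma2[1]) * z[1]
for (t in 2:n_total) {
  sigma2[t] <- omega + arch * u_garch_full[t - 1]^2 + garch * sigma2[t - 1]
  u_garch_full[t] <- sqrt(sigma2[t]) * z[t]
}
u_garch <- tail(u_garch_full, n_periods)
```

With these coefficients the ARCH and GARCH terms sum to 0.95, so the conditional variance is highly persistent but still covariance stationary.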

Details

The scalar dependent variable y_{it} is driven by the following panel data model

y_{it} = \eta_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.

\eta_i represents individual fixed effects and x_{it} = (x_{it,1}, \dots, x_{it,p})^\prime is a p \times 1 vector of regressors. The individual slope coefficient vectors \beta_i are subject to a latent group structure \beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k\}. As a consequence, the group-level coefficients \bold{\alpha} = (\alpha^\prime_1, \dots, \alpha^\prime_K)^\prime follow the partition \bold{G} = (G_1, \dots, G_K) of the N cross-sectional units, such that \cup_{k=1}^K G_k = \{1,\dots,N\}, and G_k \cap G_l = \emptyset, \; \alpha_k \neq \alpha_l for any two groups k \neq l (Mehrabani, 2023, sec. 2.1).
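The mapping from group-level to individual coefficients can be sketched in base R as follows. The coefficient matrix and group proportions are the ones used in the Examples section. Assigning units to groups in consecutive blocks is an assumption for illustration; the function itself may order the units differently.

```r
N <- 50
alpha <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)  # K x p group-level slopes
group_proportions <- c(0.4, 0.3, 0.3)
K <- nrow(alpha)

# Group sizes implied by the proportions (here 20, 15 and 15 units)
group_sizes <- round(N * group_proportions)

# Assumption: block-wise assignment of units to groups
groups <- rep(seq_len(K), times = group_sizes)

# beta_i = alpha_k for the group k that contains unit i
beta <- alpha[groups, , drop = FALSE]  # N x p individual slopes
```

Every unit belongs to exactly one group, so each row of beta equals exactly one row of alpha.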

If a panel data set with exogenous regressors is generated (set q = NULL), the p predictors are simulated as:

x_{it,j} = 0.2 \eta_i + e_{it,j}, \quad \eta_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},

where e_{it,j} denotes a series of innovations. \eta_i and e_{it,j} are independent of each other.

In case alpha_0 = NULL, the group-level slope parameters \alpha_{k} are drawn from U[-2, 2].
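A static panel with exogenous regressors, as described above, could be simulated in base R along the following lines. This is a sketch under stated assumptions rather than the package's implementation: the standard normal iid errors u and the cyclic assignment of units to groups are choices made for illustration.

```r
set.seed(42)
N <- 50
n_periods <- 40
p <- 2
K <- 3

# Assumption: units are assigned to groups cyclically (roughly equal group sizes)
groups <- rep(seq_len(K), length.out = N)

# Group-level slopes drawn from U[-2, 2], mapped to the individual units
alpha <- matrix(runif(K * p, min = -2, max = 2), nrow = K)  # K x p
beta <- alpha[groups, , drop = FALSE]                        # N x p

# Stacked layout: the T observations of unit 1, then those of unit 2, ...
unit <- rep(seq_len(N), each = n_periods)

eta <- rnorm(N)                                  # individual fixed effects
e <- matrix(rnorm(N * n_periods * p), ncol = p)  # regressor innovations
X <- 0.2 * eta[unit] + e                         # x_{it,j} = 0.2 * eta_i + e_{it,j}
u <- rnorm(N * n_periods)                        # assumption: iid N(0, 1) errors

# y_it = eta_i + beta_i' x_it + u_it
y <- eta[unit] + rowSums(X * beta[unit, , drop = FALSE]) + u
```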

If a dynamic panel is specified (dyn_panel = TRUE), the AR coefficients \beta^{\text{AR}}_i are drawn from a uniform distribution with support (-1, 1) and x_{it,j} = e_{it,j}. The individual fixed effects enter the dependent variable via (1 - \beta^{\text{AR}}_i) \eta_i to account for the autoregressive dependency. See Mehrabani (2023, sec. 6) for details.
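The dynamic case can be sketched as a simple recursion in base R. The burn-in length (burn_in), the zero starting value, the standard normal errors and the cyclic group assignment are assumptions for illustration. The sketch treats the lagged dependent variable as an additional regressor next to p exogenous ones; whether sim_DGP counts the lag as part of p is not settled here.

```r
set.seed(7)
N <- 20
n_periods <- 30
p <- 2
K <- 2
burn_in <- 50                       # hypothetical burn-in, not taken from the package
n_total <- n_periods + burn_in

groups <- rep(seq_len(K), length.out = N)           # assumption: cyclic assignment
alpha_ar <- runif(K, min = -1, max = 1)             # stationary AR coefficients
alpha_x <- matrix(runif(K * p, -2, 2), nrow = K)    # slopes of the exogenous regressors
eta <- rnorm(N)                                     # individual fixed effects

X <- array(rnorm(n_total * N * p), dim = c(n_total, N, p))  # x_{it,j} = e_{it,j}
u <- matrix(rnorm(n_total * N), nrow = n_total)             # assumption: iid N(0, 1)
y <- matrix(0, nrow = n_total, ncol = N)                    # starting value y_{i1} = 0

for (i in seq_len(N)) {
  k <- groups[i]
  for (t in 2:n_total) {
    # The fixed effect enters scaled by (1 - AR coefficient)
    y[t, i] <- (1 - alpha_ar[k]) * eta[i] + alpha_ar[k] * y[t - 1, i] +
      sum(alpha_x[k, ] * X[t, i, ]) + u[t, i]
  }
}

y <- y[-seq_len(burn_in), , drop = FALSE]  # drop the burn-in, T x N
y_stacked <- as.vector(y)                  # NT x 1, stacked unit by unit
```

Scaling the fixed effect by one minus the AR coefficient keeps the long-run mean contribution of \eta_i equal to \eta_i itself, whatever the value of the AR coefficient.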

When specifying an endogenous panel (set q to q \geq p), the e_{it,j} correlate with the cross-sectional innovations u_{it} by a magnitude of 0.5 to produce endogenous regressors with \text{E}(u|X) \neq 0. However, the endogenous regressors can be accounted for by exploiting the q instruments in \bold{Z}, for which \text{E}(u|Z) = 0 holds. The instruments and the first stage coefficients are generated in the same fashion as \bold{X} and \bold{\alpha} when q = NULL, respectively.
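One way to picture the endogenous design is the following base R sketch. It is an illustration under several assumptions and not the package's code: the magnitude of 0.5 is read as a correlation coefficient, the regressors are built as a linear first stage in the instruments, and the names rho, w and first_stage are hypothetical.

```r
set.seed(123)
N <- 100
n_periods <- 50
p <- 2
q <- 3
n_obs <- N * n_periods
unit <- rep(seq_len(N), each = n_periods)

eta <- rnorm(N)        # individual fixed effects
u <- rnorm(n_obs)      # assumption: iid N(0, 1) errors
rho <- 0.5             # assumption: 0.5 interpreted as a correlation

# Regressor innovations with unit variance and correlation rho with u
w <- matrix(rnorm(n_obs * p), ncol = p)
e <- rho * u + sqrt(1 - rho^2) * w

# Instruments generated like exogenous regressors, independent of u
Z <- 0.2 * eta[unit] + matrix(rnorm(n_obs * q), ncol = q)

# First-stage coefficients drawn like the slopes, U[-2, 2]
first_stage <- matrix(runif(q * p, -2, 2), nrow = q)  # q x p

# Assumption: linear first stage, X = Z %*% first_stage + e
X <- Z %*% first_stage + e

cor_e_u <- cor(e, u)   # close to 0.5: the regressors are endogenous
cor_Z_u <- cor(Z, u)   # close to 0: the instruments are valid
```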

The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).

Value

A list holding

alpha

the K \times p matrix of group-specific slope parameters. In case of dyn_panel = TRUE, the first column holds the AR coefficient.

groups

a vector indicating the group memberships.

y

an NT \times 1 vector of the dependent variable, with \bold{y}=(y_1, \dots, y_N)^\prime, y_i = (y_{i1}, \dots, y_{iT})^\prime and the scalar y_{it}.

X

an NT \times p matrix of explanatory variables, with \bold{X}=(x_1, \dots, x_N)^\prime, x_i = (x_{i1}, \dots, x_{iT})^\prime and the p \times 1 vector x_{it}.

Z

an NT \times q matrix of instruments, where q \geq p, \bold{Z}=(z_1, \dots, z_N)^\prime, z_i = (z_{i1}, \dots, z_{iT})^\prime and z_{it} is a q \times 1 vector. In case a panel with exogenous regressors is generated (q = NULL), \bold{Z} equals NULL.
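Because y, X and Z are stacked unit by unit, the rows belonging to a given unit or period can be recovered with simple index vectors. The sketch below uses a small stand-in vector instead of actual sim_DGP output; the names unit, period and y_mat are hypothetical.

```r
N <- 3
n_periods <- 4

# Stand-in for an NT x 1 vector stacked as (y_1', ..., y_N')'
y <- seq_len(N * n_periods)

# Index vectors matching the stacking order
unit <- rep(seq_len(N), each = n_periods)
period <- rep(seq_len(n_periods), times = N)

# All observations of unit 2
y_unit2 <- y[unit == 2]

# The same data as a T x N matrix, one column per unit
y_mat <- matrix(y, nrow = n_periods, ncol = N)
```

The same unit and period vectors can be used to subset the rows of X and Z, since all three objects share the row order.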

Author(s)

Paul Haimerl

References

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples

# Simulate DGP 1 from Mehrabani (2023, sec. 6)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)

# Simulate DGP 6 from Mehrabani (2023, sec. 6)
alpha_0_DGP6 <- cbind(
  c(0.8, 0.6, 0.4, 0.2, -0.2, -0.4, -0.6, -0.8),
  c(-4, -3, -2, -1, 1, 2, 3, 4),
  c(4, 3, 2, 1, -1, -2, -3, -4)
)

[Package PAGFL version 1.0.1 Index]