sim_DGP {PAGFL} R Documentation

Simulate a Panel With a Group Structure in the Slope Coefficients

Description

Construct a static or dynamic, exogenous or endogenous panel data set subject to a group structure in the slope coefficients with optional AR(1) or GARCH(1,1) innovations.

Usage

sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = "iid",
  dynamic = FALSE,
  dyn_panel = lifecycle::deprecated(),
  q = NULL,
  alpha_0 = NULL
)

Arguments

N

the number of cross-sectional units. Default is 50.

n_periods

the number of simulated time periods $T$. Default is 40.

p

the number of explanatory variables. Default is 2.

n_groups

the number of groups $K$. Default is 3.

group_proportions

a numeric vector of length n_groups indicating the size of each group as a fraction of $N$. If NULL, all groups are of size $N / K$. Default is NULL.

error_spec

options include

"iid"

for iid errors.

"AR"

for an AR(1) error process with an autoregressive coefficient of 0.5.

"GARCH"

for a GARCH(1,1) error process with a 0.05 constant, a 0.05 ARCH and a 0.9 GARCH coefficient.

Default is "iid".

dynamic

Logical. If TRUE, the panel includes one stationary autoregressive lag of $y_{it}$ as an explanatory variable (see the Details section for more information on the AR coefficient). Default is FALSE.

dyn_panel

[Deprecated] Replaced by dynamic.

q

the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass NULL. Default is NULL.

alpha_0

a $K \times p$ matrix of group-specific coefficients. If dynamic = TRUE, the first column represents the stationary AR coefficient. If NULL, the coefficients are drawn randomly (see the Details section). Default is NULL.

Details

The scalar dependent variable $y_{it}$ is generated according to the following grouped panel data model

$$y_{it} = \gamma_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.$$

$\gamma_i$ represents individual fixed effects and $x_{it}$ a $p \times 1$ vector of regressors. The individual slope coefficient vectors $\beta_i$ are subject to a group structure

$$\beta_i = \sum_{k = 1}^K \alpha_k \mathbf{1} \{i \in G_k\},$$

with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k = 1, \dots, K$. The total number of groups $K$ is determined by n_groups.
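The group structure above can be read as a row lookup: each unit's slope vector is the row of the coefficient matrix that belongs to its group. The following base R sketch illustrates this under assumed values; the object names (alpha, groups, beta) are chosen for illustration and are not taken from the package internals.

```r
# Illustrative sizes; these are assumptions, not package defaults
N <- 6   # cross-sectional units
K <- 3   # groups
p <- 2   # regressors

# K x p matrix of group-specific slopes alpha_k (one row per group)
alpha <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), nrow = K, ncol = p)

# Group membership g_i in {1, ..., K}; each unit belongs to exactly one group
groups <- rep(seq_len(K), length.out = N)

# beta_i = sum_k alpha_k * 1{i in G_k} reduces to selecting row g_i of alpha
beta <- alpha[groups, , drop = FALSE]   # N x p matrix of individual slopes
```

Units that share a group share an identical row of beta, which is the restriction the grouped panel model imposes.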

If a panel data set with exogenous regressors is generated (set q = NULL), the explanatory variables are simulated according to

$$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i, e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},$$

where $e_{it,j}$ denotes a series of innovations. $\gamma_i$ and $e_i$ are independent of each other.

In case alpha_0 = NULL, the group-level slope parameters $\alpha_{k}$ are drawn from $U[-2, 2]$.
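As a rough illustration of the static, exogenous case described above, the base R sketch below simulates the regressors, the random group-level slopes, and the outcome. It is a simplified reading of the formulas, not the code used by sim_DGP; in particular, standard normal iid errors and equally sized groups are assumptions made here.

```r
set.seed(42)
N <- 50; n_periods <- 40; p <- 2; K <- 3

alpha  <- matrix(runif(K * p, min = -2, max = 2), nrow = K)  # alpha_k ~ U[-2, 2]
groups <- sort(rep(seq_len(K), length.out = N))              # roughly equal group sizes (assumption)
gamma  <- rnorm(N)                                           # fixed effects gamma_i ~ N(0, 1)

unit <- rep(seq_len(N), each = n_periods)          # unit index i for each of the NT rows
e <- matrix(rnorm(N * n_periods * p), ncol = p)    # innovations e_{it,j} ~ N(0, 1)
X <- 0.2 * gamma[unit] + e                         # x_{it,j} = 0.2 * gamma_i + e_{it,j}
u <- rnorm(N * n_periods)                          # iid errors (standard normal is an assumption)

beta <- alpha[groups[unit], , drop = FALSE]        # slope vector of each row's unit
y <- gamma[unit] + rowSums(beta * X) + u           # y_it = gamma_i + beta_i' x_it + u_it
```

Stacking the observations unit by unit mirrors the $NT \times 1$ and $NT \times p$ layout of y and X described in the Value section.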

If a dynamic panel is specified (dynamic = TRUE), the AR coefficients $\beta^{\text{AR}}_i$ are drawn from a uniform distribution with support $(-1, 1)$ and $x_{it,j} = e_{it,j}$. Moreover, the individual fixed effects enter the dependent variable via $(1 - \beta^{\text{AR}}_i) \gamma_i$ to account for the autoregressive dependency. We refer to Mehrabani (2023, sec. 6) for details.
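The role of the $(1 - \beta^{\text{AR}}_i) \gamma_i$ term can be seen in a small recursion for a single unit. The sketch below is an assumption-laden illustration in base R: the burn-in length, the starting value, and the standard normal errors are hypothetical choices and are not documented for sim_DGP.

```r
set.seed(7)
n_periods <- 40
burn_in   <- 50   # hypothetical burn-in length, not taken from the package
p <- 1

beta_ar <- runif(1, min = -1, max = 1)   # stationary AR coefficient
beta_x  <- runif(p, min = -2, max = 2)   # slope(s) on the exogenous regressor(s)
gamma_i <- rnorm(1)                      # individual fixed effect

n_total <- n_periods + burn_in
x <- matrix(rnorm(n_total * p), ncol = p)   # x_{it,j} = e_{it,j} in the dynamic case
u <- rnorm(n_total)                         # errors (standard normal is an assumption)

y <- numeric(n_total)
y_prev <- gamma_i   # assumed starting value
for (t in seq_len(n_total)) {
  # Scaling gamma_i by (1 - beta_ar) keeps the long-run mean of y at gamma_i
  y[t] <- (1 - beta_ar) * gamma_i + beta_ar * y_prev + sum(beta_x * x[t, ]) + u[t]
  y_prev <- y[t]
}
y <- y[-seq_len(burn_in)]   # keep the last n_periods observations
```

Because the regressors and errors have mean zero, the unconditional mean of the recursion is $(1 - \beta^{\text{AR}}_i) \gamma_i / (1 - \beta^{\text{AR}}_i) = \gamma_i$, so the fixed effect keeps the same interpretation as in the static model.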

When specifying an endogenous panel (set q to a value $q \geq p$), the $e_{it,j}$ correlate with the cross-sectional innovations $u_{it}$ by a magnitude of 0.5 to produce endogenous regressors ($\text{E}(u|X) \neq 0$). However, the endogeneity can be accounted for by exploiting the $q$ instruments in $\mathbf{Z}$, for which $\text{E}(u|Z) = 0$ holds. The instruments and the first-stage coefficients are generated in the same fashion as $\mathbf{X}$ and $\boldsymbol{\alpha}$ when q = NULL.
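One way to picture the endogenous design is sketched below in base R. Several points are assumptions: the "magnitude of 0.5" is read as a correlation coefficient, the regressors are formed as a linear first stage in the instruments, and the fixed effects are left out of the instruments for brevity. The names rho, v, and Pi are hypothetical.

```r
set.seed(123)
n_obs <- 5000; p <- 2; q <- 3
rho <- 0.5   # assumed to be a correlation coefficient

u <- rnorm(n_obs)                          # structural errors
v <- matrix(rnorm(n_obs * p), ncol = p)    # independent noise
e <- rho * u + sqrt(1 - rho^2) * v         # unit variance, cor(e_j, u) = rho

Z  <- matrix(rnorm(n_obs * q), ncol = q)                  # instruments, independent of u
Pi <- matrix(runif(q * p, min = -2, max = 2), nrow = q)   # hypothetical first-stage coefficients
X  <- Z %*% Pi + e                                        # endogenous regressors

# Sample check: e is correlated with u, the instruments are not
round(cor(e[, 1], u), 2)
round(cor(Z[, 1], u), 2)
```

The regressors inherit their correlation with u through e alone, which is why an instrumental variable estimator based on Z can recover the slopes.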

The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).
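For reference, the two non-iid options of error_spec correspond to standard recursions. The base R sketch below uses the coefficients stated in the Arguments section (AR coefficient 0.5; GARCH constant 0.05, ARCH 0.05, GARCH 0.9). The standard normal shocks and the starting values are assumptions, and the code is not taken from the package.

```r
set.seed(99)
n_periods <- 200
eps <- rnorm(n_periods)   # standard normal shocks (an assumption)

# AR(1): u_t = 0.5 * u_{t-1} + eps_t
u_ar <- numeric(n_periods)
u_ar[1] <- eps[1]
for (t in 2:n_periods) {
  u_ar[t] <- 0.5 * u_ar[t - 1] + eps[t]
}

# GARCH(1,1): sigma2_t = omega + a * u_{t-1}^2 + b * sigma2_{t-1}, u_t = sqrt(sigma2_t) * eps_t
omega <- 0.05; a <- 0.05; b <- 0.9
sigma2  <- numeric(n_periods)
u_garch <- numeric(n_periods)
sigma2[1]  <- omega / (1 - a - b)   # unconditional variance as the starting value
u_garch[1] <- sqrt(sigma2[1]) * eps[1]
for (t in 2:n_periods) {
  sigma2[t]  <- omega + a * u_garch[t - 1]^2 + b * sigma2[t - 1]
  u_garch[t] <- sqrt(sigma2[t]) * eps[t]
}
```

With these GARCH coefficients the unconditional variance is 0.05 / (1 - 0.05 - 0.9) = 1, so the GARCH errors have the same long-run scale as standard normal errors.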

Value

A list holding

alpha

the $K \times p$ matrix of group-specific slope parameters. If dynamic = TRUE, the first column holds the AR coefficient.

groups

a vector indicating the group memberships $(g_1, \dots, g_N)$, where $g_i = k$ if $i \in$ group $k$.

y

an $NT \times 1$ vector of the dependent variable, with $\mathbf{y} = (y_1, \dots, y_N)^\prime$, $y_i = (y_{i1}, \dots, y_{iT})^\prime$ and the scalar $y_{it}$.

X

an $NT \times p$ matrix of explanatory variables, with $\mathbf{X} = (x_1, \dots, x_N)^\prime$, $x_i = (x_{i1}, \dots, x_{iT})^\prime$ and the $p \times 1$ vector $x_{it}$.

Z

an $NT \times q$ matrix of instruments, where $q \geq p$, $\mathbf{Z} = (z_1, \dots, z_N)^\prime$, $z_i = (z_{i1}, \dots, z_{iT})^\prime$ and $z_{it}$ is a $q \times 1$ vector. In case a panel with exogenous regressors is generated (q = NULL), $\mathbf{Z}$ equals NULL.

data

an $NT \times (p + 1)$ data.frame of the outcome and the explanatory variables.

Author(s)

Paul Haimerl

References

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples

# Simulate DGP 1 from Mehrabani (2023, sec. 6)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)

[Package PAGFL version 1.1.1 Index]