sim_DGP {PAGFL}    R Documentation

Simulate a Panel With a Group Structure in the Slope Coefficients


Description

Construct a static or dynamic, exogenous or endogenous panel data set subject to a group structure in the slope coefficients with optional AR(1) or GARCH(1,1) innovations.


Usage

sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = "iid",
  dynamic = FALSE,
  dyn_panel = lifecycle::deprecated(),
  q = NULL,
  alpha_0 = NULL
)



Arguments

N
the number of cross-sectional units. Default is 50.

n_periods
the number of simulated time periods $T$. Default is 40.

p
the number of explanatory variables. Default is 2.

n_groups
the number of groups $K$. Default is 3.

group_proportions
a numeric vector of length n_groups indicating the size of each group as a fraction of $N$. If NULL, all groups are of size $N / K$. Default is NULL.

error_spec
the error specification. Options include i.i.d. errors, an AR(1) error process with an autoregressive coefficient of 0.5, and a GARCH(1,1) error process with a 0.05 constant, a 0.05 ARCH and a 0.9 GARCH coefficient. Default is "iid".

dynamic
Logical. If TRUE, the panel includes one stationary autoregressive lag of $y_{it}$ as an explanatory variable (see sec. Details for more information on the AR coefficient). Default is FALSE.

dyn_panel
[Deprecated] Deprecated and replaced by dynamic.

q
the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass NULL. Default is NULL.

alpha_0
a $K \times p$ matrix of group-specific coefficients. If dynamic = TRUE, the first column represents the stationary AR coefficient. If NULL, the coefficients are drawn randomly (see sec. Details). Default is NULL.
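The AR(1) and GARCH(1,1) error processes listed under error_spec can be sketched in base R as follows. This is only an illustration using the coefficients stated above and assuming standard normal shocks; the names u_ar, u_garch and sigma2 are hypothetical, and the code is not taken from the package source.

```r
set.seed(1)
n_periods <- 500

# AR(1) errors: u_t = 0.5 * u_{t-1} + eps_t (standard normal eps_t is an assumption)
eps <- rnorm(n_periods)
u_ar <- as.numeric(stats::filter(eps, filter = 0.5, method = "recursive"))

# GARCH(1,1) errors: u_t = sqrt(sigma2_t) * z_t with
# sigma2_t = 0.05 + 0.05 * u_{t-1}^2 + 0.9 * sigma2_{t-1}
z <- rnorm(n_periods)
sigma2 <- numeric(n_periods)
u_garch <- numeric(n_periods)
sigma2[1] <- 0.05 / (1 - 0.05 - 0.9) # start at the unconditional variance
u_garch[1] <- sqrt(sigma2[1]) * z[1]
for (t in 2:n_periods) {
  sigma2[t] <- 0.05 + 0.05 * u_garch[t - 1]^2 + 0.9 * sigma2[t - 1]
  u_garch[t] <- sqrt(sigma2[t]) * z[t]
}
```

With a 0.05 constant, a 0.05 ARCH and a 0.9 GARCH coefficient, the implied unconditional variance is 0.05 / (1 - 0.95) = 1, so the GARCH errors are on the same scale as standard normal errors.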


Details

The scalar dependent variable $y_{it}$ is generated according to the following grouped panel data model

$$y_{it} = \gamma_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.$$

$\gamma_i$ represents individual fixed effects and $x_{it}$ a $p \times 1$ vector of regressors. The individual slope coefficient vectors $\beta_i$ are subject to a group structure

$$\beta_i = \sum_{k = 1}^K \alpha_k \mathbf{1} \{i \in G_k\},$$

with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k = 1, \dots, K$. The total number of groups $K$ is determined by n_groups.
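As a minimal base R sketch of the group structure above (not the package's internal code; the object names alpha, groups and beta, and the toy values, are illustrative assumptions), each unit simply inherits the coefficient vector of its group:

```r
N <- 6
K <- 3
p <- 2

# Hypothetical K x p matrix holding the group-specific slopes alpha_k in its rows
alpha <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), nrow = K, ncol = p)

# Hypothetical group memberships g_i, one entry per cross-sectional unit
groups <- c(1, 1, 2, 2, 3, 3)

# beta_i = alpha_k for the group k that unit i belongs to (row lookup)
beta <- alpha[groups, , drop = FALSE]

# Equivalent indicator form: beta_i = sum_k alpha_k * 1{i in G_k}
indicator <- outer(groups, seq_len(K), "==") * 1 # N x K matrix of 0/1
beta_sum <- indicator %*% alpha
```

Because the groups are disjoint and cover all units, every row of the indicator matrix contains exactly one 1, which is why the sum collapses to a row lookup.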

If a panel data set with exogenous regressors is generated (set q = NULL), the explanatory variables are simulated according to

$$x_{it,j} = 0.2 \gamma_i + e_{it,j}, \quad \gamma_i, e_{it,j} \sim \text{i.i.d. } N(0, 1), \quad j = \{1, \dots, p\},$$

where $e_{it,j}$ denotes a series of innovations. $\gamma_i$ and $e_{it,j}$ are independent of each other.
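The regressor equation above can be sketched in base R as follows. The stacking order (all periods of unit 1, then unit 2, and so on) and the object names gamma, e and X are assumptions made for illustration; this is not the package's own code.

```r
set.seed(1)
N <- 50
n_periods <- 40
p <- 2

# Fixed effects gamma_i ~ N(0, 1), repeated over the T periods of each unit
gamma <- rep(rnorm(N), each = n_periods)

# Innovations e_{it,j} ~ N(0, 1), one column per regressor j
e <- matrix(rnorm(N * n_periods * p), ncol = p)

# x_{it,j} = 0.2 * gamma_i + e_{it,j}; gamma is recycled down each column
X <- 0.2 * gamma + e
```

The common term 0.2 * gamma makes every regressor correlated with the fixed effect, which is what distinguishes a fixed effects design from a random effects one.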

In case alpha_0 = NULL, the group-level slope parameters $\alpha_k$ are drawn from $U[-2, 2]$.

If a dynamic panel is specified (dynamic = TRUE), the AR coefficients $\beta^{\text{AR}}_i$ are drawn from a uniform distribution with support $(-1, 1)$ and $x_{it,j} = e_{it,j}$. Moreover, the individual fixed effects enter the dependent variable via $(1 - \beta^{\text{AR}}_i) \gamma_i$ to account for the autoregressive dependency. We refer to Mehrabani (2023, sec. 6) for details.
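To illustrate why the fixed effect is scaled by one minus the AR coefficient, the following base R sketch simulates a single unit of a dynamic panel. The helper sim_dynamic_unit, the single regressor, and the exact form of the recursion are assumptions based on the description above; Mehrabani (2023, sec. 6) remains the authoritative reference.

```r
# Hypothetical helper: one unit of a dynamic panel with a single regressor
sim_dynamic_unit <- function(beta_ar, beta_x, gamma_i, x, u) {
  y <- numeric(length(x))
  y_prev <- gamma_i # start at the long-run mean implied by the scaling
  for (t in seq_along(x)) {
    # The fixed effect enters as (1 - beta_ar) * gamma_i
    y[t] <- (1 - beta_ar) * gamma_i + beta_ar * y_prev + beta_x * x[t] + u[t]
    y_prev <- y[t]
  }
  y
}

set.seed(1)
n_periods <- 200
beta_ar <- runif(1, -1, 1) # AR coefficient drawn from the support (-1, 1)
y <- sim_dynamic_unit(
  beta_ar = beta_ar, beta_x = 1.2, gamma_i = 2,
  x = rnorm(n_periods), u = rnorm(n_periods)
)
```

Taking expectations of the recursion with mean-zero regressors and errors gives a long-run mean of gamma_i for any AR coefficient in (-1, 1), so the scaling keeps the level of the series tied to the fixed effect rather than to gamma_i / (1 - beta_ar).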

When specifying an endogenous panel (set q to $q \geq p$), the $e_{it,j}$ correlate with the cross-sectional innovations $u_{it}$ by a magnitude of 0.5 to produce endogenous regressors ($\text{E}(u|X) \neq 0$). However, the endogenous regressors can be accounted for by exploiting the $q$ instruments in $\mathbf{Z}$, for which $\text{E}(u|Z) = 0$ holds. The instruments and the first stage coefficients are generated in the same fashion as $\mathbf{X}$ and $\boldsymbol{\alpha}$ when q = NULL.
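The endogeneity mechanism can be illustrated with a small base R sketch. It reads the "magnitude of 0.5" as a correlation of 0.5 between the regressor innovations and the errors, which is an assumption, and it uses one regressor, one instrument and a hypothetical first-stage coefficient pi_1; none of this is taken from the package source.

```r
set.seed(1)
n <- 10000
rho <- 0.5 # assumed correlation between e and u

u <- rnorm(n) # errors of the outcome equation

# e = rho * u + sqrt(1 - rho^2) * v has unit variance and cor(e, u) = rho
e <- rho * u + sqrt(1 - rho^2) * rnorm(n)

z <- rnorm(n) # instrument, drawn independently of u
pi_1 <- 1 # hypothetical first-stage coefficient
x <- pi_1 * z + e # endogenous regressor

# Sample correlations: x is correlated with u, z is not
c(cor_x_u = cor(x, u), cor_z_u = cor(z, u))
```

The regressor inherits its correlation with the error through e, while the instrument affects x but is unrelated to u, which is the property an instrumental variable estimator exploits.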

The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).


Value

A list holding

the $K \times p$ matrix of group-specific slope parameters. If dynamic = TRUE, the first column holds the AR coefficient.

a vector indicating the group memberships $(g_1, \dots, g_N)$, where $g_i = k$ if $i \in$ group $k$.

an $NT \times 1$ vector of the dependent variable, with $\mathbf{y} = (y_1, \dots, y_N)^\prime$, $y_i = (y_{i1}, \dots, y_{iT})^\prime$ and the scalar $y_{it}$.

an $NT \times p$ matrix of explanatory variables, with $\mathbf{X} = (x_1, \dots, x_N)^\prime$, $x_i = (x_{i1}, \dots, x_{iT})^\prime$ and the $p \times 1$ vector $x_{it}$.

an $NT \times q$ matrix of instruments, where $q \geq p$, $\mathbf{Z} = (z_1, \dots, z_N)^\prime$, $z_i = (z_{i1}, \dots, z_{iT})^\prime$ and $z_{it}$ is a $q \times 1$ vector. In case a panel with exogenous regressors is generated (q = NULL), $\mathbf{Z}$ equals NULL.

an $NT \times (p + 1)$ data.frame of the outcome and the explanatory variables.


Author(s)

Paul Haimerl


References

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.


Examples

# Simulate DGP 1 from Mehrabani (2023, sec. 6)
# Rows of alpha_0 hold the slope coefficients of the three groups
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)

[Package PAGFL version 1.1.1 Index]