sim_DGP {PAGFL} | R Documentation |
Simulate a Panel With a Latent Group Structure
Description
Construct a static or dynamic, exogenous or endogenous panel data set subject to a latent group structure with optional AR(1) or GARCH(1,1) innovations.
Usage
sim_DGP(
  N = 50,
  n_periods = 40,
  p = 2,
  n_groups = 3,
  group_proportions = NULL,
  error_spec = NULL,
  dyn_panel = FALSE,
  q = NULL,
  alpha_0 = NULL
)
Arguments
N |
the number of cross-sectional units. Default is 50. |
n_periods |
the number of simulated time periods T. Default is 40. |
p |
the number of explanatory variables. Default is 2. |
n_groups |
the number of latent groups K. Default is 3. |
group_proportions |
a numeric vector of length |
error_spec |
the error specification used. Options are
Default is |
dyn_panel |
Logical. If |
q |
the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass |
alpha_0 |
an optional pre-specified |
Details
The scalar dependent variable y_{it}
is driven by the following panel data model
y_{it} = \eta_i + \beta_i^\prime x_{it} + u_{it}, \quad i = \{1, \dots, N\}, \quad t = \{1, \dots, T\}.
\eta_i
represents individual fixed effects and x_{it} = (x_{it,1}, \dots, x_{it,p})
a p \times 1
vector of regressors.
The individual slope coefficient vectors \beta_i
are subject to a latent group structure \beta_i = \sum_{k = 1}^K \alpha_k \bold{1} \{i \in G_k\}
.
As a consequence, the group-level coefficients \bold{\alpha} = (\alpha^\prime_1, \dots, \alpha^\prime_K)^\prime
follow the partition \bold{G}
of N
cross-sectional units \bold{G} = (G_1, \dots, G_K)
such that \cup_{k=1}^K G_k = \{1,\dots,N\}
and G_k \cap G_l = \emptyset, \; \alpha_k \neq \alpha_l
for any two groups k \neq l
(Mehrabani, 2023, sec. 2.1).
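The mapping from group-level to individual coefficients can be illustrated with a short base R sketch. It is not the internal code of sim_DGP; the object names (alpha, groups, beta) and all values are assumptions chosen for illustration.

```r
# Illustrative sketch only; not the internal code of sim_DGP()
N <- 6  # cross-sectional units
K <- 3  # latent groups
p <- 2  # regressors

# Hypothetical group-level coefficients: one row per group, one column per regressor
alpha <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = p)

# Hypothetical partition: every unit belongs to exactly one group
groups <- c(1, 1, 2, 2, 3, 3)

# beta_i = sum_k alpha_k * 1{i in G_k} reduces to selecting the row of unit i's group
beta <- alpha[groups, , drop = FALSE]
```

Each row of beta then holds the slope vector of one unit, and units in the same group share identical rows.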
If a panel data set with exogenous regressors is generated (set q = NULL
), the p
predictors are simulated as:
x_{it,j} = 0.2 \eta_i + e_{it,j}, \quad \eta_i,e_{it,j} \sim i.i.d. N(0, 1), \quad j = \{1, \dots, p\},
where e_{it,j}
denotes a series of innovations. \eta_i
and e_{it,j}
are independent of each other.
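A minimal base R sketch of the regressor equation above may help. The long-format layout (rows ordered by unit, then by time) and the object names are assumptions made for this illustration and need not match what sim_DGP does internally.

```r
# Illustrative sketch only; layout and names are assumptions
set.seed(42)
N <- 50
n_periods <- 40
p <- 2

eta <- rnorm(N)                                  # fixed effects, eta_i ~ N(0, 1)
eta_long <- rep(eta, each = n_periods)           # repeat eta_i for every period of unit i
e <- matrix(rnorm(N * n_periods * p), ncol = p)  # innovations e_{it,j} ~ N(0, 1)

# x_{it,j} = 0.2 * eta_i + e_{it,j}; the vector is recycled down each column
X <- 0.2 * eta_long + e
```

Because eta_long has exactly as many elements as X has rows, R's recycling adds the same fixed effect to every regressor of a given unit-period row.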
In case alpha_0 = NULL
, the group-level slope parameters \alpha_{k}
are drawn from U[-2, 2]
.
If a dynamic panel is specified (dyn_panel = TRUE
), the AR
coefficients \beta^{\text{AR}}_i
are drawn from a uniform distribution with support (-1, 1)
and x_{it,j} = e_{it,j}
.
The individual fixed effects enter the dependent variable via (1 - \beta^{\text{AR}}_i) \eta_i
to account for the autoregressive dependency.
I refer to Mehrabani (2023, sec. 6) for details.
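The dynamic case can be sketched for a single unit as follows. The recursion form, the starting value, the i.i.d. N(0, 1) errors, and the slope values are all assumptions made for illustration; the sketch only shows how the term (1 - \beta^{\text{AR}}_i) \eta_i enters and does not reproduce sim_DGP.

```r
# Illustrative sketch for one cross-sectional unit; not the internal code of sim_DGP()
set.seed(7)
n_periods <- 40
p <- 2

beta_ar <- runif(1, min = -1, max = 1)       # AR coefficient with support (-1, 1)
beta <- c(1, -0.5)                           # hypothetical slope coefficients
eta <- rnorm(1)                              # individual fixed effect
x <- matrix(rnorm(n_periods * p), ncol = p)  # x_{it,j} = e_{it,j}
u <- rnorm(n_periods)                        # assumed i.i.d. N(0, 1) errors

y <- numeric(n_periods)
y_prev <- eta                                # assumed starting value
for (t in seq_len(n_periods)) {
  # Scaling eta by (1 - beta_ar) keeps the long-run mean of y at eta
  y[t] <- (1 - beta_ar) * eta + beta_ar * y_prev + sum(x[t, ] * beta) + u[t]
  y_prev <- y[t]
}
```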
When specifying an endogenous panel (set q
to q \geq p
), e_{it,j}
are correlated with the cross-sectional innovations u_{it}
with a magnitude of 0.5 to produce endogenous regressors with \text{E}(u|X) \neq 0
. However, the endogenous regressors can be accounted for by exploiting the q
instruments in \bold{Z}
, for which \text{E}(u|Z) = 0
holds.
The instruments and the first stage coefficients are generated in the same fashion as \bold{X}
and \bold{\alpha}
when q = NULL
, respectively.
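One way to picture the endogeneity mechanism is the base R sketch below. It assumes that "a magnitude of 0.5" refers to a correlation coefficient of 0.5 and builds the correlated innovation from two independent standard normal draws; this construction and the names (rho, v, z) are assumptions and not necessarily how sim_DGP proceeds.

```r
# Illustrative sketch only; the construction of e is an assumption
set.seed(123)
n <- 10000
rho <- 0.5                          # assumed correlation between e and u

u <- rnorm(n)                       # structural errors
v <- rnorm(n)                       # independent auxiliary noise
e <- rho * u + sqrt(1 - rho^2) * v  # unit variance, cor(e, u) = rho in population

z <- rnorm(n)                       # instrument innovation, independent of u

cor(e, u)  # close to 0.5: regressors built from e are endogenous
cor(z, u)  # close to 0: instruments satisfy the exogeneity condition
```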
The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).
Value
A list holding
alpha |
the |
groups |
a vector indicating the group memberships. |
y |
a |
X |
a |
Z |
a |
Author(s)
Paul Haimerl
References
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
Examples
# Simulate DGP 1 from Mehrabani (2023, sec. 6)
alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
DGP1 <- sim_DGP(
  N = 50, n_periods = 20, p = 2, n_groups = 3,
  group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
)
# Simulate DGP 6 from Mehrabani (2023, sec. 6)
alpha_0_DGP6 <- cbind(
  c(0.8, 0.6, 0.4, 0.2, -0.2, -0.4, -0.6, -0.8),
  c(-4, -3, -2, -1, 1, 2, 3, 4),
  c(4, 3, 2, 1, -1, -2, -3, -4)
)