R: Grouped Time-varying Panel Data Model

grouped_tv_plm {PAGFL}

R Documentation

Grouped Time-varying Panel Data Model

Description

Estimate a grouped time-varying panel data model given an observed group structure. Coefficient functions are homogeneous within groups but heterogeneous across groups. The time-varying coefficients are modeled as polynomial B-splines. The function supports both static and dynamic panel data models.

Usage

grouped_tv_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  d = 3,
  M = floor(length(y)^(1/7) - log(p)),
  const_coef = NULL,
  rho = 0.04 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)

## S3 method for class 'tv_gplm'
summary(object, ...)

## S3 method for class 'tv_gplm'
formula(x, ...)

## S3 method for class 'tv_gplm'
df.residual(object, ...)

## S3 method for class 'tv_gplm'
print(x, ...)

## S3 method for class 'tv_gplm'
coef(object, ...)

## S3 method for class 'tv_gplm'
residuals(object, ...)

## S3 method for class 'tv_gplm'
fitted(object, ...)

Arguments

`formula`	a formula object describing the model to be estimated.
`data`	a `data.frame` or `matrix` holding a panel data set. If no `index` variables are provided, the panel must be balanced and ordered in the long format `\bold{Y}=(Y_1^\prime, \dots, Y_N^\prime)^\prime`, `Y_i = (Y_{i1}, \dots, Y_{iT})^\prime` with `Y_{it} = (y_{it}, x_{it}^\prime)^\prime`. Conversely, if `data` is not ordered or not balanced, `data` must include two index variables that declare the cross-sectional unit `i` and the time period `t` of each observation.
`groups`	a numerical or character vector of length `N` that indicates the group membership of each cross-sectional unit `i`.
`index`	a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit `i`, and the second string represents the name of the variable declaring the time period `t`. In case of a balanced panel data set that is ordered in the long format, `index` can be left empty if the number of time periods `n_periods` is supplied.
`n_periods`	the number of observed time periods `T`. If an `index` character vector is passed, this argument can be left empty. Default is `NULL`.
`d`	the polynomial degree of the B-splines. Default is 3.
`M`	the number of interior knots of the B-splines. If left unspecified, the default heuristic `M = \text{floor}((NT)^{\frac{1}{7}} - \log(p))` is used. Note that `M` does not include the boundary knots; the resulting spline basis `B(v)` consists of `M + d + 1` basis functions.
`const_coef`	a character vector containing the variable names of explanatory variables that enter with time-constant coefficients.
`rho`	the tuning parameter balancing the goodness-of-fit and penalty terms in the information criterion (IC). If left unspecified, the heuristic `\rho = 0.04 \frac{\log(NT)}{\sqrt{NT}}`, following Mehrabani (2023, sec. 6), is used. We recommend the default.
`verbose`	logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`.
`parallel`	logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`.
`...`	further arguments passed to or from other methods.
`object`	an object of class `tv_gplm`.
`x`	an object of class `tv_gplm`.
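The default heuristics for `M` and `rho` stated in the Usage section, and the long-format ordering required when `index` is omitted, can be illustrated with a short base-R sketch. The helper names `default_M` and `default_rho` are hypothetical and not part of PAGFL, and the values of `N`, `n_periods`, `p` and `d` are illustrative only.

```r
# Hypothetical helpers reproducing the documented default formulas
default_M <- function(N, n_periods, p) {
  # M = floor((NT)^(1/7) - log(p)); NT equals length(y) in a balanced panel
  floor((N * n_periods)^(1 / 7) - log(p))
}

default_rho <- function(N, n_periods) {
  # rho = 0.04 * log(NT) / sqrt(NT), as in the Usage section
  0.04 * log(N * n_periods) / sqrt(N * n_periods)
}

N <- 10; n_periods <- 50; p <- 1; d <- 3
M <- default_M(N, n_periods, p)
n_basis <- M + d + 1          # number of B-spline basis functions
rho <- default_rho(N, n_periods)

# Long format: all periods of unit 1, then all periods of unit 2, ...
long_index <- data.frame(i = rep(seq_len(N), each = n_periods),
                         t = rep(seq_len(n_periods), times = N))
c(M = M, n_basis = n_basis, rho = rho)
```

With `N = 10` and `n_periods = 50`, `(NT)^{1/7}` is roughly 2.43, so a single regressor (`log(1) = 0`) yields `M = 2`.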

Details

Consider the grouped time-varying panel data model

y_{it} = \gamma_i + \beta^\prime_{i} (t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,

where y_{it} is the scalar dependent variable, \gamma_i is an individual fixed effect, x_{it} is a p \times 1 vector of explanatory variables, and \epsilon_{it} is a zero-mean error. The coefficient vector \beta_{i} (t/T) is subject to the observed group pattern

\beta_i \left(\frac{t}{T} \right) = \sum_{k = 1}^K \alpha_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},

with \cup_{k = 1}^K G_k = \{1, \dots, N\}, G_k \cap G_j = \emptyset, and \| \alpha_k - \alpha_j \| \neq 0 for any j \neq k, j, k = 1, \dots, K.
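To make the group pattern concrete, the following base-R sketch simulates data from the model above with K = 2 observed groups and a single regressor (p = 1). The coefficient functions `alpha`, the sample sizes and the error scale are arbitrary assumptions chosen for illustration; this is not the package's `sim_tv_DGP`.

```r
set.seed(42)
N <- 6; n_periods <- 40; K <- 2
groups <- rep(1:K, each = N / K)          # observed memberships g_i
v <- seq_len(n_periods) / n_periods       # rescaled time t/T

# Hypothetical group-specific coefficient functions alpha_k(v)
alpha <- list(function(v) sin(2 * pi * v), function(v) 1 + v^2)

gamma <- rnorm(N)                         # individual fixed effects
x <- matrix(rnorm(N * n_periods), N, n_periods)
eps <- matrix(rnorm(N * n_periods, sd = 0.1), N, n_periods)

# beta_i(t/T) = sum_k alpha_k(t/T) 1{i in G_k}; row i holds the path of unit i
beta <- t(sapply(seq_len(N), function(i) alpha[[groups[i]]](v)))

# y_it = gamma_i + beta_i(t/T) x_it + eps_it; gamma recycles down each column
y <- gamma + beta * x + eps
```

Units within the same group share an identical coefficient path, while paths differ across groups.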

\alpha_k (t/T) and, in turn, \beta_i (t/T) are estimated as polynomial B-splines using the sieve technique. To this end, let B(v) denote an (M + d + 1) \times 1 vector of polynomial spline basis functions, where d represents the polynomial degree and M gives the number of interior knots of the B-spline. \alpha_{k}(t/T) is approximated by a linear combination of the basis functions, \alpha_{k}(t/T) \approx \xi_k^\prime B(t/T), where \xi_k is an (M + d + 1) \times p coefficient matrix.
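A basis of this dimension can be built with `splines::bs`, which ships with R. Setting `intercept = TRUE` produces `M + d + 1` columns, and row t holds B(t/T)'. Placing the interior knots at quantiles of t/T is an assumption of this sketch; PAGFL may position its knots differently.

```r
library(splines)

n_periods <- 50; d <- 3; M <- 2
v <- seq_len(n_periods) / n_periods

# M interior knots at evenly spaced quantiles of v (illustrative choice)
interior <- quantile(v, probs = seq_len(M) / (M + 1), names = FALSE)

B <- bs(v, knots = interior, degree = d, intercept = TRUE,
        Boundary.knots = range(v))
dim(B)  # T rows, M + d + 1 columns
```

Because the full B-spline basis forms a partition of unity, each row of `B` sums to one on the spline's support.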

The explanatory variables are projected onto the spline basis system, which results in the (M + d + 1)p \times 1 vector z_{it} = x_{it} \otimes B(t/T). Subsequently, the DGP can be reformulated as

y_{it} = \gamma_i + z_{it}^\prime \text{vec}(\pi_{i}) + u_{it},

where \pi_i = \xi_k if i \in G_k, u_{it} = \epsilon_{it} + \eta_{it}, and \eta_{it} reflects a sieve approximation error. We refer to Su et al. (2019, sec. 2) for more details on the sieve technique.
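The reformulation relies on the identity z_{it}^\prime \text{vec}(\xi_k) = (\xi_k^\prime B(t/T))^\prime x_{it}. The sketch below checks this identity numerically at one time point using base R's `kronecker`. The knot at 0.5, the regressor values and the random control points are arbitrary assumptions.

```r
library(splines)
set.seed(1)

p <- 2; d <- 3; M <- 1; n_basis <- M + d + 1
v <- 0.3                                   # one rescaled time point t/T
B_v <- as.vector(bs(v, knots = 0.5, degree = d, intercept = TRUE,
                    Boundary.knots = c(0, 1)))   # B(t/T)
x_it <- c(1.5, -0.7)                       # p x 1 regressor vector
xi <- matrix(rnorm(n_basis * p), n_basis, p)     # (M + d + 1) x p

z_it <- kronecker(x_it, B_v)               # x_it (x) B(t/T)
lhs <- sum(z_it * as.vector(xi))           # z_it' vec(xi), vec stacks columns
rhs <- sum((t(xi) %*% B_v) * x_it)         # beta(t/T)' x_it with beta = xi' B
all.equal(lhs, rhs)
```

The ordering matches because `vec` stacks the columns of \xi_k, and column j multiplies the block x_{itj} B(t/T) in z_{it}.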

Finally, \hat{\alpha}_{k}(t/T) is obtained as \hat{\alpha}_{k}(t/T) = \hat{\xi}_k^\prime B(t/T), where the matrix of control points \xi_k is estimated by OLS

\text{vec}(\hat{\xi}_k) = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{z}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{y}_{it},

where \tilde{a}_{it} = a_{it} - T^{-1} \sum_{s = 1}^T a_{is} for a \in \{y, z\} denotes the within-transformation, which concentrates out the fixed effect \gamma_i.
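The within-transformation and the group-wise OLS step can be sketched in base R for a single group and one regressor (p = 1, so z_{it} = x_{it} B(t/T)). The data are generated without noise from a known spline, so the estimator should recover the true control points up to rounding. All sizes and the knot at 0.5 are assumptions, and this is not the package's internal implementation.

```r
library(splines)
set.seed(7)

n_i <- 4; n_periods <- 30; d <- 3
v <- seq_len(n_periods) / n_periods
B <- bs(v, knots = 0.5, degree = d, intercept = TRUE,
        Boundary.knots = c(0, 1))            # T x (M + d + 1), M = 1
xi_true <- rnorm(ncol(B))                    # true vec(xi_k) for p = 1

unit <- rep(seq_len(n_i), each = n_periods)  # long format: i outer, t inner
x <- rnorm(n_i * n_periods)
Z <- x * B[rep(seq_len(n_periods), n_i), ]   # row (i, t) is z_it'
gamma <- rep(rnorm(n_i), each = n_periods)   # fixed effects
y <- gamma + drop(Z %*% xi_true)             # noise-free for the check

# Within-transformation: subtract each unit's time average
demean <- function(a) a - ave(a, unit)
Z_tilde <- apply(Z, 2, demean)
y_tilde <- demean(y)

# vec(xi_hat) = (sum z~ z~')^{-1} sum z~ y~
xi_hat <- drop(solve(crossprod(Z_tilde), crossprod(Z_tilde, y_tilde)))
max(abs(xi_hat - xi_true))
```

Demeaning removes \gamma_i exactly, so the fixed effects never need to be estimated.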

In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end points of the interval on which the group-specific time-varying coefficients are defined.
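The group-specific interval can be read off the observed time indices, as in the following sketch. It uses a small hypothetical unbalanced panel with columns `i`, `t` and `group`; these names are illustrative only.

```r
# Hypothetical unbalanced panel with an observed group structure
panel <- data.frame(
  i     = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  t     = c(3, 4, 5, 1, 2, 6, 7, 8, 9),
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Earliest and latest observed period within each group
start <- tapply(panel$t, panel$group, min)
end   <- tapply(panel$t, panel$group, max)
cbind(start, end)
```

Group 1 spans periods 1 to 5 because unit 2 starts earlier than unit 1. Group 2 spans periods 6 to 9.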

Value

An object of class tv_gplm holding

`model`	a `data.frame` containing the dependent and explanatory variables as well as cross-sectional and time indices,
`coefficients`	let `p^{(1)}` denote the number of time-varying and `p^{(2)}` the number of time-constant coefficients. A `list` holding (i) a `T \times p^{(1)} \times K` array of the group-specific functional coefficients and (ii) a `K \times p^{(2)}` matrix of time-constant estimates.
`groups`	a `list` containing (i) the total number of groups `K` and (ii) a vector of group memberships `(\hat{g}_1, \dots, \hat{g}_N)`, where `\hat{g}_i = k` if `i` is part of group `k`,
`residuals`	a vector of residuals of the demeaned model,
`fitted`	a vector of fitted values of the demeaned model,
`args`	a `list` of additional arguments,
`IC`	a `list` containing (i) the value of the information criterion (IC) and (ii) the MSE,
`call`	the function call.
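The following sketch shows one way a `T \times p^{(1)} \times K` array of functional coefficients can be assembled from control-point estimates via \hat{\alpha}_k(t/T) = \hat{\xi}_k^\prime B(t/T). The matrices in `xi_hat` are random placeholders rather than actual `grouped_tv_plm` output, and the package's internal construction may differ.

```r
library(splines)
set.seed(3)

n_periods <- 20; p <- 2; K <- 3; d <- 3
v <- seq_len(n_periods) / n_periods
B <- bs(v, knots = 0.5, degree = d, intercept = TRUE,
        Boundary.knots = c(0, 1))          # row t is B(t/T)'

# Placeholder control-point matrices xi_hat_k, each (M + d + 1) x p
xi_hat <- replicate(K, matrix(rnorm(ncol(B) * p), ncol(B), p),
                    simplify = FALSE)

# Slice k holds alpha_hat_k(t/T)' = B(t/T)' xi_hat_k for t = 1, ..., T
alpha_hat <- array(NA_real_, dim = c(n_periods, p, K))
for (k in seq_len(K)) alpha_hat[, , k] <- B %*% xi_hat[[k]]

dim(alpha_hat)
alpha_hat[, 1, 2]  # path of the first coefficient function in group 2
```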

An object of class tv_gplm has print, summary, fitted, residuals, formula, df.residual and coef S3 methods.

Author(s)

Paul Haimerl

References

Su, L., Wang, X., & Jin, S. (2019). Sieve estimation of time-varying panel data models with latent structures. Journal of Business & Economic Statistics, 37(2), 334-349. doi:10.1080/07350015.2017.1340299.

Examples

# Simulate a time-varying panel with a trend and a group pattern
set.seed(1)
sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 1)
df <- data.frame(y = c(sim$y))
groups <- sim$groups

# Estimate the time-varying grouped panel data model
estim <- grouped_tv_plm(y ~ ., data = df, n_periods = 50, groups = groups)
summary(estim)

[Package PAGFL version 1.1.1 Index]