PAGFL {PAGFL}		R Documentation

Apply the Pairwise Adaptive Group Fused Lasso

Description

The pairwise adaptive group fused lasso (PAGFL) by Mehrabani (2023) jointly estimates the latent group structure and group-specific slope parameters in a panel data model. It can handle static and dynamic panels, either with or without endogenous regressors.

Usage

PAGFL(
  y,
  X,
  n_periods,
  lambda,
  method = "PLS",
  Z = NULL,
  min_group_frac = 0.05,
  bias_correc = FALSE,
  kappa = 2,
  max_iter = 2000,
  tol_convergence = 0.001,
  tol_group = sqrt(p/(sqrt(N * n_periods) * log(log(N * n_periods)))),
  rho = 0.07 * log(N * n_periods)/sqrt(N * n_periods),
  varrho = max(sqrt(5 * N * n_periods * p)/log(N * n_periods * p) - 7, 1),
  verbose = TRUE
)

Arguments

y

an NT \times 1 vector or data.frame of the dependent variable, with \bold{y}=(y_1, \dots, y_N)^\prime, y_i = (y_{i1}, \dots, y_{iT})^\prime and the scalar y_{it}.

X

an NT \times p matrix or data.frame of explanatory variables, with \bold{X}=(x_1, \dots, x_N)^\prime, x_i = (x_{i1}, \dots, x_{iT})^\prime and the p \times 1 vector x_{it}.

n_periods

the number of observed time periods T.

lambda

the tuning parameter governing the strength of the penalty term. Either a single \lambda or a vector of candidate values can be passed. If a vector is supplied, a BIC-type information criterion selects the best fitting parameter value.

method

the estimation method. Options are

'PLS'

for using the penalized least squares (PLS) algorithm. We recommend PLS in case of (weakly) exogenous regressors (Mehrabani, 2023, sec. 2.2).

'PGMM'

for using the penalized Generalized Method of Moments (PGMM). PGMM is required when instrumenting endogenous regressors (Mehrabani, 2023, sec. 2.3). A matrix Z contains the necessary exogenous instruments.

Default is 'PLS'.

Z

an NT \times q matrix of exogenous instruments, where q \geq p, \bold{Z}=(z_1, \dots, z_N)^\prime, z_i = (z_{i1}, \dots, z_{iT})^\prime and z_{it} is a q \times 1 vector. Z is only required when method = 'PGMM' is selected; when method = 'PLS' is used, any matrix passed as \bold{Z} is disregarded. Default is NULL.

min_group_frac

the minimum group size as a fraction of the total number of individuals N. In case a group falls short of this threshold, a hierarchical classifier allocates its members to the remaining groups. Default is 0.05.

bias_correc

logical. If TRUE, a Split-panel Jackknife bias correction following Dhaene and Jochmans (2015) is applied to the slope parameters. We recommend using this correction when facing a dynamic panel. Default is FALSE.

kappa

the weight placed on the adaptive penalty weights. Default is 2.

max_iter

the maximum number of iterations for the ADMM estimation algorithm. Default is 2000.

tol_convergence

the tolerance limit for the stopping criterion of the iterative ADMM estimation algorithm. Default is 0.001.

tol_group

the tolerance limit for within-group differences. Two individuals are placed in the same group if the Frobenius norm of their coefficient parameter difference is below this parameter. If left unspecified, the heuristic \sqrt{\frac{p}{\sqrt{NT} \log(\log(NT))}} is used. We recommend the default.

rho

the tuning parameter balancing the fit and penalty terms in the information criterion that determines the penalty parameter \lambda. If left unspecified, the heuristic \rho = 0.07 \frac{\sqrt{NT} \log(NT)}{NT} of Mehrabani (2023, sec. 6) is used. We recommend the default.

varrho

the non-negative Lagrangian ADMM penalty parameter. For PLS, the choice of \varrho is inconsequential. For PGMM, however, small values lead to slow convergence of the algorithm. If left unspecified, the default heuristic \varrho = \max(\frac{\sqrt{5NTp}}{\log(NTp)}-7, 1) is used.

verbose

logical. If TRUE, a progress bar is printed when iterating over candidate \lambda values and helpful warning messages are shown. Default is TRUE.

Details

The PLS method minimizes the following criterion:

\frac{1}{T} \sum^N_{i=1} \sum^{T}_{t=1}(\tilde{y}_{it} - \beta^\prime_i \tilde{x}_{it})^2 + \frac{\lambda}{N} \sum_{1 \leq i < j \leq N} \dot{w}_{ij} \| \beta_i - \beta_j \|,

where \tilde{y}_{it} is the de-meaned dependent variable, \tilde{x}_{it} represents a vector of de-meaned weakly exogenous explanatory variables, \lambda is the penalty tuning parameter and \dot{w}_{ij} reflects adaptive penalty weights (see Mehrabani, 2023, eq. 2.6). \| \cdot \| denotes the Frobenius norm. The adaptive weights \dot{w}_{ij} are obtained by a preliminary least squares estimation. The solution \hat{\beta} is computed via an iterative alternating direction method of multipliers (ADMM) algorithm (see Mehrabani, 2023, sec. 5.1).
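To make the notation concrete, the following is a minimal sketch of the within (de-meaning) transformation that produces \tilde{y}_{it} and \tilde{x}_{it}. The helper name demean_by_unit and the assumption that observations are stacked unit by unit are illustrative only and not part of the package.

# Illustrative sketch: remove unit-specific means from a stacked panel.
# Assumes y is an NT x 1 vector and X an NT x p matrix, stacked unit by unit.
demean_by_unit <- function(y, X, n_periods) {
  N <- length(y) / n_periods
  id <- rep(seq_len(N), each = n_periods)
  y_tilde <- y - ave(y, id)                                  # y_it minus its time average
  X_tilde <- apply(X, 2, function(col) col - ave(col, id))   # column-wise de-meaning
  list(y_tilde = y_tilde, X_tilde = X_tilde)
}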

PGMM employs a set of instruments Z to control for endogenous regressors. Using PGMM, \bold{\beta} = (\beta_1^\prime, \dots, \beta_N^\prime)^\prime is estimated by minimizing:

\sum^N_{i = 1} \left[ \frac{1}{T} \sum_{t=1}^T z_{it} (\Delta y_{it} - \beta^\prime_i \Delta x_{it}) \right]^\prime W_i \left[ \frac{1}{T} \sum_{t=1}^T z_{it}(\Delta y_{it} - \beta^\prime_i \Delta x_{it}) \right] + \frac{\lambda}{N} \sum_{1 \leq i < j \leq N} \ddot{w}_{ij} \| \beta_i - \beta_j \|.

\ddot{w}_{ij} are obtained by an initial GMM estimation. \Delta denotes the first-difference operator, \Delta y_{it} = y_{it} - y_{i t-1}. W_i represents a data-driven q \times q weight matrix. We refer to Mehrabani (2023, eq. 2.10) for more details. \bold{\beta} is again estimated employing an efficient ADMM algorithm (Mehrabani, 2023, sec. 5.2).
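For intuition, here is a minimal sketch of the first-difference transformation entering the PGMM moment conditions; first_diff_by_unit is a hypothetical helper, not a package function.

# Illustrative sketch: Delta y_it = y_it - y_i,t-1, computed unit by unit.
# The first observation of each unit is lost by differencing.
first_diff_by_unit <- function(y, n_periods) {
  N <- length(y) / n_periods
  id <- rep(seq_len(N), each = n_periods)
  unlist(lapply(split(y, id), diff), use.names = FALSE)
}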

Two individuals are assigned to the same group if \| \hat{\beta}_i - \hat{\beta}_j \| \leq \epsilon_{\text{tol}}, where \epsilon_{\text{tol}} is given by tol_group.
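The sketch below illustrates this classification rule for a hypothetical N \times p matrix beta_hat of individual coefficient estimates; the package additionally enforces the min_group_frac constraint, which is omitted here.

# Illustrative sketch: units i and j share a group if ||beta_i - beta_j|| <= tol_group.
group_by_threshold <- function(beta_hat, tol_group) {
  N <- nrow(beta_hat)
  groups <- rep(NA_integer_, N)
  k <- 0L
  for (i in seq_len(N)) {
    if (is.na(groups[i])) {
      k <- k + 1L
      diff_mat <- sweep(beta_hat, 2, beta_hat[i, ])      # beta_j - beta_i for all j
      dists <- sqrt(rowSums(diff_mat^2))                 # Frobenius (Euclidean) norms
      groups[is.na(groups) & dists <= tol_group] <- k    # assign unmatched nearby units
    }
  }
  groups
}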

We suggest identifying a suitable \lambda parameter by passing a logarithmically spaced grid of candidate values with a lower limit close to 0 and an upper limit that leads to a fully homogeneous panel. A BIC-type information criterion then selects the best fitting \lambda value.
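A simple way to verify that the upper limit is large enough is to fit the model at the largest candidate value alone and check that all units collapse into a single group. In the sketch below, y, X, n_periods, and lambda_max stand in for the user's own data and the largest candidate value; only documented arguments and return values are used.

# Illustrative sketch: at a sufficiently large lambda, the panel becomes fully
# homogeneous (K_hat == 1). If not, enlarge the upper limit of the grid.
fit_max <- PAGFL(y = y, X = X, n_periods = n_periods, lambda = lambda_max, method = 'PLS')
fit_max$K_hat == 1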

Value

A list holding

IC

the BIC-type information criterion.

lambda

the penalization parameter. If multiple \lambda values were passed, the parameter yielding the lowest IC.

alpha_hat

a K \times p matrix of the post-Lasso group-specific parameter estimates.

K_hat

the estimated total number of groups.

groups_hat

a vector of estimated group memberships.

iter

the number of executed algorithm iterations.

convergence

logical. If TRUE, convergence was achieved. If FALSE, max_iter was reached.

Author(s)

Paul Haimerl

References

Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.

Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Examples

# Simulate a panel with a group structure
sim <- sim_DGP(N = 50, n_periods = 80, p = 2, n_groups = 3)
y <- sim$y
X <- sim$X

# Run the PAGFL procedure for a set of candidate tuning parameter values
lambda_set <- exp(log(10) * seq(log10(1e-4), log10(10), length.out = 10))
estim <- PAGFL(y = y, X = X, n_periods = 80, lambda = lambda_set, method = 'PLS')
print(estim)
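
# The returned list can be inspected directly (components documented under 'Value')
estim$K_hat       # estimated number of groups
estim$groups_hat  # estimated group memberships
estim$alpha_hat   # post-Lasso group-specific slope estimates
estim$lambda      # selected penalty parameter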
