grouped_tv_plm {PAGFL}    R Documentation

Grouped Time-varying Panel Data Model

Description

Estimate a grouped time-varying panel data model given an observed group structure. Coefficient functions are homogeneous within groups but heterogeneous across groups. The time-varying coefficients are modeled as polynomial B-splines. The function supports both static and dynamic panel data models.

Usage

grouped_tv_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  d = 3,
  M = floor(length(y)^(1/7) - log(p)),
  const_coef = NULL,
  rho = 0.04 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)

## S3 method for class 'tv_gplm'
summary(object, ...)

## S3 method for class 'tv_gplm'
formula(x, ...)

## S3 method for class 'tv_gplm'
df.residual(object, ...)

## S3 method for class 'tv_gplm'
print(x, ...)

## S3 method for class 'tv_gplm'
coef(object, ...)

## S3 method for class 'tv_gplm'
residuals(object, ...)

## S3 method for class 'tv_gplm'
fitted(object, ...)

Arguments

formula

a formula object describing the model to be estimated.

data

a data.frame or matrix holding a panel data set. If no index variables are provided, the panel must be balanced and ordered in the long format $\mathbf{Y} = (Y_1^\prime, \dots, Y_N^\prime)^\prime$, $Y_i = (Y_{i1}, \dots, Y_{iT})^\prime$ with $Y_{it} = (y_{it}, x_{it}^\prime)^\prime$. Conversely, if data is not ordered or not balanced, data must include two index variables that declare the cross-sectional unit $i$ and the time period $t$ of each observation.

groups

a numerical or character vector of length $N$ that indicates the group membership of each cross-sectional unit $i$.

index

a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string represents the name of the variable declaring the time period $t$. In case of a balanced panel data set that is ordered in the long format, index can be left empty if the number of time periods n_periods is supplied.

n_periods

the number of observed time periods $T$. If an index character vector is passed, this argument can be left empty. Default is NULL.

d

the polynomial degree of the B-splines. Default is 3.

M

the number of interior knots of the B-splines. If left unspecified, the default heuristic $M = \text{floor}((NT)^{\frac{1}{7}} - \log(p))$ is used. Note that $M$ does not include the boundary knots and the entire sequence of knots is of length $M + d + 1$.

const_coef

a character vector containing the variable names of explanatory variables that enter with time-constant coefficients.

rho

the tuning parameter balancing the fitness and penalty terms in the IC. If left unspecified, the default $\rho = 0.04 \frac{\log(NT)}{\sqrt{NT}}$ is used, a heuristic of the form proposed by Mehrabani (2023, sec. 6). We recommend the default.

verbose

logical. If TRUE, helpful warning messages are shown. Default is TRUE.

parallel

logical. If TRUE, certain operations are parallelized across multiple cores. Default is TRUE.

...

ellipsis

object

of class tv_gplm.

x

of class tv_gplm.
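
The layout requirement on data described above can be checked before estimation. The following base R sketch is illustrative only: the column names id and time and the helper is_balanced_long() are hypothetical and not part of PAGFL.

```r
# Hypothetical toy panel: N = 3 units, T = 4 periods, rows deliberately reversed
set.seed(1)
panel <- expand.grid(time = 1:4, id = 1:3)
panel$y <- rnorm(nrow(panel))
panel <- panel[rev(seq_len(nrow(panel))), ]

# Sort into long format: all periods of unit 1, then all periods of unit 2, ...
panel_sorted <- panel[order(panel$id, panel$time), ]
rownames(panel_sorted) <- NULL

# Hypothetical helper (not a PAGFL function): TRUE if every unit is observed
# exactly once in every period and the rows are ordered by unit, then by time
is_balanced_long <- function(df, id = "id", time = "time") {
  counts <- table(df[[id]], df[[time]])
  balanced <- all(counts == 1)
  ordered <- identical(order(df[[id]], df[[time]]), seq_len(nrow(df)))
  balanced && ordered
}

is_balanced_long(panel)         # reversed rows
is_balanced_long(panel_sorted)  # sorted rows
```

If the check fails, either sort the data as shown and pass n_periods, or keep the data as is and name the two index columns via index.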

Details

Consider the grouped time-varying panel data model

$$y_{it} = \gamma_i + \beta^\prime_{i} (t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $y_{it}$ is the scalar dependent variable, $\gamma_i$ is an individual fixed effect, $x_{it}$ is a $p \times 1$ vector of explanatory variables, and $\epsilon_{it}$ is a zero mean error. The coefficient vector $\beta_{i} (t/T)$ is subject to the observed group pattern

$$\beta_i \left(\frac{t}{T} \right) = \sum_{k = 1}^K \alpha_k \left( \frac{t}{T} \right) \mathbf{1} \{i \in G_k \},$$

with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k = 1, \dots, K$.
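
As a minimal illustration of the group pattern, the sketch below maps group-level coefficient functions to unit-level ones. The membership vector and the two functions are made-up toy values, not output of the package.

```r
# Assumed toy setup: N = 6 units, K = 2 groups, T = 5 periods, p = 1
n_periods <- 5
groups <- c(1, 1, 2, 1, 2, 2)          # observed membership of each unit
v <- seq_len(n_periods) / n_periods    # rescaled time t/T

# Hypothetical group-specific coefficient functions alpha_k(t/T), one column per group
alpha <- cbind(
  sin(pi * v),  # group 1
  2 * v         # group 2
)

# beta_i(t/T) = sum_k alpha_k(t/T) * 1{i in G_k}
indicator <- outer(seq_len(ncol(alpha)), groups, "==") * 1  # K x N indicator matrix
beta <- alpha %*% indicator                                  # T x N matrix

# Equivalent shortcut: pick the column that belongs to each unit's group
isTRUE(all.equal(beta, alpha[, groups], check.attributes = FALSE))
```

Because the groups are disjoint, exactly one indicator is non-zero per unit, so the sum simply selects the coefficient function of that unit's group.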

$\alpha_k (t/T)$ and, in turn, $\beta_i (t/T)$ are estimated as polynomial B-splines using the sieve technique. To this end, let $B(v)$ denote an $(M + d + 1)$-dimensional vector of polynomial spline basis functions, where $d$ represents the polynomial degree and $M$ gives the number of interior knots of the B-spline. $\alpha_{k}(t/T)$ is approximated by a linear combination of the basis functions, $\alpha_{k}(t/T) \approx \xi_k^\prime B(t/T)$, where $\xi_k$ is an $(M + d + 1) \times p$ coefficient matrix.
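
The basis dimension and the quality of the spline approximation can be illustrated with splines::bs(), which ships with R. This is a sketch under assumptions: the equally spaced interior knots and the target function exp(v) are chosen for illustration, and the package may construct its basis differently.

```r
library(splines)  # part of the standard R distribution

N <- 10; n_periods <- 50; p <- 2; d <- 3
# Default heuristic for the number of interior knots, as given for the argument M
M <- floor((N * n_periods)^(1 / 7) - log(p))

v <- seq_len(n_periods) / n_periods
# Assumption: equally spaced interior knots on (0, 1)
interior <- seq(0, 1, length.out = M + 2)[-c(1, M + 2)]
B <- bs(v, knots = interior, degree = d, intercept = TRUE,
        Boundary.knots = c(0, 1))
dim(B)  # n_periods rows and M + d + 1 columns

# Approximate a hypothetical smooth coefficient function by xi' B(v)
alpha <- exp(v)
xi <- solve(crossprod(B), crossprod(B, alpha))  # least-squares control points
alpha_approx <- drop(B %*% xi)
max(abs(alpha - alpha_approx))                  # sieve approximation error
```

The remaining gap between alpha and alpha_approx plays the role of the sieve approximation error, which shrinks as more knots are added.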

The explanatory variables are projected onto the spline basis system, which results in the $(M + d + 1)p \times 1$ vector $z_{it} = x_{it} \otimes B(v)$ with $v = t/T$. Subsequently, the DGP can be reformulated as
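
The projection can be sketched with base R's kronecker(). The regressors, knot positions, and control points below are simulated placeholders. The check at the end confirms that the stacked regressors times the vectorized control points equal the regressors times the spline coefficient functions.

```r
library(splines)

n_periods <- 20; p <- 2; d <- 3; M <- 2
v <- seq_len(n_periods) / n_periods
# Assumed knot placement, for illustration only
B <- bs(v, knots = c(1 / 3, 2 / 3), degree = d, intercept = TRUE,
        Boundary.knots = c(0, 1))

set.seed(1)
x_i <- matrix(rnorm(n_periods * p), n_periods, p)  # hypothetical regressors of one unit

# z_it = x_it (Kronecker) B(t/T), stacked row-wise into a T x (M + d + 1)p matrix
Z_i <- t(sapply(seq_len(n_periods), function(t) {
  as.vector(kronecker(x_i[t, ], B[t, ]))
}))
dim(Z_i)

# Check: z_it' vec(xi) equals x_it' alpha(t/T) with alpha(t/T) = xi' B(t/T)
xi <- matrix(rnorm((M + d + 1) * p), M + d + 1, p)  # arbitrary control points
lhs <- drop(Z_i %*% as.vector(xi))
rhs <- rowSums(x_i * (B %*% xi))
isTRUE(all.equal(lhs, rhs, check.attributes = FALSE))
```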

$$y_{it} = \gamma_i + z_{it}^\prime \text{vec}(\pi_{i}) + u_{it},$$

where $\pi_i = \xi_k$ if $i \in G_k$, $u_{it} = \epsilon_{it} + \eta_{it}$, and $\eta_{it}$ reflects a sieve approximation error. We refer to Su et al. (2019, sec. 2) for more details on the sieve technique.

Finally, $\hat{\alpha}_{k}(t/T)$ is obtained as $\hat{\alpha}_{k}(t/T) = \hat{\xi}_k^\prime B(t/T)$, where the vector of control points $\xi_k$ is estimated using OLS

$$\hat{\xi}_k = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{z}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{y}_{it},$$

where $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t = 1}^T a_{it}$ for $a \in \{y, z\}$, which concentrates out the fixed effect $\gamma_i$ (within-transformation).
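
The estimator can be traced end to end on simulated data. The sketch below is a simplified stand-in, not the package's implementation: it assumes a balanced panel, a single regressor ($p = 1$), equally spaced interior knots, and made-up coefficient functions.

```r
library(splines)
set.seed(42)

# Assumed toy design: N units, T periods, one regressor, two observed groups
N <- 20; n_periods <- 100; d <- 3; M <- 2
groups <- rep(1:2, each = N / 2)
v <- seq_len(n_periods) / n_periods
alpha_true <- cbind(sin(pi * v), 1 + v^2)  # hypothetical alpha_k(t/T)

# Simulate y_it = gamma_i + alpha_{g_i}(t/T) x_it + eps_it in long format
id <- rep(seq_len(N), each = n_periods)
tt <- rep(seq_len(n_periods), times = N)
x <- rnorm(N * n_periods)
gamma <- rnorm(N)
y <- gamma[id] + alpha_true[cbind(tt, groups[id])] * x +
  rnorm(N * n_periods, sd = 0.1)

# Spline basis and z_it = x_it (Kronecker) B(t/T); with p = 1 this is x_it * B(t/T)
interior <- seq(0, 1, length.out = M + 2)[-c(1, M + 2)]
B <- bs(v, knots = interior, degree = d, intercept = TRUE,
        Boundary.knots = c(0, 1))
Z <- x * B[tt, ]

# Within-transformation: subtract the unit-specific time averages
demean <- function(a) a - ave(a, id)
y_til <- demean(y)
Z_til <- apply(Z, 2, demean)

# Group-wise OLS for xi_k, then alpha_hat_k(t/T) = xi_hat_k' B(t/T)
alpha_hat <- sapply(1:2, function(k) {
  in_k <- groups[id] == k
  xi_hat <- solve(crossprod(Z_til[in_k, ]),
                  crossprod(Z_til[in_k, ], y_til[in_k]))
  drop(B %*% xi_hat)
})

max(abs(alpha_hat - alpha_true))  # largest pointwise estimation error
```

The fixed effects gamma never need to be estimated: demeaning y and Z within each unit removes them before the regression.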

In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end-points of the interval on which the group-specific time-varying coefficients are defined.

Value

An object of class tv_gplm holding

model

a data.frame containing the dependent and explanatory variables as well as cross-sectional and time indices,

coefficients

a list holding (i) a $T \times p^{(1)} \times K$ array of the group-specific functional coefficients and (ii) a $K \times p^{(2)}$ matrix of time-constant estimates, where $p^{(1)}$ denotes the number of time-varying and $p^{(2)}$ the number of time-constant coefficients,

groups

a list containing (i) the total number of groups $K$ and (ii) a vector of group memberships $(\hat{g}_1, \dots, \hat{g}_N)$, where $\hat{g}_i = k$ if $i$ is part of group $k$,

residuals

a vector of residuals of the demeaned model,

fitted

a vector of fitted values of the demeaned model,

args

a list of additional arguments,

IC

a list containing (i) the value of the IC and (ii) the MSE,

call

the function call.

An object of class tv_gplm has print, summary, fitted, residuals, formula, df.residual and coef S3 methods.

Author(s)

Paul Haimerl

References

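Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.

Su, L., Wang, X., & Jin, S. (2019). Sieve estimation of time-varying panel data models with latent structures. Journal of Business & Economic Statistics, 37(2), 334-349. doi:10.1080/07350015.2017.1340299.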

Examples

# Simulate a time-varying panel with a trend and a group pattern
set.seed(1)
sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 2)
df <- data.frame(y = c(sim$y))
groups <- sim$groups

# Estimate the time-varying grouped panel data model
estim <- grouped_tv_plm(y ~ ., data = df, n_periods = 50, groups = groups)
summary(estim)


[Package PAGFL version 1.1.1 Index]