R: Estimate a Sparse Dynamic Factor Model

sparseDFM {sparseDFM}

R Documentation

Estimate a Sparse Dynamic Factor Model

Description

Main function for estimating a dynamic factor model (DFM) or a sparse DFM (a DFM with sparse loadings) on stationary data that may have arbitrary patterns of missing data. The user can choose:

the estimation method: "PCA", "2Stage", "EM" or "EM-sparse"
IID or AR(1) idiosyncratic errors
Kalman filter/smoother estimation using either the standard multivariate equations or the fast univariate filtering equations

Usage

sparseDFM(
  X,
  r,
  q = 0,
  alphas = logspace(-2, 3, 100),
  alg = "EM-sparse",
  err = "IID",
  kalman = "univariate",
  store.parameters = FALSE,
  standardize = TRUE,
  max_iter = 100,
  threshold = 1e-04
)

Arguments

X

n x p numeric data matrix or data frame of (stationary) time series.

r

Integer. Number of factors.

q

Integer. The loadings of the first q series (columns of X) are excluded from sparsification. Default q = 0.

alphas

Numeric vector (or single value) of LASSO regularisation parameters. Default is alphas = logspace(-2,3,100).
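
For orientation, the default grid can be approximated in base R. This is a sketch that assumes `logspace(a, b, n)` follows the usual MATLAB-style convention of `n` points equally spaced on the log10 scale between `10^a` and `10^b`; that convention is an assumption here, so check `?logspace` in the package. The names `alphas_sketch` and `alpha_single` are illustrative only.

```r
# Base-R stand-in for logspace(-2, 3, 100), assuming the MATLAB-style convention
alphas_sketch <- 10^seq(-2, 3, length.out = 100)

range(alphas_sketch)    # smallest and largest penalty on the grid (10^-2 and 10^3)
length(alphas_sketch)   # number of candidate penalties

# The argument description also allows a single value instead of a grid
alpha_single <- 0.5
```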

alg

Character. Option for estimation algorithm. Default is "EM-sparse". Options are:

`"PCA"`		principal components analysis (PCA) for static factors seen in Stock and Watson (2002).

`"2Stage"`		the two-stage framework of PCA plus Kalman filter/smoother seen in Giannone et al. (2008) and Doz et al. (2011).

`"EM"`		the quasi-maximum likelihood approach using the EM algorithm to handle arbitrary patterns of missing data seen in Banbura and Modugno (2014).

`"EM-sparse"`		the novel sparse EM approach allowing LASSO regularisation on factor loadings seen in Mosley et al. (2023).
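
To make the "PCA" option more concrete, the following is a minimal base-R sketch of principal-components factor extraction on simulated data. It illustrates the general technique only and is not the package's internal code; the simulated data and all object names (`F_true`, `Lambda_hat`, `F_hat`, and so on) are hypothetical.

```r
set.seed(123)
n <- 200; p <- 10; r <- 2

# Simulate toy data with a two-factor structure (illustrative only)
F_true <- matrix(rnorm(n * r), n, r)
Lambda_true <- matrix(rnorm(p * r), p, r)
X <- F_true %*% t(Lambda_true) + matrix(rnorm(n * p, sd = 0.5), n, p)

# Standardize columns, then take the leading r eigenvectors of the sample covariance
Xs <- scale(X)
eig <- eigen(cov(Xs), symmetric = TRUE)
Lambda_hat <- eig$vectors[, 1:r, drop = FALSE]   # p x r loadings (identified up to rotation)
F_hat <- Xs %*% Lambda_hat                       # n x r principal-component factors

# Share of total variance captured by the first r components
var_share <- sum(eig$values[1:r]) / sum(eig$values)
```

The other algorithms refine a starting point of this kind; for example, the two-stage approach described above passes PCA-based estimates through the Kalman filter/smoother.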

err

Character. Option for idiosyncratic errors. Default is "IID". Options are:

`"IID"`		errors are IID white noise.

`"AR1"`		errors follow an AR(1) process.

kalman

Character. Option for Kalman filter and smoother equations. Default is "univariate". Options are:

`"multivariate"`		classic Kalman filter and smoother equations seen in Shumway and Stoffer (1982).

`"univariate"`		univariate treatment (sequential processing) of the multivariate equations for fast Kalman filter and smoother seen in Koopman and Durbin (2000).
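
The difference between the two options can be illustrated with a single Kalman measurement update. The sketch below is a textbook illustration in base R, not the package's implementation: it assumes a diagonal measurement-error covariance `H` and uses made-up inputs (`Z`, `a0`, `P0`, `y`). The multivariate update inverts a p x p matrix, while the sequential version processes one series at a time using only scalar divisions and can simply skip missing entries.

```r
set.seed(1)
p <- 4; r <- 2
Z  <- matrix(rnorm(p * r), p, r)    # p x r measurement (loadings) matrix
H  <- diag(c(0.5, 1, 1.5, 2))       # diagonal measurement-error covariance
a0 <- c(0.1, -0.2)                  # predicted state mean
P0 <- diag(r)                       # predicted state covariance
y  <- rnorm(p)                      # one observation vector

# Multivariate update: requires inverting the p x p innovation covariance
v  <- y - Z %*% a0
Fm <- Z %*% P0 %*% t(Z) + H
K  <- P0 %*% t(Z) %*% solve(Fm)
a_multi <- a0 + K %*% v
P_multi <- P0 - K %*% Z %*% P0

# Univariate (sequential) update: one series at a time, scalar divisions only
a_uni <- matrix(a0, ncol = 1)
P_uni <- P0
for (i in seq_len(p)) {
  if (is.na(y[i])) next                        # a missing observation is skipped
  z   <- Z[i, , drop = FALSE]                  # 1 x r row of loadings
  v_i <- y[i] - drop(z %*% a_uni)              # scalar innovation
  f_i <- drop(z %*% P_uni %*% t(z)) + H[i, i]  # scalar innovation variance
  k   <- P_uni %*% t(z) / f_i                  # r x 1 gain
  a_uni <- a_uni + k * v_i
  P_uni <- P_uni - k %*% z %*% P_uni
}

# Both routes give the same updated state mean and covariance
all.equal(a_multi, a_uni)
all.equal(P_multi, P_uni)
```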

store.parameters

Logical. Store parameter and state outputs for every L1 penalty parameter in alphas. Default is FALSE.

standardize

Logical. Standardize the data before estimating the model. Default is TRUE.

max_iter

Integer. Maximum number of EM iterations. Default is 100.

threshold

Numeric. Tolerance on EM iterates. Default is 1e-4.
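
As an illustration of how `max_iter` and `threshold` typically interact, the sketch below uses a relative log-likelihood stopping rule, which is one common choice in EM implementations. Whether sparseDFM applies exactly this rule is an assumption; the helper `em_has_converged()` and the toy log-likelihood path are hypothetical.

```r
# Hypothetical helper: relative change in log-likelihood between EM iterations
em_has_converged <- function(loglik_new, loglik_old, threshold = 1e-4) {
  delta <- abs(loglik_new - loglik_old)
  avg <- (abs(loglik_new) + abs(loglik_old) + .Machine$double.eps) / 2
  (delta / avg) < threshold
}

# Toy EM-style loop with a made-up log-likelihood path that flattens out
max_iter <- 100
loglik_old <- -Inf
for (iter in seq_len(max_iter)) {
  loglik_new <- -1000 - 50 / iter^2   # stand-in for the value returned by one EM step
  if (iter > 1 && em_has_converged(loglik_new, loglik_old)) break
  loglik_old <- loglik_new
}
iter   # iterations performed before the rule triggered (or max_iter if it never did)
```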

Details

For full details of the model please refer to Mosley et al. (2023).

Value

A list-of-lists-like S3 object of class 'sparseDFM' with the following elements:

data

A list containing information about the data with the following elements:

`X`		is the original `n \times p` numeric data matrix of (stationary) time series.

`standardize`		is a logical value indicating whether the original data was standardized.

`X.mean`		is a p-dimensional numeric vector of column means of `X`.

`X.sd`		is a p-dimensional numeric vector of column standard deviations of `X`.

`X.bal`		is an `n \times p` numeric data matrix of the original `X` with missing data interpolated using `fillNA()`.

`eigen`		is the eigen decomposition of `X.bal`.

`fitted`		is the `n \times p` predicted data matrix using the estimated parameters: `\hat{\Lambda}\hat{F}`.

`fitted.unscaled`		is the `n \times p` predicted data matrix using the estimated parameters: `\hat{\Lambda}\hat{F}` that has been unscaled back to original data scale if `standardize` is `TRUE`.

`method`		the estimation algorithm used (`alg`).

`err`		the type of idiosyncratic errors assumed. Either `IID` or `AR1`.

`call`		call object obtained from `match.call()`.
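
The relationship between `X.mean`, `X.sd` and the scaled and unscaled quantities above can be sketched in base R. This assumes that column means and standard deviations are computed while ignoring missing values and that unscaling multiplies by the standard deviation and adds back the mean; both are assumptions about the package's behaviour, and the toy matrix is made up.

```r
set.seed(42)
X <- matrix(rnorm(50 * 3, mean = 5, sd = 2), 50, 3)
X[c(3, 17), 2] <- NA                      # a few missing observations

# Column moments, ignoring missing values (assumed behaviour)
X.mean <- colMeans(X, na.rm = TRUE)
X.sd   <- apply(X, 2, sd, na.rm = TRUE)

# Standardize: subtract column means, divide by column standard deviations
X_std <- sweep(sweep(X, 2, X.mean, "-"), 2, X.sd, "/")

# Map values on the standardized scale back to the original data scale
X_back <- sweep(sweep(X_std, 2, X.sd, "*"), 2, X.mean, "+")
```

The same back-transformation applied to a fitted matrix on the standardized scale would give values on the original scale.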

params

A list containing the estimated parameters of the model with the following elements:

`A`		the `r \times r` factor transition matrix.

`Phi`		the p-dimensional vector of AR(1) coefficients for the idiosyncratic errors.

`Lambda`		the `p \times r` loadings matrix.

`Sigma_u`		the `r \times r` factor transition error covariance matrix.

`Sigma_epsilon`		the p-dimensional vector of idiosyncratic error variances, stored as a vector because `\bm{\Sigma}_{\epsilon}` is assumed to be diagonal.
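
To show how these parameters fit together, the sketch below simulates data from a standard state-space form: factors following a VAR(1) with transition matrix `A` and innovation covariance `Sigma_u`, and observations equal to `Lambda` times the factors plus IID idiosyncratic noise with variances `Sigma_epsilon`. This form is assumed here for illustration; see Mosley et al. (2023) for the model actually estimated. All numerical values are made up.

```r
set.seed(2023)
n <- 100; p <- 6; r <- 2

A <- diag(c(0.8, 0.5))                 # r x r factor transition matrix
Sigma_u <- diag(r)                     # r x r factor innovation covariance
Lambda <- matrix(rnorm(p * r), p, r)   # p x r loadings matrix
Lambda[4:6, 2] <- 0                    # sparse loadings: series 4-6 do not load on factor 2
Sigma_epsilon <- rep(0.25, p)          # idiosyncratic variances (diagonal covariance)

# Factor innovations with covariance Sigma_u
U <- matrix(rnorm(n * r), n, r) %*% chol(Sigma_u)

# VAR(1) recursion for the factors, started at zero
factors <- matrix(0, n, r)
for (i in 2:n) {
  factors[i, ] <- A %*% factors[i - 1, ] + U[i, ]
}

# IID idiosyncratic errors, one column per series
eps <- sapply(Sigma_epsilon, function(v) rnorm(n, sd = sqrt(v)))

X_sim <- factors %*% t(Lambda) + eps   # n x p simulated panel
```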

state

A list containing the estimated states and state covariances with the following elements:

`factors`		the `n \times r` matrix of factor estimates.

`errors`		the `n \times p` matrix of AR(1) idiosyncratic error estimates. Returned only when `err = "AR1"`.

`factors.cov`		the `r \times r \times n` covariance matrices of the factor estimates.

`errors.cov`		the `p \times p \times n` covariance matrices of the AR(1) idiosyncratic error estimates. Returned only when `err = "AR1"`.

em

A list containing information about the EM algorithm with the following elements:

`converged`		a logical value indicating whether the EM algorithm converged.

`alpha_grid`		a numerical vector containing the LASSO tuning parameters considered in BIC evaluation before stopping.

`alpha_opt`		the optimal LASSO tuning parameter used.

`bic`		a numerical vector containing BIC values for the corresponding LASSO tuning parameter in `alpha_grid`.

`loglik`		the log-likelihood of the innovations from the Kalman filter in the final model.

`num_iter`		number of iterations taken by the EM algorithm.

`tol`		tolerance for EM convergence. Matches `threshold` in the input.

`max_iter`		maximum number of iterations allowed for the EM algorithm. Matches `max_iter` in the input.

`em_time`		time taken for EM convergence.
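
The link between `alpha_grid`, `bic` and `alpha_opt` can be sketched as follows, assuming the optimal tuning parameter is the one that minimises BIC over the values considered. That selection rule is an assumption, and the numbers below are made up for illustration rather than taken from any fit.

```r
# Hypothetical values, for illustration only (not output from a real fit)
alpha_grid <- c(0.01, 0.1, 1, 10)
bic <- c(5.2, 4.8, 4.9, 6.1)

# Choose the tuning parameter with the smallest BIC
alpha_opt <- alpha_grid[which.min(bic)]
alpha_opt
```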

alpha.output

Parameter and state outputs for each L1-norm penalty parameter in alphas if store.parameters = TRUE.

References

Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.

Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.

Giannone, D., Reichlin, L., & Small, D. (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55(4), 665-676.

Koopman, S. J., & Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3), 281-296.

Mosley, L., Chan, TS., & Gibberd, A. (2023). sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings.

Shumway, R. H., & Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253-264.

Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167-1179.

Examples

# load the inflation data set
data <- inflation

# reduce the size for these examples - full data found in vignette
data <- data[1:60, ]

# make stationary by taking first differences
new_data <- transformData(data, rep(2, ncol(data)))

# tune for the number of factors to use 
tuneFactors(new_data, type = 2)

# fit a PCA using 3 principal components
fit.pca <- sparseDFM(new_data, r = 3, alg = 'PCA')

# fit a DFM using the two-stage approach 
fit.2stage <- sparseDFM(new_data, r = 3, alg = '2Stage')

# fit a DFM using EM algorithm with 3 factors 
fit.dfm <- sparseDFM(new_data, r = 3, alg = 'EM')

# fit a Sparse DFM with 3 factors
fit.sdfm <- sparseDFM(new_data, r = 3, alg = 'EM-sparse')

# observe the factor loadings of the sparse DFM
plot(fit.sdfm, type = 'loading.heatmap')

# observe the factors 
plot(fit.sdfm, type = 'factor')

# observe the residuals 
plot(fit.sdfm, type = 'residual')

# observe the LASSO parameter selected and BIC values 
plot(fit.sdfm, type = 'lasso.bic')

# predict 3 steps ahead 
predict(fit.sdfm, h = 3)

[Package sparseDFM version 1.0 Index]