sparseDFM {sparseDFM}    R Documentation

Estimate a Sparse Dynamic Factor Model

Description

Main function for estimating a DFM or a sparse DFM (with sparse loadings) on stationary data that may contain arbitrary patterns of missing values. The user can choose the estimation algorithm (alg), the idiosyncratic error structure (err) and the Kalman filter/smoother implementation (kalman); see the Arguments below.

Usage

sparseDFM(
  X,
  r,
  q = 0,
  alphas = logspace(-2, 3, 100),
  alg = "EM-sparse",
  err = "IID",
  kalman = "univariate",
  store.parameters = FALSE,
  standardize = TRUE,
  max_iter = 100,
  threshold = 1e-04
)

Arguments

X

n x p numeric data matrix or data frame of (stationary) time series.

r

Integer. Number of factors.

q

Integer. The first q series (columns of X) are not made sparse, i.e. their loadings are left unregularised. Default q = 0.

alphas

Numeric vector or value of LASSO regularisation parameters. Default is alphas = logspace(-2,3,100).

alg

Character. Option for estimation algorithm. Default is "EM-sparse". Options are:

"PCA" principle components analysis (PCA) for static factors seen in Stock and Watson (2002).
"2Stage" the two-stage framework of PCA plus Kalman filter/smoother seen in Giannone et al. (2008) and Doz et al. (2011).
"EM" the quasi-maximum likelihood approach using the EM algorithm to handle arbitrary patterns of missing data seen in Banbura and Modugno (2014).
"EM-sparse" the novel sparse EM approach allowing LASSO regularisation on factor loadings seen in (cite our paper).
err

Character. Option for idiosyncratic errors. Default is "IID". Options are:

"IID" errors are IID white noise.
"AR1" errors follow an AR(1) process.
kalman

Character. Option for Kalman filter and smoother equations. Default is "univariate". Options are:

"multivariate" classic Kalman filter and smoother equations seen in Shumway and Stoffer (1982).
"univaraite" univariate treatment (sequential processing) of the multivariate equations for fast Kalman filter and smoother seen in Koopman and Durbin (2000).
store.parameters

Logical. Store parameter and state outputs for every L1 penalty parameter in alphas. Default is FALSE.

standardize

Logical. Standardize the data before estimating the model. Default is TRUE.

max_iter

Integer. Maximum number of EM iterations. Default is 100.

threshold

Numeric. Tolerance on EM iterates. Default is 1e-4.
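As a rough sketch of how these arguments combine (reusing the inflation data from the Examples below; the argument values here are illustrative choices, not recommendations):

new_data <- transformData(inflation[1:60, ], rep(2, ncol(inflation)))

# keep the first two series dense (q = 2), search a coarser LASSO grid,
# and allow AR(1) idiosyncratic errors with the univariate filter
fit <- sparseDFM(new_data, r = 3, q = 2,
                 alphas = logspace(-2, 1, 20),
                 alg = "EM-sparse", err = "AR1",
                 kalman = "univariate")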

Details

For full details of the model please refer to Mosley et al. (2023).

Value

A list-of-lists S3 object of class 'sparseDFM' with the following elements (a short access sketch follows the list):

data

A list containing information about the data with the following elements:

X is the original n \times p numeric data matrix of (stationary) time series.
standardize is a logical value indicating whether the original data was standardized.
X.mean is a p-dimensional numeric vector of column means of X.
X.sd is a p-dimensional numeric vector of column standard deviations of X.
X.bal is an n \times p numeric data matrix: the original X with missing entries interpolated using fillNA().
eigen is the eigen decomposition of X.bal.
fitted is the n \times p predicted data matrix using the estimated parameters: \hat{F}\hat{\Lambda}'.
fitted.unscaled is the n \times p predicted data matrix \hat{F}\hat{\Lambda}' rescaled back to the original data scale when standardize is TRUE.
method the estimation algorithm used (alg).
err the type of idiosyncratic errors assumed, either IID or AR1.
call call object obtained from match.call().
params

A list containing the estimated parameters of the model with the following elements:

A the r \times r factor transition matrix.
Phi the p-dimensional vector of AR(1) coefficients for the idiosyncratic errors.
Lambda the p \times r loadings matrix.
Sigma_u the r \times r factor transition error covariance matrix.
Sigma_epsilon the p-dimensional vector of idiosyncratic error variances, as \bm{\Sigma}_{\epsilon} is assumed to be diagonal.
state

A list containing the estimated states and state covariances with the following elements:

factors the n \times r matrix of factor estimates.
errors the n \times p matrix of AR(1) idiosyncratic error estimates. For err = AR1 only.
factors.cov the r \times r \times n covariance matrices of the factor estimates.
errors.cov the p \times p \times n covariance matrices of the AR(1) idiosyncratic error estimates. For err = AR1 only.
em

A list containing information about the EM algorithm with the following elements:

converged a logical value indicating whether the EM algorithm converged.
alpha_grid a numerical vector containing the LASSO tuning parameters considered in BIC evaluation before stopping.
alpha_opt the optimal LASSO tuning parameter used.
bic a numerical vector containing BIC values for the corresponding LASSO tuning parameter in alpha_grid.
loglik the log-likelihood of the innovations from the Kalman filter in the final model.
num_iter number of iterations taken by the EM algorithm.
tol tolerance for EM convergence. Matches threshold in the input.
max_iter maximum number of iterations allowed for the EM algorithm. Matches max_iter in the input.
em_time time taken for EM convergence.
alpha.output

Parameter and state outputs for each L1-norm penalty parameter in alphas if store.parameters = TRUE.
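The elements above are ordinary list components of the returned object. For example, using the fit.sdfm object created in the Examples below (a sketch, assuming alg = "EM-sparse" so that the em and sparse-loading elements are populated):

head(fit.sdfm$state$factors)        # n x r factor estimates
fit.sdfm$params$Lambda              # p x r (sparse) loadings matrix
fit.sdfm$em$alpha_opt               # LASSO tuning parameter selected by BIC
dim(fit.sdfm$data$fitted.unscaled)  # predictions on the original data scale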

References

Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.

Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.

Giannone, D., Reichlin, L., & Small, D. (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55(4), 665-676.

Koopman, S. J., & Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3), 281-296.

Mosley, L., Chan, TS., & Gibberd, A. (2023). sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings.

Shumway, R. H., & Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253-264.

Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167-1179.

Examples

# load inflation data set 
data <- inflation

# reduce the size for these examples - full data found in vignette 
data <- data[1:60, ]

# make stationary by taking first differences 
new_data <- transformData(data, rep(2, ncol(data)))

# tune for the number of factors to use 
tuneFactors(new_data, type = 2)

# fit PCA using 3 principal components
fit.pca <- sparseDFM(new_data, r = 3, alg = 'PCA')

# fit a DFM using the two-stage approach 
fit.2stage <- sparseDFM(new_data, r = 3, alg = '2Stage')

# fit a DFM using EM algorithm with 3 factors 
fit.dfm <- sparseDFM(new_data, r = 3, alg = 'EM')

# fit a Sparse DFM with 3 factors
fit.sdfm <- sparseDFM(new_data, r = 3, alg = 'EM-sparse')

# observe the factor loadings of the sparse DFM
plot(fit.sdfm, type = 'loading.heatmap')

# observe the factors 
plot(fit.sdfm, type = 'factor')

# observe the residuals 
plot(fit.sdfm, type = 'residual')

# observe the LASSO parameter selected and BIC values 
plot(fit.sdfm, type = 'lasso.bic')

# predict 3 steps ahead 
predict(fit.sdfm, h = 3)
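
# a possible extension (illustrative values): store the output for every LASSO
# penalty on the grid via store.parameters = TRUE, then inspect the
# BIC-selected penalty and the stored per-alpha outputs
fit.grid <- sparseDFM(new_data, r = 3, alg = 'EM-sparse',
                      alphas = logspace(-2, 1, 20),
                      store.parameters = TRUE)
fit.grid$em$alpha_opt                      # selected LASSO tuning parameter
str(fit.grid$alpha.output, max.level = 1)  # outputs stored for each alpha considered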

 
