DFM {dfms}    R Documentation

Estimate a Dynamic Factor Model

Description

Efficient estimation of a Dynamic Factor Model via the EM Algorithm - on stationary data with time-invariant system matrices and classical assumptions, while permitting missing data.

Usage

DFM(
  X,
  r,
  p = 1L,
  ...,
  idio.ar1 = FALSE,
  rQ = c("none", "diagonal", "identity"),
  rR = c("diagonal", "identity", "none"),
  em.method = c("auto", "DGR", "BM", "none"),
  min.iter = 25L,
  max.iter = 100L,
  tol = 1e-04,
  pos.corr = TRUE,
  check.increased = FALSE
)

Arguments

X

a T x n numeric data matrix or frame of stationary time series. May contain missing values.

r

integer. number of factors.

p

integer. number of lags in factor VAR.

...

(optional) arguments to tsnarmimp.

idio.ar1

logical. Model observation errors as AR(1) processes: e_t = \rho e_{t-1} + v_t. Note that this substantially increases computation time, and is generally not needed if n is large (>30). See theoretical vignette for details.

rQ

character. restrictions on the state (transition) covariance matrix (Q).

rR

character. restrictions on the observation (measurement) covariance matrix (R).

em.method

character. The implementation of the Expectation Maximization Algorithm used. The options are:

"auto" Automatic selection: "BM" if anyNA(X), else "DGR".
"DGR" The classical EM implementation of Doz, Giannone and Reichlin (2012). This implementation is efficient and quite robust. Missing values are removed on a casewise basis in the Kalman Filter and Smoother, but are not explicitly accounted for in the EM iterations.
"BM" The modified EM algorithm of Banbura and Modugno (2014), which also accounts for missing data in the EM iterations. Optimal for datasets with systematically missing data, e.g. datasets with ragged edges or series at different frequencies.
"none" Performs no EM iterations and just returns the two-step estimates from running the data through the Kalman Filter and Smoother once, as in Doz, Giannone and Reichlin (2011) (the Kalman Filter is initialized with system matrices obtained from a regression and VAR on PCA factor estimates). This yields significant performance gains over the iterative methods. Final system matrices are estimated by running a regression and a VAR on the smoothed factors.

min.iter

integer. Minimum number of EM iterations (to ensure a convergence path).

max.iter

integer. Maximum number of EM iterations.

tol

numeric. EM convergence tolerance.

pos.corr

logical. Increase the likelihood that factors correlate positively with the data, by scaling the eigenvectors such that the principal components (used to initialize the Kalman Filter) co-vary positively with the row-means of the standardized data.

check.increased

logical. Check if likelihood has increased. Passed to em_converged. If TRUE, the algorithm only terminates if convergence was reached with decreasing likelihood.
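
The sign normalization described for pos.corr can be illustrated in base R. This is a minimal sketch of the idea only, not the package's internal code: the helper name pc_positive is hypothetical and the data are simulated.

```r
# Hypothetical helper (not part of dfms): PCA factor estimates whose signs are
# flipped so that each component co-varies positively with the row means of
# the standardized data.
pc_positive <- function(X, r) {
  Z <- scale(X)                                # center and scale each series
  V <- eigen(cov(Z), symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]
  F_pca <- Z %*% V                             # principal components (T x r)
  s <- sign(drop(cov(F_pca, rowMeans(Z))))     # sign of covariance with row means
  s[s == 0] <- 1                               # leave uncorrelated components as they are
  list(vectors = V %*% diag(s, r), F_pca = F_pca %*% diag(s, r))
}

# Simulated example: one common factor with positive loadings plus noise
set.seed(1)
f <- rnorm(100)
X <- outer(f, c(1, 0.8, 0.5, 1.2)) + matrix(rnorm(400, sd = 0.3), 100)
res <- pc_positive(X, 2)
cov(res$F_pca, rowMeans(scale(X)))             # non-negative by construction
```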

Details

This function efficiently estimates a Dynamic Factor Model with the following classical assumptions:

  1. Linearity

  2. Idiosyncratic measurement (observation) errors (R is diagonal)

  3. No direct relationship between series and lagged factors (ceteris paribus contemporaneous factors)

  4. No relationship between lagged error terms in either the measurement or the transition equation (no serial correlation), unless explicitly modeled as AR(1) processes using idio.ar1 = TRUE.

Factors are allowed to evolve in a VAR(p) process, and data is internally standardized (scaled and centered) before estimation (removing the need for intercept terms). By assumptions 1-4, this translates into the following dynamic form:

\textbf{x}_t = \textbf{C}_0 \textbf{f}_t + \textbf{e}_t \ \sim\ N(\textbf{0}, \textbf{R})

\textbf{f}_t = \sum_{j=1}^p \textbf{A}_j \textbf{f}_{t-j} + \textbf{u}_t \ \sim\ N(\textbf{0}, \textbf{Q}_0)

where the first equation is called the measurement or observation equation and the second equation is called transition, state or process equation, and

n number of series in \textbf{x}_t (r and p as the arguments to DFM).
\textbf{x}_t n \times 1 vector of observed series at time t: (x_{1t}, \dots, x_{nt})'. Some observations can be missing.
\textbf{f}_t r \times 1 vector of factors at time t: (f_{1t}, \dots, f_{rt})'.
\textbf{C}_0 n \times r measurement (observation) matrix.
\textbf{A}_j r \times r state transition matrix at lag j.
\textbf{Q}_0 r \times r state covariance matrix.
\textbf{R} n \times n measurement (observation) covariance matrix. It is diagonal by assumption 2 that E[\textbf{x}_{it}|\textbf{x}_{-i,t},\textbf{x}_{i,t-1}, \dots, \textbf{f}_t, \textbf{f}_{t-1}, \dots] = \textbf{Cf}_t \forall i.
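
To make the dynamic form concrete, the following base-R sketch simulates data from it with r = 2 factors and p = 2 lags. All parameter values and object names (A1, A2, C0, sd_e, TT) are illustrative assumptions, chosen so that the factor VAR is stationary. For simplicity, Q_0 is taken to be the identity matrix and R a scaled identity matrix.

```r
set.seed(42)
n <- 10; r <- 2; p <- 2; TT <- 200

# Illustrative (assumed) parameters of the dynamic form
A1 <- matrix(c(0.5, 0.1, 0.0, 0.3), r, r)   # lag-1 transition matrix
A2 <- matrix(c(0.2, 0.0, 0.0, 0.1), r, r)   # lag-2 transition matrix
C0 <- matrix(rnorm(n * r), n, r)            # n x r observation matrix
sd_e <- 0.5                                 # R = sd_e^2 * I_n (diagonal)

# Transition equation: f_t = A1 f_{t-1} + A2 f_{t-2} + u_t, u_t ~ N(0, I_r)
f <- matrix(0, TT + p, r)
for (t in (p + 1):(TT + p)) {
  f[t, ] <- A1 %*% f[t - 1, ] + A2 %*% f[t - 2, ] + rnorm(r)
}
f <- f[-(1:p), , drop = FALSE]              # drop the p start-up rows

# Measurement equation: x_t = C0 f_t + e_t, e_t ~ N(0, R)
X <- f %*% t(C0) + matrix(rnorm(TT * n, sd = sd_e), TT, n)

# Some observations can be missing
X[sample(length(X), 50)] <- NA
```

A matrix such as X could then be passed to DFM(X, r = 2, p = 2). Because it contains missing values, em.method = "auto" would select "BM".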

This model can be estimated using a classical form of the Kalman Filter and the Expectation Maximization (EM) algorithm, after transforming it to State-Space (stacked, VAR(1)) form:

\textbf{x}_t = \textbf{C} \textbf{F}_t + \textbf{e}_t \ \sim\ N(\textbf{0}, \textbf{R})

\textbf{F}_t = \textbf{A F}_{t-1} + \textbf{u}_t \ \sim\ N(\textbf{0}, \textbf{Q})

where

n number of series in \textbf{x}_t (r and p as the arguments to DFM).
\textbf{x}_t n \times 1 vector of observed series at time t: (x_{1t}, \dots, x_{nt})'. Some observations can be missing.
\textbf{F}_t rp \times 1 vector of stacked factors at time t: (f_{1t}, \dots, f_{rt}, f_{1,t-1}, \dots, f_{r,t-1}, \dots, f_{1,t-p+1}, \dots, f_{r,t-p+1})'.
\textbf{C} n \times rp observation matrix. Only the first n \times r terms are non-zero, by assumption 3 that E[\textbf{x}_t|\textbf{F}_t] = E[\textbf{x}_t|\textbf{f}_t] (no relationship of observed series with lagged factors given contemporaneous factors).
\textbf{A} stacked rp \times rp state transition matrix consisting of 3 parts: the top r \times rp part provides the dynamic relationships captured by (\textbf{A}_1, \dots, \textbf{A}_p) in the dynamic form; the terms A[(r+1):rp, 1:(rp-r)] constitute an (rp-r) \times (rp-r) identity matrix mapping all lagged factors to their known values at time t; the remaining part A[(r+1):rp, (rp-r+1):rp] is an (rp-r) \times r matrix of zeros.
\textbf{Q} rp \times rp state covariance matrix. The top r \times r part gives the contemporaneous relationships; the rest are zeros by assumption 4.
\textbf{R} n \times n observation covariance matrix. It is diagonal by assumption 2 and identical to \textbf{R} as stated in the dynamic form.
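
The stacked matrices can be assembled from the dynamic-form matrices in a few lines of base R. This is an illustrative sketch: the helper name stack_system and the example values are assumptions and not part of dfms.

```r
# Hypothetical helper: build the stacked (VAR(1)) system matrices from the
# dynamic-form matrices. A_list holds A_1, ..., A_p (each r x r).
stack_system <- function(A_list, C0, Q0) {
  p <- length(A_list); r <- nrow(Q0); rp <- r * p
  A <- matrix(0, rp, rp)
  A[1:r, ] <- do.call(cbind, A_list)            # top r x rp block: (A_1, ..., A_p)
  if (p > 1) {
    A[cbind((r + 1):rp, 1:(rp - r))] <- 1       # identity block carrying lags forward
  }
  C <- cbind(C0, matrix(0, nrow(C0), rp - r))   # only the first r columns are non-zero
  Q <- matrix(0, rp, rp)
  Q[1:r, 1:r] <- Q0                             # only the top-left r x r block is non-zero
  list(A = A, C = C, Q = Q)
}

r <- 2
A_list <- list(matrix(c(0.5, 0.1, 0, 0.3), r), matrix(c(0.2, 0, 0, 0.1), r))
sys <- stack_system(A_list, C0 = matrix(1, 4, r), Q0 = diag(r))
sys$A

# The factor VAR is stationary if all eigenvalues of A lie inside the unit circle
max(Mod(eigen(sys$A)$values)) < 1
```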

Value

A list-like object of class 'dfm' with the following elements:

X_imp

T \times n matrix with the imputed and standardized (scaled and centered) data - with attributes attached allowing reconstruction of the original data:

"stats" is an n \times 5 matrix of summary statistics of class "qsu" (see qsu).
"missing" is a T \times n logical matrix indicating missing or infinite values in the original data (which are imputed in X_imp).
"attributes" contains the attributes of the original data input.
"is.list" is a logical value indicating whether the original data input was a list / data frame.

eigen

eigen(cov(X_imp)).

F_pca

T \times r matrix of principal component factor estimates - X_imp %*% eigen$vectors.

P_0

r \times r initial factor covariance matrix estimate based on PCA results.

F_2s

T \times r matrix of two-step factor estimates as in Doz, Giannone and Reichlin (2011) - obtained from running the data through the Kalman Filter and Smoother once, where the Filter is initialized with results from PCA.

P_2s

r \times r \times T covariance matrices of two-step factor estimates.

F_qml

T \times r matrix of quasi-maximum likelihood factor estimates - obtained by iteratively Kalman Filtering and Smoothing the factor estimates until EM convergence.

P_qml

r \times r \times T covariance matrices of QML factor estimates.

A

r \times rp factor transition matrix.

C

n \times r observation matrix.

Q

r \times r state (error) covariance matrix.

R

n \times n observation (error) covariance matrix.

e

T \times n estimates of observation errors \textbf{e}_t. Only available if idio.ar1 = TRUE.

rho

n \times 1 estimates of AR(1) coefficients (\rho) in observation errors: e_t = \rho e_{t-1} + v_t. Only available if idio.ar1 = TRUE.

loglik

vector of log-likelihoods - one for each EM iteration. The final value corresponds to the log-likelihood of the reported model.

tol

The numeric convergence tolerance used.

converged

single logical value indicating whether the EM algorithm converged (within max.iter iterations subject to tol).

anyNA

single logical value indicating whether there were any (internal) missing values in the data (determined after removal of rows with too many missing values). If FALSE, X_imp is simply the original data in matrix form, and does not have the "missing" attribute attached.

rm.rows

vector of any cases (rows) that were removed beforehand (subject to max.missing and na.rm.method). If no cases were removed the slot is NULL.

em.method

The EM method used.

call

call object obtained from match.call().
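
The loglik, tol and converged elements can be related through a relative-change rule. The base-R sketch below shows one common criterion for EM algorithms. It is an assumption for illustration, and the exact rule applied by em_converged may differ. The function name rel_converged and the log-likelihood values are hypothetical.

```r
# Hypothetical check (not the dfms implementation): declare convergence when
# the relative change between the last two log-likelihood values is below tol.
rel_converged <- function(loglik, tol = 1e-4) {
  k <- length(loglik)
  if (k < 2L) return(FALSE)                     # need at least two iterations
  delta <- abs(loglik[k] - loglik[k - 1L])
  avg <- (abs(loglik[k]) + abs(loglik[k - 1L])) / 2
  delta / (avg + .Machine$double.eps) < tol
}

ll <- c(-1500, -1420, -1405.2, -1405.15)        # made-up log-likelihood path
rel_converged(ll)                               # relative change of the last step vs. tol
all(diff(ll) >= 0)                              # did the likelihood increase at every step?
```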

References

Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.

Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. Review of Economics and Statistics, 94(4), 1014-1024.

Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.

Stock, J. H., & Watson, M. W. (2016). Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics. Handbook of Macroeconomics, 2, 415-525. https://doi.org/10.1016/bs.hesmac.2016.04.002

Examples

library(dfms)      # DFM(), ICr() and the BM14 datasets
library(magrittr)  # %<>% assignment pipe
library(xts)
library(vars)      # VARselect()

# BM14 Replication Data. Constructing the database:
BM14 = merge(BM14_M, BM14_Q)
BM14[, BM14_Models$log_trans] %<>% log()
BM14[, BM14_Models$freq == "M"] %<>% diff()
BM14[, BM14_Models$freq == "Q"] %<>% diff(3)


### Small Model ---------------------------------------

# IC for number of factors
IC_small = ICr(BM14[, BM14_Models$small], max.r = 5)
plot(IC_small)
screeplot(IC_small)

# I take 2 factors. Now number of lags
VARselect(IC_small$F_pca[, 1:2])

# Estimating the model with 2 factors and 3 lags
dfm_small = DFM(BM14[, BM14_Models$small], 2, 3)

# Inspecting the model
summary(dfm_small)
plot(dfm_small)  # Factors and data
plot(dfm_small, method = "all", type = "individual") # Factor estimates
plot(dfm_small, type = "residual") # Residuals from factor predictions

# 10 periods ahead forecast
plot(predict(dfm_small), xlim = c(300, 370))


### Medium-Sized Model ---------------------------------

# IC for number of factors
IC_medium = ICr(BM14[, BM14_Models$medium])
plot(IC_medium)
screeplot(IC_medium)

# I take 3 factors. Now number of lags
VARselect(IC_medium$F_pca[, 1:3])

# Estimating the model with 3 factors and 3 lags
dfm_medium = DFM(BM14[, BM14_Models$medium], 3, 3)

# Inspecting the model
summary(dfm_medium)
plot(dfm_medium)  # Factors and data
plot(dfm_medium, method = "all", type = "individual") # Factor estimates
plot(dfm_medium, type = "residual") # Residuals from factor predictions

# 10 periods ahead forecast
plot(predict(dfm_medium), xlim = c(300, 370))


### Large Model ---------------------------------

# IC for number of factors
IC_large = ICr(BM14)
plot(IC_large)
screeplot(IC_large)

# I take 6 factors. Now number of lags
VARselect(IC_large$F_pca[, 1:6])

# Estimating the model with 6 factors and 3 lags
dfm_large = DFM(BM14, 6, 3)

# Inspecting the model
summary(dfm_large)
plot(dfm_large)  # Factors and data
# plot(dfm_large, method = "all", type = "individual") # Factor estimates
plot(dfm_large, type = "residual") # Residuals from factor predictions

# 10 periods ahead forecast
plot(predict(dfm_large), xlim = c(300, 370))


[Package dfms version 0.2.1 Index]