DFM {dfms} | R Documentation
Estimate a Dynamic Factor Model
Description
Efficient estimation of a Dynamic Factor Model via the EM algorithm, for stationary data with time-invariant system matrices under classical assumptions, while permitting missing data.
Usage
DFM(
  X,
  r,
  p = 1L,
  ...,
  idio.ar1 = FALSE,
  rQ = c("none", "diagonal", "identity"),
  rR = c("diagonal", "identity", "none"),
  em.method = c("auto", "DGR", "BM", "none"),
  min.iter = 25L,
  max.iter = 100L,
  tol = 1e-04,
  pos.corr = TRUE,
  check.increased = FALSE
)
Arguments
X | a
r | integer. Number of factors.
p | integer. Number of lags in factor VAR.
... | (optional) arguments to
idio.ar1 | logical. Model observation errors as AR(1) processes:
rQ | character. Restrictions on the state (transition) covariance matrix (Q).
rR | character. Restrictions on the observation (measurement) covariance matrix (R).
em.method | character. The implementation of the Expectation Maximization Algorithm used. The options are:
min.iter | integer. Minimum number of EM iterations (to ensure a convergence path).
max.iter | integer. Maximum number of EM iterations.
tol | numeric. EM convergence tolerance.
pos.corr | logical. Increase the likelihood that factors correlate positively with the data, by scaling the eigenvectors such that the principal components (used to initialize the Kalman Filter) co-vary positively with the row-means of the standardized data.
check.increased | logical. Check if likelihood has increased. Passed to
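The sign convention described for pos.corr can be illustrated with base R alone. The following is a minimal sketch of the idea, not the package's internal code: it standardizes a simulated matrix, takes the leading eigenvectors of its correlation matrix, and flips the sign of any eigenvector whose principal component co-varies negatively with the row-means of the standardized data. The simulated data and all object names (X, Z, V, PC, flip) are illustrative assumptions.

```r
set.seed(1)
# Simulated data (assumption): 100 observations on 6 series with one common component
f <- rnorm(100)
X <- outer(f, c(1, 0.8, 0.6, 0.9, 0.7, 0.5)) + matrix(rnorm(600, sd = 0.5), 100, 6)

Z  <- scale(X)                      # centre and scale each series
V  <- eigen(cor(X))$vectors[, 1:2]  # eigenvectors for r = 2 components
PC <- Z %*% V                       # principal components

# Flip eigenvectors whose component co-varies negatively with the row-means
flip <- as.vector(cov(PC, rowMeans(Z))) < 0
V[, flip] <- -V[, flip]
PC <- Z %*% V

cov(PC, rowMeans(Z))                # covariances are non-negative after the flip
```

Because eigenvectors are only identified up to sign, a rule of this kind fixes the orientation of the initial factor estimates without changing the space they span.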
Details
This function efficiently estimates a Dynamic Factor Model with the following classical assumptions:
1. Linearity
2. Idiosyncratic measurement (observation) errors (R is diagonal)
3. No direct relationship between series and lagged factors (ceteris paribus contemporaneous factors)
4. No relationship between lagged error terms in either the measurement or the transition equation (no serial correlation), unless explicitly modeled as AR(1) processes using idio.ar1 = TRUE.
Factors are allowed to evolve in a VAR(p) process, and the data are internally standardized (scaled and centered) before estimation, which removes the need for intercept terms.
By assumptions 1-4, this translates into the following dynamic form:
\textbf{x}_t = \textbf{C}_0 \textbf{f}_t + \textbf{e}_t, \quad \textbf{e}_t \sim N(\textbf{0}, \textbf{R})
\textbf{f}_t = \sum_{j=1}^p \textbf{A}_j \textbf{f}_{t-j} + \textbf{u}_t, \quad \textbf{u}_t \sim N(\textbf{0}, \textbf{Q}_0)
where the first equation is called the measurement or observation equation, the second is called the transition, state, or process equation, and
n | number of series in \textbf{x}_t (r and p as in the arguments to DFM).
\textbf{x}_t | n \times 1 vector of observed series at time t: (x_{1t}, \dots, x_{nt})'. Some observations can be missing.
\textbf{f}_t | r \times 1 vector of factors at time t: (f_{1t}, \dots, f_{rt})'.
\textbf{C}_0 | n \times r measurement (observation) matrix.
\textbf{A}_j | r \times r state transition matrix at lag j.
\textbf{Q}_0 | r \times r state covariance matrix.
\textbf{R} | n \times n measurement (observation) covariance matrix. It is diagonal by assumption 2 that E[\textbf{x}_{it}|\textbf{x}_{-i,t},\textbf{x}_{i,t-1}, \dots, \textbf{f}_t, \textbf{f}_{t-1}, \dots] = \textbf{Cf}_t \forall i.
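To make the dynamic form concrete, the sketch below simulates data from it in base R. It is an illustration under stated assumptions rather than anything taken from the package: the dimensions, the parameter values (C0, A, Q0, R), the zero start-up values and the number of missing cells are all invented for the example.

```r
set.seed(42)
n <- 5; r <- 2; p <- 2; T_obs <- 200              # illustrative dimensions

C0 <- matrix(rnorm(n * r), n, r)                  # n x r observation matrix
A  <- list(diag(c(0.5, 0.3)), diag(c(0.2, 0.1)))  # A_1, A_2 (a stationary VAR(2))
Q0 <- diag(r)                                     # r x r state covariance
R  <- diag(runif(n, 0.2, 0.5))                    # diagonal n x n observation covariance

# Transition equation: f_t = sum_j A_j f_{t-j} + u_t
f <- matrix(0, T_obs + p, r)                      # first p rows are zero start-up values
for (t in (p + 1):(T_obs + p)) {
  ft <- crossprod(chol(Q0), rnorm(r))             # u_t ~ N(0, Q_0)
  for (j in seq_len(p)) ft <- ft + A[[j]] %*% f[t - j, ]
  f[t, ] <- ft
}
f <- f[-seq_len(p), , drop = FALSE]               # drop the start-up rows

# Measurement equation: x_t = C_0 f_t + e_t
e <- matrix(rnorm(T_obs * n), T_obs, n) %*% chol(R)  # rows e_t ~ N(0, R)
X <- f %*% t(C0) + e
X[sample(length(X), 50)] <- NA                    # some observations can be missing
```

Under these assumptions, a call such as DFM(X, r = 2, p = 2) would target this data-generating process.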
This model can be estimated using a classical form of the Kalman Filter and the Expectation Maximization (EM) algorithm, after transforming it to State-Space (stacked, VAR(1)) form:
\textbf{x}_t = \textbf{C} \textbf{F}_t + \textbf{e}_t, \quad \textbf{e}_t \sim N(\textbf{0}, \textbf{R})
\textbf{F}_t = \textbf{A F}_{t-1} + \textbf{u}_t, \quad \textbf{u}_t \sim N(\textbf{0}, \textbf{Q})
where
n | number of series in \textbf{x}_t (r and p as in the arguments to DFM).
\textbf{x}_t | n \times 1 vector of observed series at time t: (x_{1t}, \dots, x_{nt})'. Some observations can be missing.
\textbf{F}_t | rp \times 1 vector of stacked factors at time t: (f_{1t}, \dots, f_{rt}, f_{1,t-1}, \dots, f_{r,t-1}, \dots, f_{1,t-p}, \dots, f_{r,t-p})'.
\textbf{C} | n \times rp observation matrix. Only the first n \times r terms are non-zero, by assumption 3 that E[\textbf{x}_t|\textbf{F}_t] = E[\textbf{x}_t|\textbf{f}_t] (no relationship of observed series with lagged factors given contemporaneous factors).
\textbf{A} | stacked rp \times rp state transition matrix consisting of 3 parts: the top r \times rp part provides the dynamic relationships captured by (\textbf{A}_1, \dots, \textbf{A}_p) in the dynamic form; the terms A[(r+1):rp, 1:(rp-r)] constitute an (rp-r) \times (rp-r) identity matrix mapping all lagged factors to their known values at times t; the remaining part A[(rp-r+1):rp, (rp-r+1):rp] is an r \times r matrix of zeros.
\textbf{Q} | rp \times rp state covariance matrix. The top r \times r part gives the contemporaneous relationships, the rest are zeros by assumption 4.
\textbf{R} | n \times n observation covariance matrix. It is diagonal by assumption 2 and identical to \textbf{R} as stated in the dynamic form.
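The block structure of the stacked matrices described above can be assembled in a few lines of base R. This is a minimal sketch with made-up inputs (A_list, C0, Q0 and the dimensions are placeholders chosen for illustration) and is not the package's internal construction.

```r
r <- 2; p <- 3; n <- 4                                    # illustrative dimensions
A_list <- list(diag(0.5, r), diag(0.2, r), diag(0.1, r))  # A_1, ..., A_p
C0 <- matrix(1, n, r)                                     # placeholder n x r loadings
Q0 <- diag(r)                                             # placeholder r x r state covariance

# Transition matrix: the top r rows hold (A_1, ..., A_p); the rows below carry
# an identity block that maps lagged factors to their known values
A <- rbind(do.call(cbind, A_list),
           cbind(diag(r * p - r), matrix(0, r * p - r, r)))

# Observation matrix: only the first r columns are non-zero (assumption 3)
C <- cbind(C0, matrix(0, n, r * p - r))

# State covariance: only the top-left r x r block is non-zero (assumption 4)
Q <- matrix(0, r * p, r * p)
Q[1:r, 1:r] <- Q0

# The stacked VAR(1) is stationary if all eigenvalues of A lie inside the unit circle
max(Mod(eigen(A)$values)) < 1
```

With r = 2 and p = 3 the stacked state has rp = 6 elements, so A and Q are 6 x 6 and C is 4 x 6.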
Value
A list-like object of class 'dfm' with the following elements:
X_imp |
eigen |
F_pca |
P_0 |
F_2s |
P_2s |
F_qml |
P_qml |
A |
C |
Q |
R |
e |
rho |
loglik | vector of log-likelihoods, one for each EM iteration. The final value corresponds to the log-likelihood of the reported model.
tol | the numeric convergence tolerance used.
converged | single logical value indicating whether the EM algorithm converged (within
anyNA | single logical value indicating whether there were any (internal) missing values in the data (determined after removal of rows with too many missing values). If
rm.rows | vector of any cases (rows) that were removed beforehand (subject to
em.method | the EM method used.
call | call object obtained from
References
Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.
Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate dynamic factor models. Review of Economics and Statistics, 94(4), 1014-1024.
Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.
Stock, J. H., & Watson, M. W. (2016). Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics. Handbook of Macroeconomics, 2, 415-525. https://doi.org/10.1016/bs.hesmac.2016.04.002
Examples
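library(dfms)     # DFM(), ICr() and the BM14 replication data
library(magrittr) # %<>% assignment pipe
library(xts)      # time series class of the BM14 data
library(vars)     # VARselect() for lag-order selection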
# BM14 Replication Data. Constructing the database:
BM14 = merge(BM14_M, BM14_Q)
BM14[, BM14_Models$log_trans] %<>% log()
BM14[, BM14_Models$freq == "M"] %<>% diff()
BM14[, BM14_Models$freq == "Q"] %<>% diff(3)
### Small Model ---------------------------------------
# IC for number of factors
IC_small = ICr(BM14[, BM14_Models$small], max.r = 5)
plot(IC_small)
screeplot(IC_small)
# I take 2 factors. Now number of lags
VARselect(IC_small$F_pca[, 1:2])
# Estimating the model with 2 factors and 3 lags
dfm_small = DFM(BM14[, BM14_Models$small], 2, 3)
# Inspecting the model
summary(dfm_small)
plot(dfm_small) # Factors and data
plot(dfm_small, method = "all", type = "individual") # Factor estimates
plot(dfm_small, type = "residual") # Residuals from factor predictions
# 10 periods ahead forecast
plot(predict(dfm_small), xlim = c(300, 370))
### Medium-Sized Model ---------------------------------
# IC for number of factors
IC_medium = ICr(BM14[, BM14_Models$medium])
plot(IC_medium)
screeplot(IC_medium)
# I take 3 factors. Now number of lags
VARselect(IC_medium$F_pca[, 1:3])
# Estimating the model with 3 factors and 3 lags
dfm_medium = DFM(BM14[, BM14_Models$medium], 3, 3)
# Inspecting the model
summary(dfm_medium)
plot(dfm_medium) # Factors and data
plot(dfm_medium, method = "all", type = "individual") # Factor estimates
plot(dfm_medium, type = "residual") # Residuals from factor predictions
# 10 periods ahead forecast
plot(predict(dfm_medium), xlim = c(300, 370))
### Large Model ---------------------------------
# IC for number of factors
IC_large = ICr(BM14)
plot(IC_large)
screeplot(IC_large)
# I take 6 factors. Now number of lags
VARselect(IC_large$F_pca[, 1:6])
# Estimating the model with 6 factors and 3 lags
dfm_large = DFM(BM14, 6, 3)
# Inspecting the model
summary(dfm_large)
plot(dfm_large) # Factors and data
# plot(dfm_large, method = "all", type = "individual") # Factor estimates
plot(dfm_large, type = "residual") # Residuals from factor predictions
# 10 periods ahead forecast
plot(predict(dfm_large), xlim = c(300, 370))