fitGSMVAR {gmvarkit} | R Documentation |
Two-phase maximum likelihood estimation of a GMVAR, StMVAR, or G-StMVAR model
Description
fitGSMVAR
estimates a GMVAR, StMVAR, or G-StMVAR model model in two phases:
in the first phase it uses a genetic algorithm to find starting values for a gradient based
variable metric algorithm, which it then uses to finalize the estimation in the second phase.
Parallel computing is utilized to perform multiple rounds of estimations in parallel.
Usage
fitGSMVAR(
data,
p,
M,
model = c("GMVAR", "StMVAR", "G-StMVAR"),
conditional = TRUE,
parametrization = c("intercept", "mean"),
constraints = NULL,
same_means = NULL,
weight_constraints = NULL,
structural_pars = NULL,
ncalls = (M + 1)^5,
ncores = 2,
maxit = 1000,
seeds = NULL,
print_res = TRUE,
use_parallel = TRUE,
filter_estimates = TRUE,
calc_std_errors = TRUE,
...
)
Arguments
data |
a matrix or class |
p |
a positive integer specifying the autoregressive order of the model. |
M |
|
model |
is "GMVAR", "StMVAR", or "G-StMVAR" model considered? In the G-StMVAR model, the first |
conditional |
a logical argument specifying whether the conditional or exact log-likelihood function |
parametrization |
|
constraints |
a size |
same_means |
Restrict the mean parameters of some regimes to be the same? Provide a list of numeric vectors
such that each numeric vector contains the regimes that should share the common mean parameters. For instance, if
|
weight_constraints |
a numeric vector of length |
structural_pars |
If
See Virolainen (forthcoming) for the conditions required to identify the shocks and for the B-matrix as well (it is |
ncalls |
the number of estimation rounds that should be performed. |
ncores |
the number CPU cores to be used in parallel computing. |
maxit |
the maximum number of iterations in the variable metric algorithm. |
seeds |
a length |
print_res |
should summaries of estimation results be printed? |
use_parallel |
employ parallel computing? If |
filter_estimates |
should the likely inappropriate estimates be filtered? See details. |
calc_std_errors |
calculate approximate standard errors for the estimates? |
... |
additional settings passed to the function |
Details
If you wish to estimate a structural model without overidentifying constraints that is identified statistically,
specify your W matrix is structural_pars
to be such that it contains the same sign constraints in a single row
(e.g. a row of ones) and leave the other elements as NA
. In this way, the genetic algorithm works the best.
The ordering and signs of the columns of the W matrix can be changed afterwards with the functions
reorder_W_columns
and swap_W_signs
.
Because of complexity and high multimodality of the log-likelihood function, it's not certain that the estimation algorithms will end up in the global maximum point. It's expected that most of the estimation rounds will end up in some local maximum or saddle point instead. Therefore, a (sometimes large) number of estimation rounds is required for reliable results. Because of the nature of the model, the estimation may fail especially in the cases where the number of mixture components is chosen too large. With two regimes and couple hundred observations in a two-dimensional time series, 50 rounds is usually enough. Several hundred estimation rounds often suffices for reliably fitting two-regimes models to 3 or 4 dimensional time series. With more than two regimes and more than couple hundred observations, thousands of estimation rounds (or more) are often required to obtain reliable results.
The estimation process is computationally heavy and it might take considerably long time for large models with
large number of observations. If the iteration limit maxit
in the variable metric algorithm is reached,
one can continue the estimation by iterating more with the function iterate_more
. Alternatively, one may
use the found estimates as starting values for the genetic algorithm and and employ another round of estimation
(see ?GAfit
how to set up an initial population with the dot parameters).
If the estimation algorithm fails to create an initial population for the genetic algorithm, it usually helps to scale the individual series so that the AR coefficients (of a VAR model) will be relative small, preferably less than one. Even if one is able to create an initial population, it should be preferred to scale the series so that most of the AR coefficients will not be very large, as the estimation algorithm works better with relatively small AR coefficients. If needed, another package can be used to fit linear VARs to the series to see which scaling of the series results in relatively small AR coefficients.
The code of the genetic algorithm is mostly based on the description by Dorsey and Mayer (1995) but it includes some extra features that were found useful for this particular estimation problem. For instance, the genetic algorithm uses a slightly modified version of the individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995).
The gradient based variable metric algorithm used in the second phase is implemented with function optim
from the package stats
.
Note that the structural models are even more difficult to estimate than the reduced form models due to
the different parametrization of the covariance matrices, so larger number of estimation rounds should be considered.
Also, be aware that if the lambda parameters are constrained in any other way than by restricting some of them to be
identical, the parameter "lambda_scale" of the genetic algorithm (see ?GAfit
) needs to be carefully adjusted accordingly.
When estimating a structural model that imposes overidentifiying constraints to a time series with d>3
,
it is highly recommended to create an initial population based on the estimates of a statistically identified model
(when M=2
). This is because currently obtaining the ML estimate reliably to such a structural model seems
difficult in many application.
Finally, the function fails to calculate approximate standard errors and the parameter estimates are near the border
of the parameter space, it might help to use smaller numerical tolerance for the stationarity and positive
definiteness conditions. The numerical tolerance of an existing model can be changed with the function
update_numtols
.
Filtering inappropriate estimates: If filter_estimates == TRUE
, the function will automatically filter
out estimates that it deems "inappropriate". That is, estimates that are not likely solutions of interest.
Specifically, solutions that incorporate a near-singular error term covariance matrix (any eigenvalue less than 0.002
),
mixing weights such that they are close to zero for almost all t
for at least one regime, or mixing weight parameter
estimate close to zero (or one). It also filters out estimates with any modulus "bold A" eigenvalues larger than 0.9985,
as the solution is near the boundary of the stationarity region and likely not a local maximum. You can also set
filter_estimates=FALSE
and find the solutions of interest yourself by using the
function alt_gsmvar
.
Value
Returns an object of class 'gsmvar'
defining the estimated (reduced form or structural) GMVAR, StMVAR, or G-StMVAR model.
Multivariate quantile residuals (Kalliovirta and Saikkonen 2010) are also computed and included in the returned object.
In addition, the returned object contains the estimates and log-likelihood values from all the estimation rounds performed.
The estimated parameter vector can be obtained at gsmvar$params
(and corresponding approximate standard errors
at gsmvar$std_errors
). See ?GSMVAR
for the form of the parameter vector, if needed.
Remark that the first autocovariance/correlation matrix in $uncond_moments
is for the lag zero,
the second one for the lag one, etc.
S3 methods
The following S3 methods are supported for class 'gsmvar'
: logLik
, residuals
, print
, summary
,
predict
, simulate
, and plot
.
References
Dorsey R. E. and Mayer W. J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66.
Kalliovirta L., Meitz M. and Saikkonen P. 2016. Gaussian mixture vector autoregression. Journal of Econometrics, 192, 485-498.
Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics 24, 656-667.
Smith R.E., Dike B.A., Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350.
Virolainen S. (forthcoming). A statistically identified structural vector autoregression with endogenously switching volatility regime. Journal of Business & Economic Statistics.
Virolainen S. 2022. Gaussian and Student's t mixture vector autoregressive model with application to the asymmetric effects of monetary policy shocks in the Euro area. Unpublished working paper, available as arXiv:2109.13648.
See Also
GSMVAR
, iterate_more
, stmvar_to_gstmvar
, predict.gsmvar
,
profile_logliks
, simulate.gsmvar
, quantile_residual_tests
, print_std_errors
,
swap_parametrization
, get_gradient
, GIRF
, GFEVD
, LR_test
,
Wald_test
, gsmvar_to_sgsmvar
, stmvar_to_gstmvar
, reorder_W_columns
,
swap_W_signs
, cond_moment_plot
, update_numtols
Examples
## These are long running examples that use parallel computing!
# Running all the below examples will take approximately 3-4 minutes.
# GMVAR(1,2) model: 10 estimation rounds with seeds set
# for reproducibility
fit12 <- fitGSMVAR(gdpdef, p=1, M=2, ncalls=10, seeds=1:10)
fit12
plot(fit12)
summary(fit12)
print_std_errors(fit12)
profile_logliks(fit12)
# The rest of the examples only use a single estimation round with a given
# seed that produces the MLE to reduce running time of the examples. When
# estimating models for empirical applications, a large number of estimation
# rounds (ncalls = a large number) should be performed to ensure reliability
# of the estimates (see the section details).
# StMVAR(2, 2) model
fit22t <- fitGSMVAR(gdpdef, p=2, M=2, model="StMVAR", ncalls=1, seeds=1)
fit22t # Overly large degrees of freedom estimate in the 2nd regime!
fit22gs <- stmvar_to_gstmvar(fit22t) # So switch it to GMVAR type!
fit22gs # This is the appropriate G-StMVAR model based on the above StMVAR model.
fit22gss <- gsmvar_to_sgsmvar(fit22gs) # Switch to structural model
fit22gss # This is the implied statistically identified structural model.
# Structural GMVAR(1,2) model identified with sign
# constraints.
W_122 <- matrix(c(1, 1, -1, 1), nrow=2)
fit12s <- fitGSMVAR(gdpdef, p=1, M=2, structural_pars=list(W=W_122),
ncalls=1, seeds=1)
fit12s
# A statistically identified structural model can also be obtained with
# gsmvar_to_sgsmvar(fit12)
# GMVAR(2,2) model with autoregressive parameters restricted
# to be the same for both regimes
C_mat <- rbind(diag(2*2^2), diag(2*2^2))
fit22c <- fitGSMVAR(gdpdef, p=2, M=2, constraints=C_mat, ncalls=1, seeds=1)
fit22c
# G-StMVAR(2, 1, 1) model with autoregressive parameters and unconditional means restricted
# to be the same for both regimes:
fit22gscm <- fitGSMVAR(gdpdef, p=2, M=c(1, 1), model="G-StMVAR", constraints=C_mat,
parametrization="mean", same_means=list(1:2), ncalls=1, seeds=1)
# GMVAR(2,2) model with autoregressive parameters restricted
# to be the same for both regimes and non-diagonal elements
# the coefficient matrices constrained to zero.
tmp <- matrix(c(1, rep(0, 10), 1, rep(0, 8), 1, rep(0, 10), 1),
nrow=2*2^2, byrow=FALSE)
C_mat2 <- rbind(tmp, tmp)
fit22c2 <- fitGSMVAR(gdpdef, p=2, M=2, constraints=C_mat2, ncalls=1,
seeds=1)
fit22c2