| fitGSMAR {uGMAR} | R Documentation | 
Estimate Gaussian or Student's t Mixture Autoregressive model
Description
fitGSMAR estimates a GMAR, StMAR, or G-StMAR model in two phases. In the first phase, a genetic algorithm is employed
to find starting values for a gradient-based method. In the second phase, the gradient-based variable metric algorithm is used to
converge accurately to a local maximum or a saddle point near each starting value. Parallel computing is used to conduct multiple
rounds of estimation in parallel.
Usage
fitGSMAR(
  data,
  p,
  M,
  model = c("GMAR", "StMAR", "G-StMAR"),
  restricted = FALSE,
  constraints = NULL,
  conditional = TRUE,
  parametrization = c("intercept", "mean"),
  ncalls = round(10 + 9 * log(sum(M))),
  ncores = 2,
  maxit = 500,
  seeds = NULL,
  print_res = TRUE,
  filter_estimates = TRUE,
  ...
)
Arguments
| data | a numeric vector or class  | 
| p | a positive integer specifying the autoregressive order of the model. | 
| M | 
 | 
| model | is "GMAR", "StMAR", or "G-StMAR" model considered? In the G-StMAR model, the first  | 
| restricted | a logical argument stating whether the AR coefficients  | 
| constraints | specifies linear constraints imposed to each regime's autoregressive parameters separately. 
 The symbol  | 
| conditional | a logical argument specifying whether the conditional or exact log-likelihood function should be used. | 
| parametrization | is the model parametrized with the "intercepts"  | 
| ncalls | a positive integer specifying how many rounds of estimation should be conducted. The estimation results may vary from round to round because of multimodality of the log-likelihood function and the randomness associated with the genetic algorithm. | 
| ncores | the number of CPU cores to be used in the estimation process. | 
| maxit | the maximum number of iterations for the variable metric algorithm. | 
| seeds | a length  | 
| print_res | should the estimation results be printed? | 
| filter_estimates | should likely inappropriate estimates be filtered? See details. | 
| ... | additional settings passed to the function  | 
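The default for ncalls shown in the Usage section depends only on the total number of regimes. The short sketch below simply evaluates that expression for a few values of M; the helper name default_ncalls is hypothetical and is not part of the package.

```r
# Default number of estimation rounds, copied from the Usage section:
#   ncalls = round(10 + 9 * log(sum(M)))
# 'default_ncalls' is a hypothetical helper used only for illustration.
default_ncalls <- function(M) round(10 + 9 * log(sum(M)))

default_ncalls(1)        # 10
default_ncalls(2)        # 16
default_ncalls(c(1, 1))  # 16 (G-StMAR with one GMAR type and one StMAR type regime)
default_ncalls(3)        # 20
```

The default thus grows only logarithmically with the number of regimes, so a larger value may be worth setting by hand for models with many regimes.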
Details
Because of the complexity and multimodality of the log-likelihood function, it is not guaranteed that the estimation algorithm will end up in the global maximum point. Most of the estimation rounds can be expected to end up in some local maximum point instead, and therefore a number of estimation rounds is required for reliable results. Because of the nature of the models, the estimation may fail, particularly when the number of mixture components is chosen too large. Note that the genetic algorithm is designed to avoid solutions in which the mixing weights of some regimes are too close to zero at almost all times ("redundant regimes"), but the settings can be adjusted (see ?GAfit).
If the iteration limit for the variable metric algorithm (maxit) is reached, one can continue the estimation by
iterating more with the function iterate_more.
The core of the genetic algorithm is mostly based on the description by Dorsey and Mayer (1995). It utilizes a slightly modified version of the individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995). Large (in absolute value) but stationary AR parameter values are generated with the algorithm proposed by Monahan (1984).
The variable metric algorithm (or quasi-Newton method, Nash (1990, algorithm 21)) used in the second phase is implemented
with the function optim from the package stats.
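The idea of running a gradient-based optimizer from several starting values and keeping the best result can be sketched with base R alone. The sketch below is not the package's implementation: it replaces the genetic algorithm with plain random starting values, and it fits a toy two-component Gaussian mixture with a common variance rather than a mixture autoregressive model. All names (y, negloglik, starts, fits) are made up for illustration. It does use optim with method = "BFGS", which the stats documentation describes as a quasi-Newton (variable metric) method.

```r
set.seed(1)
# Toy data: two well-separated Gaussian clusters (not a time series model).
y <- c(rnorm(200, mean = -2), rnorm(200, mean = 3))

# Negative log-likelihood of a two-component Gaussian mixture with a common
# standard deviation. The parameters are unconstrained:
# theta = (mu1, mu2, log(sigma), logit(alpha)).
negloglik <- function(theta) {
  sigma <- exp(theta[3])
  alpha <- plogis(theta[4])
  l1 <- log(alpha) + dnorm(y, theta[1], sigma, log = TRUE)
  l2 <- log1p(-alpha) + dnorm(y, theta[2], sigma, log = TRUE)
  m <- pmax(l1, l2)  # log-sum-exp for numerical stability
  -sum(m + log(exp(l1 - m) + exp(l2 - m)))
}

# Stand-in for the first phase: random starting values
# (the package uses a genetic algorithm here instead).
starts <- replicate(8, c(runif(2, -5, 5), 0, 0), simplify = FALSE)

# Second phase: one BFGS run per starting value.
fits <- lapply(starts, function(s) {
  optim(s, negloglik, method = "BFGS", control = list(maxit = 500))
})

# Keep the round with the largest log-likelihood.
logliks <- -vapply(fits, function(f) f$value, numeric(1))
best <- fits[[which.max(logliks)]]
sort(best$par[1:2])  # estimated component means, close to -2 and 3
```

Different starting values may converge to different local maxima, which is why the log-likelihoods of all rounds are compared before a solution is chosen.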
Additional Notes about the estimates:
Sometimes the found MLE is very close to the boundary of the stationarity region of some regime, the related variance parameter
is very small, and the associated mixing weights are "spiky". Estimates of this kind often maximize the log-likelihood function
for a technical reason induced by the endogenously determined mixing weights. In such cases, it might be more appropriate
to consider the next-best local maximum point of the log-likelihood function that is well inside the parameter space. Models based
on local-only maximum points can be built with the function alt_gsmar by adjusting the argument which_largest
accordingly.
Some mixture components of the StMAR model may sometimes get very large estimates for the degrees of freedom parameters.
Such parameters are weakly identified and induce various numerical problems. However, mixture components with large degrees
of freedom parameter estimates are similar to the mixture components of the GMAR model. It's hence advisable to further
estimate a G-StMAR model by allowing the mixture components with large degrees of freedom parameter estimates to be GMAR
type with the function stmar_to_gstmar.
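The similarity between a Student's t component with many degrees of freedom and a Gaussian component can be illustrated with base R densities. This is only a sketch: it compares the standard t density with the standard normal density and does not use the exact parametrization of the StMAR model. The names x, dfs and max_gap are made up for illustration.

```r
# Largest absolute difference between the standard t density and the
# standard normal density on a grid, for increasing degrees of freedom.
x <- seq(-4, 4, by = 0.01)
dfs <- c(3, 30, 300, 30000)
max_gap <- vapply(dfs, function(nu) max(abs(dt(x, df = nu) - dnorm(x))),
                  numeric(1))

# The gap shrinks towards zero as the degrees of freedom grow.
data.frame(df = dfs, max_gap = max_gap)
```

Once the degrees of freedom are large, changing them further barely changes the density, which is consistent with such parameters being weakly identified.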
Filtering inappropriate estimates: If filter_estimates == TRUE, the function will automatically filter
out estimates that it deems "inappropriate", that is, estimates that are unlikely to be solutions of interest.
Specifically, it filters out solutions that incorporate regimes with any modulus of the roots of the AR polynomial less
than 1.0015; a variance parameter estimate near zero (less than 0.0015);
mixing weights that are close to zero for almost all t for at least one regime; or a mixing weight parameter
estimate close to zero (or one). You can also set filter_estimates = FALSE and find the solutions of interest yourself
by using the function alt_gsmar.
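The first two filtering criteria above can be checked by hand for a single regime. The sketch below is an illustration under assumptions rather than the package's internal code: the helper names are hypothetical, and the AR polynomial is assumed to be 1 - phi_1 z - ... - phi_p z^p, whose roots are computed with polyroot from base R.

```r
# Moduli of the roots of the AR polynomial 1 - phi_1 z - ... - phi_p z^p.
# (Hypothetical helper; polyroot takes coefficients in increasing order.)
ar_root_moduli <- function(phi) Mod(polyroot(c(1, -phi)))

# TRUE if some root lies closer to the unit circle than the threshold
# mentioned in the text (1.0015).
near_boundary <- function(phi, threshold = 1.0015) {
  any(ar_root_moduli(phi) < threshold)
}

# TRUE if a variance parameter is below the threshold mentioned in the
# text (0.0015).
small_variance <- function(sigma2, threshold = 0.0015) {
  any(sigma2 < threshold)
}

near_boundary(c(0.5, 0.2))  # FALSE: root moduli are about 1.31 and 3.81
near_boundary(0.999)        # TRUE: the single root is 1/0.999, about 1.001
small_variance(0.001)       # TRUE
small_variance(0.5)         # FALSE
```

The remaining criteria concern the mixing weights over time and the mixing weight parameters, which require the fitted model and are not covered by this sketch.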
Value
Returns an object of class 'gsmar' defining the estimated GMAR, StMAR or G-StMAR model. The returned object contains
estimated mixing weights, some conditional and unconditional moments, and quantile residuals. Note that the first p
observations are taken as the initial values, so the mixing weights, conditional moments, and quantile residuals start from
the p+1:th observation (interpreted as t=1). In addition, the returned object contains the estimates and log-likelihoods
from all of the estimation rounds. See ?GSMAR for the form of the parameter vector, if needed.
S3 methods
The following S3 methods are supported for class 'gsmar' objects: print, summary, plot,
predict, simulate, logLik, residuals.
References
- Dorsey R. E. and Mayer W. J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66. 
- Kalliovirta L., Meitz M. and Saikkonen P. 2015. Gaussian Mixture Autoregressive model for univariate time series. Journal of Time Series Analysis, 36(2), 247-266. 
- Meitz M., Preve D., Saikkonen P. 2023. A mixture autoregressive model based on Student's t-distribution. Communications in Statistics - Theory and Methods, 52(2), 499-515. 
- Monahan J.F. 1984. A Note on Enforcing Stationarity in Autoregressive-Moving Average Models. Biometrika 71, 403-404. 
- Nash J. 1990. Compact Numerical Methods for Computers. Linear algebra and Function Minimization. Adam Hilger. 
- Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics 24, 656-667. 
- Smith R.E., Dike B.A., Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350. 
- Virolainen S. 2022. A mixture autoregressive model based on Gaussian and Student's t-distributions. Studies in Nonlinear Dynamics & Econometrics, 26(4), 559-580. 
See Also
GSMAR, iterate_more, stmar_to_gstmar, add_data,
profile_logliks, swap_parametrization, get_gradient, simulate.gsmar,
predict.gsmar, diagnostic_plot, quantile_residual_tests, cond_moments,
uncond_moments, LR_test, Wald_test
Examples
## These are long running examples that use parallel computing.
## The below examples take approximately 90 seconds to run.
## Note that the number of estimation rounds (ncalls) is relatively small
## in the below examples to reduce the time required for running the examples.
## For reliable results, a large number of estimation rounds is recommended!
# GMAR model
fit12 <- fitGSMAR(data=simudata, p=1, M=2, model="GMAR", ncalls=4, seeds=1:4)
summary(fit12)
plot(fit12)
profile_logliks(fit12)
diagnostic_plot(fit12)
# StMAR model (large estimate of the degrees of freedom)
fit42t <- fitGSMAR(data=M10Y1Y, p=4, M=2, model="StMAR", ncalls=2, seeds=c(1, 6))
summary(fit42t) # Overly large 2nd regime degrees of freedom estimate!
fit42gs <- stmar_to_gstmar(fit42t) # Switch to G-StMAR model
summary(fit42gs) # An appropriate G-StMAR model with one G and one t regime
plot(fit42gs)
# Restricted StMAR model
fit42r <- fitGSMAR(M10Y1Y, p=4, M=2, model="StMAR", restricted=TRUE,
                   ncalls=2, seeds=1:2)
fit42r
# G-StMAR model with one GMAR type and one StMAR type regime
fit42gs <- fitGSMAR(M10Y1Y, p=4, M=c(1, 1), model="G-StMAR",
                    ncalls=1, seeds=4)
fit42gs
# The following three examples demonstrate how to apply linear constraints
# to the autoregressive (AR) parameters.
# Two-regime GMAR p=2 model with the second AR coefficient
# of the second regime constrained to zero.
C22 <- list(diag(1, ncol=2, nrow=2), as.matrix(c(1, 0)))
fit22c <- fitGSMAR(M10Y1Y, p=2, M=2, constraints=C22, ncalls=1, seeds=6)
fit22c
# StMAR(3, 1) model with the second order AR coefficient constrained to zero.
C31 <- list(matrix(c(1, 0, 0, 0, 0, 1), ncol=2))
fit31tc <- fitGSMAR(M10Y1Y, p=3, M=1, model="StMAR", constraints=C31,
                    ncalls=1, seeds=1)
fit31tc
# Such StMAR(3, 2) model that the AR coefficients are restricted to be
# the same for both regimes and the second AR coefficients are
# constrained to zero.
fit32rc <- fitGSMAR(M10Y1Y, p=3, M=2, model="StMAR", restricted=TRUE,
                    constraints=matrix(c(1, 0, 0, 0, 0, 1), ncol=2),
                    ncalls=1, seeds=1)
fit32rc