R: Fit (extended) generalised linear hidden Markov models.

eglhmm {eglhmm}

R Documentation

Fit (extended) generalised linear hidden Markov models.

Description

Fits an (extended) generalised linear model to a data set where the response in each “cell” of the model consists of a time series whose serial dependence is modelled by a hidden Markov model.

Usage

eglhmm(formula = NULL, response = NULL, data,
      distr = c("Gaussian", "Poisson", "Binomial", "Dbd", "Multinom", "discnp"),
      inclTau=TRUE,preSpecSigma=NULL, indep = NULL, size = NULL, nbot = NULL, ntop = NULL,
      cells = NULL, cf = "singlecell", K = NULL, par0 = NULL, randStart = NULL,
      method = c("lm", "em", "bf"), optimiser = c("optim", "nlm"),
      optimMethod = "BFGS", nlmWarn = FALSE, lmc = 10, tolerance = NULL,
      digits = NULL, verbose = FALSE, itmax = 200,
      contrast = c("treatment", "sum", "helmert"),
      crit = c("CLL", "L2", "Linf", "ABSGRD"), breaks = NULL, hessian = FALSE,
      useAnalGrad = FALSE, ca = FALSE, checkDecrLL=TRUE)

Arguments

`formula`	A model formula specifying the linear predictor for the model. The formula should not include `state` as a predictor variable. The variable `state` gets added to the formula automatically. Ignored if the model is bivariate, i.e. if the length of `response` is 2.
`response`	A character scalar or a length-2 vector of such scalars, specifying the name or names of the response(s). If `response` is not specified (i.e. if it is left as `NULL`) then `formula` (see below) must be specfied and `response` is taken to be the left hand side of `formula`. (In this case, it is of course univariate.)
`data`	A data frame with columns providing the response(s) and the predictor variables in the model.
`distr`	Character string specifying the distribution of the response(s) (“emissions”) variable(s). Currently (13/02/2024) the only distributions accommodated are Gaussian, Poisson, Binomial, Dbd, and Multinom. Note that `"discnp"` is just an alternative expression for `"Multinom"`. Ignored if the response is bivariate, in which case `distr` is forcibly set equal to `"Multinom"`. I.e. bivariate models are, currently, fitted only to data in which the emissions have the `"Multinom"` distribution.
`inclTau`	Logical scalar. Should the transition probability matrix parameters “`tau`” be included in those that are estimated via the Hessian/gradient pardigm? In this case, they are included in the set of parameters to which the gradient and Hessian are applicable. If not, they are estimated via the method of moments as is done when the EM algorithm is used. In this latter case the dimensions of the Hessian are reduced (by a substantial amount if `K` is “large”).
`preSpecSigma`	Numeric vector of length `K` (see below) with strictly positive entries. Ignored if `distr` is not equal to `"Gaussian"`. This vector provides “pre-specified” values of the standard deviations `sigma` of the Gaussian distribution associated with each state. If `preSpecSigma` is specified, then it is used as the value of `sigma` throughout the fitting process, and `sigma` is not estimated from the data. If `distr` is `"Gaussian"` and `preSpecSigma` is specified, then an error will be thrown if the length of `preSpecSigma` is not equal to `K`, or if any entries of `preSpecSigma` fail to be strictly positive.
`indep`	Logical scalar; should the components of a bivariate model be considered to be independent? Ignored unless the model is bivariate (i.e. unless `response` is of length 2. If the model is bivariate and `indep` is not specified, an error is thrown.
`size`	Scalar integer specifying the number of trials in the experiment generating binomial data (see the `size` argument of `dbinom()`). Ignored unless `distr` is equal to `"Binomial"`.
`nbot`	Scalar integer specifying the lower end (0 or 1) of the range of values of the discretised Beta distribution. Ignored unless `distr` is `"Dbd"`.
`ntop`	Scalar integer specifying the upper end of the range of values of the discretised Beta distribution. Ignored unless `distr` is `"Dbd"`.
`cells`	A character vector giving the names of the factors (columns of the `data` data frame) which determine what the “cells” of the model are considered to be. The cells correspond to the combinations of levels of the factors named by `cells`. The sequences of observations from each of the cells constitute a collection of independent time series, all following the specified model.
`cf`	A factor (“cell factor”) specifying the cells of the model. If `cells` is not specified, then `cf` must be. If `cells` is specified, then `cf` is ignored. the model. If `cells` is not specified, then in most (if not all?) circumstances, `cf` should be set equal `factor(rep(1,nrow(data))`. This the effect of making the entire observation sequence equal to a single time series, following the specified model.
`K`	Scalar integer specifying the number of states of the hidden Markov model in question. If `K` is not specified and `par0` (see below) is specified, and has a component `tpm`, then `K` is set equal to `nrow(tpm)`. In this case, if `par0` does not have a `tpm` component, an error is thrown. An error is also thrown in this setting if `K` is specified to a value different from `nrow(tpm)`.
`par0`	A list comprising starting values for the parameter estimates, to be used by the various methods. (See `method` below.) This list may have components `tpm` (an estimate of the transition probability matrix), `phi` (a vector of estimates of the coefficients in the linear predictor in the generalised linear model) and `Rho` (a matrix, a list of two matrices, or a three dimensional array) that specifyies the emission probabilities when `distr` is `"Multinomial"`. Note that `par0` may consist of an object of class `"eglhmm"` (see below), i.e. a model previously fitted (perhaps without achieving convergence), by `eglhmm()`. This provides a means whereby a fitting procedure, that failed to converge, may be continued from where it left off.
`randStart`	Either a logical scalar or a list of three logical scalars named `tpm`, `phi`, and `Rho`. If the former, it is converted internally into a list with entries named `tpm`, `phi` and `Rho`, all having the same value as the original argument. If `tpm` is TRUE then the (undocumented) function `inititialise()` chooses entries for the starting value of `tpm` at random; likewise for `phi` and `Rho`. If left `NULL`, this argument defaults to `list(tpm=FALSE,phi=FALSE,Rho=FALSE)`.
`method`	Character string specifying the method used to fit the model. This may be `"lm"` (Levenberg-Marquardt algorithm), `"em"` (EM algorithm) or `"bf"` (“brute force”). The latter calls upon `optim()` or `nlm()` to do the heavy lifting). If the response is bivariate, then `method` is forcibly (and silently) set equal to `"em"`.
`optimiser`	Character string specifying which of `optim()` or `nlm()` should be used when `method` is `"bf"`. Ignored unless `method` is `"bf"`.
`optimMethod`	Character string specifying the optimisation method to be used by `optim()`. See `optim()` for details. Ignored unless `method` is `"bf"` and `optimiser` is `"optim"`.
`nlmWarn`	The `nlm()` function sometimes produces, in the first few iterations, warnings to the effect “NA/Inf replaced by maximum positive value”. These warnings are almost surely irrelevant and are annoying. If `nlmWarn` is `FALSE` (the default) then these warnings are suppressed. This argument is provided to allow for the remote possibilty that the user might want to see these warnings.
`lmc`	Positive numeric scalar. The initial “Levenberg-Marquardt constant”. Ignored unless `method` is `"lm"`.
`tolerance`	Positive numeric scalar. The convergence tolerance to be used. What this value actually means depends upon `method`. If left as `NULL` it defaults to `1e-6` for the bivariate methods, to `sqrt(.Machine$double.eps` for the `"em"` and `"lm"` methods, and to the default value of `reltol` used by `optim()` when `method` is `"bf"` and `optimiser` is `"optim"`. It is ignored if `method` is `"bf"` and `optimiser` is `"nlm"`.
`digits`	Integer scalar. The number of digits to which “progress reports” are printed when `verbose` (see below) is `TRUE`. There is a “sensible” default which is calculated in terms of `tolerance`. This argument is ignored if `method` is `"bf"`.
`verbose`	Logical scalar; if `TRUE`, rudimentary “progress reports” are printed out at appropriate points during the iteration process. The nature of these “reports” varies with `method`.
`itmax`	Integer scalar. The maximum number of iterative steps to take. Has a somewhat different meaning when `method` is `"bf"`, in which case the meaning depends on `optimiser`. For methods `"em"` and `"lm"`, if convergence is not achieved by `itmax` steps, the function gives up, prints a message to this effect, and returns a value with a component `converged=FALSE`. This returned value may be used as a starting (the value of the argument `par0`) so that the iterations may be continued from where they left off. Unfortunately this facility is not available when `method` is `"bf"`.
`contrast`	Text string specifying the contrast (in respect of unordered factors) (see `contrasts()` and `options()`) that will be used when the design matrix is constructed from the model formula. May be abbreviated (e.g. to "t", "s" or "h").
`crit`	Text string specifying the stopping criterion to be used. Possible values are “CLL” (scaled change in log likelihood), “L2” (scaled square root of the sum of squares of the changes in the parameter estimates), “Linf” (scaled maximum of the absolute value of the changes in the parameter estimates), and “ABSGRD” (scaled maximum of the absolute values of the entries of the gradient vector). The latter only makes sense for the Levenberg-Marquardt algorithm. This argument is ignored if `method` is `"bf"`. It seems that the `"bf"` method effectively uses “CLL” when `optimiser` is `"optim"`. When `optimiser` is `"nlm"` it seems that a combination of (something like) “ABSGRD” and “CLL” is used.
`breaks`	A vector of `K+1` values used to construct a set of guesses at the states corresponding to each observation. These are in turn used to calculate an initial estimate of the transition probability matrix. There is a “sensible” default (produced by the undocumented function `breaker()`.
`hessian`	Logical scalar; should a Hessian matrix obtained by numerical differentiation be returned? Ignored unless `method` is `"bf"`.
`useAnalGrad`	Logical scalar; should “analytical” calculation of the gradient be conducted? This argument is ignored unless the method is `"bf"`.
`ca`	Logical scalar; “check analyticals”. Used only when the method is `"bf"` and `optimiser` is `"nlm"`, and is passed on to `nlm()`.
`checkDecrLL`	Logical scalar; “check for a decrease in the log likelihood”. Ignored unless the `method` is `"em"`. Should the software check for a decrease in the log likelihood after an EM step? See the Remarks for further discussion.

Value

An object of class "eglhmm", consisting of a list with components:

`call`	The call by which this object was created. Present so that `update()` can be applied to objects returned by `eglhmm()`.
`tpm`	The estimated transition probability matrix.
`ispd`	The estimated initial state probability distribution.
`phi`	Except for the `"Multinom"` distribution this is the vector of estimated coefficients of the linear predictor in the generalised linear model. For the `"Multinom"` distribution it consists of the entries of `Rho` (see below) with the final all-zero column remove. In this case `phi` is of course redundant.
`theta`	The vector of parameter estimates that the estimation procedure actually works with. It consists of the catenation of the non-redundant parameterization of the transition probability matrix and the vector `phi`. It is redundant in the case of the `"Multinom"` distribution.
`Rho`	A matrix, or a list of two matrices or a three dimensional array specifying the emissions probabilities for a multinomial distribution. Present only if `distr` is `"Multinom"`.
`log.like`	The value of the log likelihood of the model evaluated at the parameter estimates, i.e. the (approximately) maximal value of the log likelihood.
`gradient`	(Not present for the `"em"` method.) The gradient vector of the log likelihood at the final parameter estimates; it should be effectively the zero vector.
`numHess`	(Present only if `method` is `"bf"` and only if the argument `hessian` is `TRUE`.) A value of the Hessian matrix (see below), obtained by means of numerical differentiation.
`Hessian`	(Present only if `method` is `"lm"`.) The Hessian matrix, i.e. the matrix of second partial derivatives of the log likelihood, evaluated at the final parameter estimates. The inverse of the negative of this matrix constitutes an estimate of the covariance matrix of the parameter estimates.
`mu`	A data frame with `npred+1` columns where `npred` is the number of predictors in the model. The rows contain, in their first `npred` entries, all possible combinations of the predictor values. The last (`npred+1`) entry of each row is the fitted mean of the Gaussian distribution, as determined by that combination of predictors. Present only if `distr` is `"Gaussian"`.
`sigma`	Numeric vector of length `K` whose entries consist of the fitted standard deviations for the underlying Gaussian distribution, corresponding to each of the states. Present only if `distr` is `"Gaussian"` and `preSpecSigma` is not supplied.
`preSpecSigma`	Numeric vector equal to the `preSpecSigma` argument, with names `"sigma1"`, `"sigma2"`, ..., `"sigmaK"` added. Present only if `distr` is `"Gaussian"` and `preSpecSigma` is supplied.
`stopCritVal`	Numeric scalar equal to the value, assumed by the stopping criterion specified by the argument `crit`, at the termination of the algorithm. If the algorithm converged then `stopCritVal` will be less than `tolerance`. Not present if `method` is `"bf"`. If `converged` (see below) is `NA` then `stopCritVal` is `NA` also.
`anomaly`	Logical scalar. Did an “anomaly” occur in an application of the EM algorithm? (See Remarks.) ' Present only if `method` was equal to `"em"`. This entry of the returned value is provided mainly for use by the `bcov()` function. Note that `anomaly` is added to the returned object, irrespective of the value of `checkDecrLL`. When `checkDecrLL` is `TRUE`, `anomaly` is somewhat redundant, since it will be `TRUE` if aand only if `converged` is `NA`. However when `checkDecrLL` is `FALSE`, `anomaly` is informtive, since it is not possible to tell from other entries of the returned value when an anomaly has occurred.
`converged`	A logical scalar. For the `"lm"`, and `"em"` methods it is `TRUE` if convergence is achieved within `itmax` iterations and `FALSE` otherwise. For the `"em"` method, if `checkDecrLL` is `TRUE`, then `converged` may be `NA`. See Remarks for some discussion. For the `"bf"` method `converged` is `TRUE` if the `convergence` component of the object returned by `optim()` is equal to 0 or if the `code` component of the object returned by `nlm()` is less than or equal to 2, and is `FALSE` otherwise. When `nlm()` is used, the value of `converged` has an attribute `"code"` equal to the actual value of the `code` component.
`nstep`	The number of steps (iterations) actually used by the algorithm. For the `"lm"` and `"em"` methods this is the number of Levenberg-Marquardt steps, or EM steps, respectively, taken by the algorithm. For the `"bf"` method it is the `counts` component of the object returned by `optim()` when `optimiser` is `"optim"` and it is the `iterations` component of the object returned by `nlm()` when `optimiser` is `"nlm"`.
`mean`	A vector of the fitted mean values underlying each combination of observed predictors and state (i.e. corresponding to each entry of `y` in the data frame used to fit the model. See the description of `data` below. Present only if `distr` is `"Gaussian"`.
`sd`	A vector of the fitted values of the standard deviations underlying each combination of observed predictors and state, i.e. corresponding to each entry of `y` in the data frame used to fit the model. See the description of `data` below. Present only if `distr` is `"Gaussian"`.
`lambda`	A vector of estimated values of the Poisson parameter associated with each combination of observed predictors and state, i.e. corresponding to each entry of `y` in the data frame used to fit the model. See the description of `data` below. Present only if `distr` is `"Poisson"`.
`p`	A vector of estimated values of the “success” probabilities associated with each combination of observed predictors and state, i.e. corresponding to each entry of `y` in the data frame used to fit the model. See the description of `data` below. Present only if `distr` is `"Binomial"`.
`alpha`	A numeric vector of the fitted “alpha” parameters, of the discretised Beta distribution, corresponding to each observation. Present only if `distr` is `"Dbd"`.
`beta`	A numeric vector of the fitted “beta” parameters, of the discretised Beta distribution, corresponding to each observation. Present only if `distr` is `"Dbd"`.
`fy`	The values of the “emission probability (density)” function, calculated at each observed value, for each state (i.e. at each entry of `y` in `data`. See below.) These values are calculated using the (final) fitted parameters.
`message`	A (long) text string that is produced if the EM algorithm encounters the anomaly of a decrease in the log likelihood after an EM step. It warns the user that this has occurred and suggests consulting the help file for an explanation. Present only if `method=="em"`, the anomaly referred to has occurred, and `checkDecrLL` is `TRUE`.
`par0`	The starting values used in the estimation procedure. Either those provided by the argument `par0` or those created by the (undocumented) function `initialise`.
`cells`	A character vector indicating the names of the factors specifying the “cells” of the model. (Equal to the `cells` argument.)
`formula`	The formula for the model that was fitted; equal to the `formula` argument, augmented by `state`.
`distr`	Text string specifying the distribution of the response variable. Equal to the `distr` argument of this function.
`nbot`	Integer scalar. The lower endpoint of the range of values of the discretised beta distribution. Equal to the value of the `nbot` argument of this function. Present only if `distr` is `"Dbd"`.
`ntop`	Integer scalar. The upper endpoint of the range of values of the discretised beta distribution. Equal to the value of the `nbot` argument of this function. Present only if `distr` is `"Dbd"`.
`size`	Scalar integer equal to the number of trials in the “experiments” generating the data. Equal to the `size` argument of this function. Present only if `distr` is `"Binomial"`.
`tolerance`	The convergence tolerance used to fit the model. Equal to the `tolerance` argument.
`crit`	Character scalar specifying the stopping criterion that was used. Equal to the `crit` argument of this function. Not present if `method` is `"bf"`.
`contrast`	Text string specifying the contrast for unordered factors that was used in fitting the model. Equal to the `contrast` argument of this function.
`method`	The method (`"lm"`, `"em"`, or `"bf"` used to fit the model. Equal to the `method` argument.
`stationary`	Logical scalar. Was a stationary Markov chain fitted? Currently (13/02/2024) `stationary` is always `TRUE`.
`data`	The data frame to which the model was fitted. It is a rearrangement of the `data` argument, with rows of that argument replicated `K` times (once for each state). A `state` column (factor) has been added, as has a column `cf` (“cell factor”), which indicates, by means of a single factor, which cell of the model a given row of `data` corresponds to. The aforementioned rearrangement consists of ordering the cells in the order of the levels of `cf`. When `distr` is `"Multinom"` the `"response"` variables are coerced into factors.
`bicm`	Numerical scalar. The number by which `npar` is multiplied to form the `BIC` criterion. It is essentially the log of the number of observations. See the code of `eglhmm()` for details.
`AIC`	Numerical scalar. The Akaike Information criterion, calculated as `-2ll + 2npar` where `ll` is the log likelihood of the fitted model and `npar` is the number of fitted parameters.
`BIC`	Numerical scalar. The Bayesian Information criterion, calculated as `-2ll + bicmnpar` where `ll` is the log likelihood of the fitted model, `npar` is the number of fitted parameters, and `bicm` is the log of the number of observations.
`missFrac`	The fraction or proportion of missing values in the observations.

Remarks

Available models:

Although this documentation refers to (extended) “generalised linear models”, the only such models currently (13/02/2024) available are the Gaussian model with the identity link, the Poisson model, with the log link, and the Binomial model with the logit link. When distr is "Dbd" or "Multinom" the model fitted is is a generalised linear model only in a rather extended sense. Even the Gaussian model is not strictly speaking a generalised linear model, since the (state dependent) standard deviations are estimated by a method separate from the generalised linear model paradigm. Other models may be added at a future date.

Decrease in the log likelihood:

If method is equal to "EM" there may be a decrease (!!!) in the log likelihood at some EM step. This is “theoretically impossible” but can occur in practice due to an intricacy in the way that the EM algorithm treats ispd when stationary is TRUE. It turns out to be effectively impossible to maximise the expected log likelihood unless the term in that quantity corresponding to ispd is ignored (whence it is ignored). Ignoring this term is “asymptotically negligible” but can have the unfortunate effect of occasionally leading to a decrease in the log likelihood. If method is equal to "em", then the object returned by eglhmm() has a component anomaly which is TRUE if such a decrease in the log likelihood was detected, and FALSE otherwise.

If such a decrease/anomaly is detected, then (provided that checkDecrLL is TRUE) the algorithm terminates and the converged component of the returned value is set equal to NA. The algorithm issues a message to the effect that the decrease occurred. The message suggests that another method be used and that perhaps the results from the penultimate EM step (which are returned by this function) be used as starting values. This of course is not possible if the response is bivariate, in which case only the EM algorithm is applicable.

Note that if checkDecrLL is FALSE, then the algorithm proceeds “normally”. That is, it treats the decrease in the log likelihood to mean that the “increase” in the log likeihood is less than tolerance and deems convergence to be achieved.

The value of checkDecrLL is set to FALSE in the function bcov() so as to speed up the rate at which the iterations proceed. In other circumstances it is probably judicious to leave it at its default value of TRUE.

Author(s)

Rolf Turner rolfturner@posteo.net

References

T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson (1998). Hidden Markov chains in generalized linear models. Canadian Journal of Statististics 26, pp. 107 – 125, DOI: https://doi.org/10.2307/3315677.

Rolf Turner (2008). Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147 – 4160, DOI: https://doi.org/10.1016/j.csda.2008.01.029

Examples

    loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
    SCC4 <- SydColCount[SydColCount$locn %in% loc4,] 
    SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
    rownames(SCC4) <- 1:nrow(SCC4)
    fitP.em <- eglhmm(y~locn+depth,data=SCC4,distr="P",cells=c("locn","depth"),
                    K=2,method="em",verb=TRUE)
    ## Not run: 
        fitP.lm <- eglhmm(y~locn+depth,data=SCC4,distr="P",cells=c("locn","depth"),
                        K=2,verb=TRUE)
        fitD.lm <- eglhmm(formula=y~ma.com+nh.com+bo.com,data=SCC4,nbot=0,ntop=11,
                      cells=c("locn","depth"),distr="Dbd",K=2,method="lm",verb=TRUE,
                      tolerance=NULL)
        SCD4 <- SydColDisc[SydColDisc$locn %in% loc4,] 
        SCD4$locn <- factor(SCD4$locn) # Get rid of unused levels.
        fitM.lm  <- eglhmm(formula=y~ma.com+nh.com+bo.com,data=SCD4,
                      cells=c("locn","depth"),distr="Multinom",K=2,
                      verb=TRUE)
        xxx <- split(SCD4,f=SCD4$locn)
        X   <- with(xxx,data.frame(y.LngRf=LngRf$y,y.BondiE=BondiE$y,depth=LngRf$depth))
        fitBiv <- eglhmm(response=c("y.LngRf","y.BondiE"),data=X,K=2,cells="depth",
                         indep=FALSE,verb=TRUE)
    
## End(Not run)
# See the help for ionChannelData for more examples involving the
# ion channel data.

[Package eglhmm version 0.1-3 Index]