eglhmm {eglhmm}R Documentation

Fit (extended) generalised linear hidden Markov models.

Description

Fits an (extended) generalised linear model to a data set where the response in each “cell” of the model consists of a time series whose serial dependence is modelled by a hidden Markov model.

Usage

eglhmm(formula = NULL, response = NULL, data,
      distr = c("Gaussian", "Poisson", "Binomial", "Dbd", "Multinom", "discnp"),
      inclTau=TRUE,preSpecSigma=NULL, indep = NULL, size = NULL, nbot = NULL, ntop = NULL,
      cells = NULL, cf = "singlecell", K = NULL, par0 = NULL, randStart = NULL,
      method = c("lm", "em", "bf"), optimiser = c("optim", "nlm"),
      optimMethod = "BFGS", nlmWarn = FALSE, lmc = 10, tolerance = NULL,
      digits = NULL, verbose = FALSE, itmax = 200,
      contrast = c("treatment", "sum", "helmert"),
      crit = c("CLL", "L2", "Linf", "ABSGRD"), breaks = NULL, hessian = FALSE,
      useAnalGrad = FALSE, ca = FALSE, checkDecrLL=TRUE)

Arguments

formula

A model formula specifying the linear predictor for the model. The formula should not include state as a predictor variable. The variable state gets added to the formula automatically. Ignored if the model is bivariate, i.e. if the length of response is 2.

response

A character scalar or a length-2 vector of such scalars, specifying the name or names of the response(s). If response is not specified (i.e. if it is left as NULL) then formula (see below) must be specfied and response is taken to be the left hand side of formula. (In this case, it is of course univariate.)

data

A data frame with columns providing the response(s) and the predictor variables in the model.

distr

Character string specifying the distribution of the response(s) (“emissions”) variable(s). Currently (13/02/2024) the only distributions accommodated are Gaussian, Poisson, Binomial, Dbd, and Multinom. Note that "discnp" is just an alternative expression for "Multinom". Ignored if the response is bivariate, in which case distr is forcibly set equal to "Multinom". I.e. bivariate models are, currently, fitted only to data in which the emissions have the "Multinom" distribution.

inclTau

Logical scalar. Should the transition probability matrix parameters “tau” be included in those that are estimated via the Hessian/gradient pardigm? In this case, they are included in the set of parameters to which the gradient and Hessian are applicable. If not, they are estimated via the method of moments as is done when the EM algorithm is used. In this latter case the dimensions of the Hessian are reduced (by a substantial amount if K is “large”).

preSpecSigma

Numeric vector of length K (see below) with strictly positive entries. Ignored if distr is not equal to "Gaussian". This vector provides “pre-specified” values of the standard deviations sigma of the Gaussian distribution associated with each state. If preSpecSigma is specified, then it is used as the value of sigma throughout the fitting process, and sigma is not estimated from the data. If distr is "Gaussian" and preSpecSigma is specified, then an error will be thrown if the length of preSpecSigma is not equal to K, or if any entries of preSpecSigma fail to be strictly positive.

indep

Logical scalar; should the components of a bivariate model be considered to be independent? Ignored unless the model is bivariate (i.e. unless response is of length 2. If the model is bivariate and indep is not specified, an error is thrown.

size

Scalar integer specifying the number of trials in the experiment generating binomial data (see the size argument of dbinom()). Ignored unless distr is equal to "Binomial".

nbot

Scalar integer specifying the lower end (0 or 1) of the range of values of the discretised Beta distribution. Ignored unless distr is "Dbd".

ntop

Scalar integer specifying the upper end of the range of values of the discretised Beta distribution. Ignored unless distr is "Dbd".

cells

A character vector giving the names of the factors (columns of the data data frame) which determine what the “cells” of the model are considered to be. The cells correspond to the combinations of levels of the factors named by cells. The sequences of observations from each of the cells constitute a collection of independent time series, all following the specified model.

cf

A factor (“cell factor”) specifying the cells of the model. If cells is not specified, then cf must be. If cells is specified, then cf is ignored. the model. If cells is not specified, then in most (if not all?) circumstances, cf should be set equal factor(rep(1,nrow(data)). This the effect of making the entire observation sequence equal to a single time series, following the specified model.

K

Scalar integer specifying the number of states of the hidden Markov model in question. If K is not specified and par0 (see below) is specified, and has a component tpm, then K is set equal to nrow(tpm). In this case, if par0 does not have a tpm component, an error is thrown. An error is also thrown in this setting if K is specified to a value different from nrow(tpm).

par0

A list comprising starting values for the parameter estimates, to be used by the various methods. (See method below.) This list may have components tpm (an estimate of the transition probability matrix), phi (a vector of estimates of the coefficients in the linear predictor in the generalised linear model) and Rho (a matrix, a list of two matrices, or a three dimensional array) that specifyies the emission probabilities when distr is "Multinomial". Note that par0 may consist of an object of class "eglhmm" (see below), i.e. a model previously fitted (perhaps without achieving convergence), by eglhmm(). This provides a means whereby a fitting procedure, that failed to converge, may be continued from where it left off.

randStart

Either a logical scalar or a list of three logical scalars named tpm, phi, and Rho. If the former, it is converted internally into a list with entries named tpm, phi and Rho, all having the same value as the original argument. If tpm is TRUE then the (undocumented) function inititialise() chooses entries for the starting value of tpm at random; likewise for phi and Rho. If left NULL, this argument defaults to list(tpm=FALSE,phi=FALSE,Rho=FALSE).

method

Character string specifying the method used to fit the model. This may be "lm" (Levenberg-Marquardt algorithm), "em" (EM algorithm) or "bf" (“brute force”). The latter calls upon optim() or nlm() to do the heavy lifting). If the response is bivariate, then method is forcibly (and silently) set equal to "em".

optimiser

Character string specifying which of optim() or nlm() should be used when method is "bf". Ignored unless method is "bf".

optimMethod

Character string specifying the optimisation method to be used by optim(). See optim() for details. Ignored unless method is "bf" and optimiser is "optim".

nlmWarn

The nlm() function sometimes produces, in the first few iterations, warnings to the effect “NA/Inf replaced by maximum positive value”. These warnings are almost surely irrelevant and are annoying. If nlmWarn is FALSE (the default) then these warnings are suppressed. This argument is provided to allow for the remote possibilty that the user might want to see these warnings.

lmc

Positive numeric scalar. The initial “Levenberg-Marquardt constant”. Ignored unless method is "lm".

tolerance

Positive numeric scalar. The convergence tolerance to be used. What this value actually means depends upon method. If left as NULL it defaults to 1e-6 for the bivariate methods, to sqrt(.Machine$double.eps for the "em" and "lm" methods, and to the default value of reltol used by optim() when method is "bf" and optimiser is "optim". It is ignored if method is "bf" and optimiser is "nlm".

digits

Integer scalar. The number of digits to which “progress reports” are printed when verbose (see below) is TRUE. There is a “sensible” default which is calculated in terms of tolerance. This argument is ignored if method is "bf".

verbose

Logical scalar; if TRUE, rudimentary “progress reports” are printed out at appropriate points during the iteration process. The nature of these “reports” varies with method.

itmax

Integer scalar. The maximum number of iterative steps to take. Has a somewhat different meaning when method is "bf", in which case the meaning depends on optimiser. For methods "em" and "lm", if convergence is not achieved by itmax steps, the function gives up, prints a message to this effect, and returns a value with a component converged=FALSE. This returned value may be used as a starting (the value of the argument par0) so that the iterations may be continued from where they left off. Unfortunately this facility is not available when method is "bf".

contrast

Text string specifying the contrast (in respect of unordered factors) (see contrasts() and options()) that will be used when the design matrix is constructed from the model formula. May be abbreviated (e.g. to "t", "s" or "h").

crit

Text string specifying the stopping criterion to be used. Possible values are “CLL” (scaled change in log likelihood), “L2” (scaled square root of the sum of squares of the changes in the parameter estimates), “Linf” (scaled maximum of the absolute value of the changes in the parameter estimates), and “ABSGRD” (scaled maximum of the absolute values of the entries of the gradient vector). The latter only makes sense for the Levenberg-Marquardt algorithm.

This argument is ignored if method is "bf". It seems that the "bf" method effectively uses “CLL” when optimiser is "optim". When optimiser is "nlm" it seems that a combination of (something like) “ABSGRD” and “CLL” is used.

breaks

A vector of K+1 values used to construct a set of guesses at the states corresponding to each observation. These are in turn used to calculate an initial estimate of the transition probability matrix. There is a “sensible” default (produced by the undocumented function breaker().

hessian

Logical scalar; should a Hessian matrix obtained by numerical differentiation be returned? Ignored unless method is "bf".

useAnalGrad

Logical scalar; should “analytical” calculation of the gradient be conducted? This argument is ignored unless the method is "bf".

ca

Logical scalar; “check analyticals”. Used only when the method is "bf" and optimiser is "nlm", and is passed on to nlm().

checkDecrLL

Logical scalar; “check for a decrease in the log likelihood”. Ignored unless the method is "em". Should the software check for a decrease in the log likelihood after an EM step? See the Remarks for further discussion.

Value

An object of class "eglhmm", consisting of a list with components:

call

The call by which this object was created. Present so that update() can be applied to objects returned by eglhmm().

tpm

The estimated transition probability matrix.

ispd

The estimated initial state probability distribution.

phi

Except for the "Multinom" distribution this is the vector of estimated coefficients of the linear predictor in the generalised linear model. For the "Multinom" distribution it consists of the entries of Rho (see below) with the final all-zero column remove. In this case phi is of course redundant.

theta

The vector of parameter estimates that the estimation procedure actually works with. It consists of the catenation of the non-redundant parameterization of the transition probability matrix and the vector phi. It is redundant in the case of the "Multinom" distribution.

Rho

A matrix, or a list of two matrices or a three dimensional array specifying the emissions probabilities for a multinomial distribution. Present only if distr is "Multinom".

log.like

The value of the log likelihood of the model evaluated at the parameter estimates, i.e. the (approximately) maximal value of the log likelihood.

gradient

(Not present for the "em" method.) The gradient vector of the log likelihood at the final parameter estimates; it should be effectively the zero vector.

numHess

(Present only if method is "bf" and only if the argument hessian is TRUE.) A value of the Hessian matrix (see below), obtained by means of numerical differentiation.

Hessian

(Present only if method is "lm".) The Hessian matrix, i.e. the matrix of second partial derivatives of the log likelihood, evaluated at the final parameter estimates. The inverse of the negative of this matrix constitutes an estimate of the covariance matrix of the parameter estimates.

mu

A data frame with npred+1 columns where npred is the number of predictors in the model. The rows contain, in their first npred entries, all possible combinations of the predictor values. The last (npred+1) entry of each row is the fitted mean of the Gaussian distribution, as determined by that combination of predictors. Present only if distr is "Gaussian".

sigma

Numeric vector of length K whose entries consist of the fitted standard deviations for the underlying Gaussian distribution, corresponding to each of the states. Present only if distr is "Gaussian" and preSpecSigma is not supplied.

preSpecSigma

Numeric vector equal to the preSpecSigma argument, with names "sigma1", "sigma2", ..., "sigmaK" added. Present only if distr is "Gaussian" and preSpecSigma is supplied.

stopCritVal

Numeric scalar equal to the value, assumed by the stopping criterion specified by the argument crit, at the termination of the algorithm. If the algorithm converged then stopCritVal will be less than tolerance. Not present if method is "bf". If converged (see below) is NA then stopCritVal is NA also.

anomaly

Logical scalar. Did an “anomaly” occur in an application of the EM algorithm? (See Remarks.) ' Present only if method was equal to "em". This entry of the returned value is provided mainly for use by the bcov() function. Note that anomaly is added to the returned object, irrespective of the value of checkDecrLL. When checkDecrLL is TRUE, anomaly is somewhat redundant, since it will be TRUE if aand only if converged is NA. However when checkDecrLL is FALSE, anomaly is informtive, since it is not possible to tell from other entries of the returned value when an anomaly has occurred.

converged

A logical scalar. For the "lm", and "em" methods it is TRUE if convergence is achieved within itmax iterations and FALSE otherwise. For the "em" method, if checkDecrLL is TRUE, then converged may be NA. See Remarks for some discussion.

For the "bf" method converged is TRUE if the convergence component of the object returned by optim() is equal to 0 or if the code component of the object returned by nlm() is less than or equal to 2, and is FALSE otherwise. When nlm() is used, the value of converged has an attribute "code" equal to the actual value of the code component.

nstep

The number of steps (iterations) actually used by the algorithm. For the "lm" and "em" methods this is the number of Levenberg-Marquardt steps, or EM steps, respectively, taken by the algorithm. For the "bf" method it is the counts component of the object returned by optim() when optimiser is "optim" and it is the iterations component of the object returned by nlm() when optimiser is "nlm".

mean

A vector of the fitted mean values underlying each combination of observed predictors and state (i.e. corresponding to each entry of y in the data frame used to fit the model. See the description of data below. Present only if distr is "Gaussian".

sd

A vector of the fitted values of the standard deviations underlying each combination of observed predictors and state, i.e. corresponding to each entry of y in the data frame used to fit the model. See the description of data below. Present only if distr is "Gaussian".

lambda

A vector of estimated values of the Poisson parameter associated with each combination of observed predictors and state, i.e. corresponding to each entry of y in the data frame used to fit the model. See the description of data below. Present only if distr is "Poisson".

p

A vector of estimated values of the “success” probabilities associated with each combination of observed predictors and state, i.e. corresponding to each entry of y in the data frame used to fit the model. See the description of data below. Present only if distr is "Binomial".

alpha

A numeric vector of the fitted “alpha” parameters, of the discretised Beta distribution, corresponding to each observation. Present only if distr is "Dbd".

beta

A numeric vector of the fitted “beta” parameters, of the discretised Beta distribution, corresponding to each observation. Present only if distr is "Dbd".

fy

The values of the “emission probability (density)” function, calculated at each observed value, for each state (i.e. at each entry of y in data. See below.) These values are calculated using the (final) fitted parameters.

message

A (long) text string that is produced if the EM algorithm encounters the anomaly of a decrease in the log likelihood after an EM step. It warns the user that this has occurred and suggests consulting the help file for an explanation. Present only if method=="em", the anomaly referred to has occurred, and checkDecrLL is TRUE.

par0

The starting values used in the estimation procedure. Either those provided by the argument par0 or those created by the (undocumented) function initialise.

cells

A character vector indicating the names of the factors specifying the “cells” of the model. (Equal to the cells argument.)

formula

The formula for the model that was fitted; equal to the formula argument, augmented by state.

distr

Text string specifying the distribution of the response variable. Equal to the distr argument of this function.

nbot

Integer scalar. The lower endpoint of the range of values of the discretised beta distribution. Equal to the value of the nbot argument of this function. Present only if distr is "Dbd".

ntop

Integer scalar. The upper endpoint of the range of values of the discretised beta distribution. Equal to the value of the nbot argument of this function. Present only if distr is "Dbd".

size

Scalar integer equal to the number of trials in the “experiments” generating the data. Equal to the size argument of this function. Present only if distr is "Binomial".

tolerance

The convergence tolerance used to fit the model. Equal to the tolerance argument.

crit

Character scalar specifying the stopping criterion that was used. Equal to the crit argument of this function. Not present if method is "bf".

contrast

Text string specifying the contrast for unordered factors that was used in fitting the model. Equal to the contrast argument of this function.

method

The method ("lm", "em", or "bf" used to fit the model. Equal to the method argument.

stationary

Logical scalar. Was a stationary Markov chain fitted? Currently (13/02/2024) stationary is always TRUE.

data

The data frame to which the model was fitted. It is a rearrangement of the data argument, with rows of that argument replicated K times (once for each state). A state column (factor) has been added, as has a column cf (“cell factor”), which indicates, by means of a single factor, which cell of the model a given row of data corresponds to. The aforementioned rearrangement consists of ordering the cells in the order of the levels of cf. When distr is "Multinom" the "response" variables are coerced into factors.

bicm

Numerical scalar. The number by which npar is multiplied to form the BIC criterion. It is essentially the log of the number of observations. See the code of eglhmm() for details.

AIC

Numerical scalar. The Akaike Information criterion, calculated as -2*ll + 2*npar where ll is the log likelihood of the fitted model and npar is the number of fitted parameters.

BIC

Numerical scalar. The Bayesian Information criterion, calculated as -2*ll + bicm*npar where ll is the log likelihood of the fitted model, npar is the number of fitted parameters, and bicm is the log of the number of observations.

missFrac

The fraction or proportion of missing values in the observations.

Remarks

Available models:

Although this documentation refers to (extended) “generalised linear models”, the only such models currently (13/02/2024) available are the Gaussian model with the identity link, the Poisson model, with the log link, and the Binomial model with the logit link. When distr is "Dbd" or "Multinom" the model fitted is is a generalised linear model only in a rather extended sense. Even the Gaussian model is not strictly speaking a generalised linear model, since the (state dependent) standard deviations are estimated by a method separate from the generalised linear model paradigm. Other models may be added at a future date.

Decrease in the log likelihood:

If method is equal to "EM" there may be a decrease (!!!) in the log likelihood at some EM step. This is “theoretically impossible” but can occur in practice due to an intricacy in the way that the EM algorithm treats ispd when stationary is TRUE. It turns out to be effectively impossible to maximise the expected log likelihood unless the term in that quantity corresponding to ispd is ignored (whence it is ignored). Ignoring this term is “asymptotically negligible” but can have the unfortunate effect of occasionally leading to a decrease in the log likelihood. If method is equal to "em", then the object returned by eglhmm() has a component anomaly which is TRUE if such a decrease in the log likelihood was detected, and FALSE otherwise.

If such a decrease/anomaly is detected, then (provided that checkDecrLL is TRUE) the algorithm terminates and the converged component of the returned value is set equal to NA. The algorithm issues a message to the effect that the decrease occurred. The message suggests that another method be used and that perhaps the results from the penultimate EM step (which are returned by this function) be used as starting values. This of course is not possible if the response is bivariate, in which case only the EM algorithm is applicable.

Note that if checkDecrLL is FALSE, then the algorithm proceeds “normally”. That is, it treats the decrease in the log likelihood to mean that the “increase” in the log likeihood is less than tolerance and deems convergence to be achieved.

The value of checkDecrLL is set to FALSE in the function bcov() so as to speed up the rate at which the iterations proceed. In other circumstances it is probably judicious to leave it at its default value of TRUE.

Author(s)

Rolf Turner rolfturner@posteo.net

References

T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson (1998). Hidden Markov chains in generalized linear models. Canadian Journal of Statististics 26, pp. 107 – 125, DOI: https://doi.org/10.2307/3315677.

Rolf Turner (2008). Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147 – 4160, DOI: https://doi.org/10.1016/j.csda.2008.01.029

See Also

fitted.eglhmm() reglhmm.default() reglhmm.eglhmm() bcov()

Examples

    loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
    SCC4 <- SydColCount[SydColCount$locn %in% loc4,] 
    SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
    rownames(SCC4) <- 1:nrow(SCC4)
    fitP.em <- eglhmm(y~locn+depth,data=SCC4,distr="P",cells=c("locn","depth"),
                    K=2,method="em",verb=TRUE)
    ## Not run: 
        fitP.lm <- eglhmm(y~locn+depth,data=SCC4,distr="P",cells=c("locn","depth"),
                        K=2,verb=TRUE)
        fitD.lm <- eglhmm(formula=y~ma.com+nh.com+bo.com,data=SCC4,nbot=0,ntop=11,
                      cells=c("locn","depth"),distr="Dbd",K=2,method="lm",verb=TRUE,
                      tolerance=NULL)
        SCD4 <- SydColDisc[SydColDisc$locn %in% loc4,] 
        SCD4$locn <- factor(SCD4$locn) # Get rid of unused levels.
        fitM.lm  <- eglhmm(formula=y~ma.com+nh.com+bo.com,data=SCD4,
                      cells=c("locn","depth"),distr="Multinom",K=2,
                      verb=TRUE)
        xxx <- split(SCD4,f=SCD4$locn)
        X   <- with(xxx,data.frame(y.LngRf=LngRf$y,y.BondiE=BondiE$y,depth=LngRf$depth))
        fitBiv <- eglhmm(response=c("y.LngRf","y.BondiE"),data=X,K=2,cells="depth",
                         indep=FALSE,verb=TRUE)
    
## End(Not run)
# See the help for ionChannelData for more examples involving the
# ion channel data.

[Package eglhmm version 0.1-3 Index]