BaumWelch {HiddenMarkov}    R Documentation

Estimation Using Baum-Welch Algorithm

Description

Estimates the parameters of a hidden Markov model. The Baum-Welch algorithm (Baum et al., 1970) of the HMM literature is a version of the EM algorithm (Dempster et al., 1977). See Hartley (1958) for an earlier application of the EM methodology, though it was not referred to as such.

Usage

BaumWelch(object, control, ...)
## S3 method for class 'dthmm'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglm0'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglm1'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglmlong1'
BaumWelch(object, control = bwcontrol(), PSOCKcluster=NULL,
          tmpfile=NULL, ...)
## S3 method for class 'mmpp'
BaumWelch(object, control = bwcontrol(), ...)

Arguments

object

an object of class "dthmm", "mmglm0", "mmglm1", "mmglmlong1", or "mmpp".

control

a list of control settings for the iterative process. These can be changed by using the function bwcontrol.

PSOCKcluster

see section below called “Parallel Processing”.

tmpfile

name of a file (.Rda) into which the current estimates are written at every 10th iteration. The saved model object is named object. If NULL (default), no file is created.

...

other arguments.
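
As an illustration of the tmpfile argument, the following minimal sketch reads such a checkpoint back into R. It assumes the file is an ordinary save() image holding a single object named object, as described above; the list written here is only a hypothetical stand-in for what BaumWelch would store, and the names ckpt and fit_so_far are illustrative.

```r
# Stand-in for a checkpoint file (assumption: written with save()).
tmpfile <- tempfile(fileext = ".Rda")
object <- list(Pi = diag(2), LL = -123.4)   # hypothetical partial estimates
save(object, file = tmpfile)
rm(object)

# Load into a separate environment so that an existing variable
# called `object` in the workspace is not overwritten.
ckpt <- new.env()
loaded <- load(tmpfile, envir = ckpt)   # names of the restored objects
fit_so_far <- ckpt$object
str(fit_so_far)
```

Loading into its own environment is a design choice: load() otherwise restores the object under its saved name, which could silently replace a variable in the calling environment.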

Details

The initial parameter values used by the EM algorithm are those that are contained within the input object.
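
A minimal sketch of a typical call, assuming the HiddenMarkov package is installed. The two-state normal model, the seed, and the perturbed starting means are illustrative choices only. The parameters stored in x when BaumWelch is called are the starting values for the EM iterations.

```r
if (requireNamespace("HiddenMarkov", quietly = TRUE)) {
  library(HiddenMarkov)
  set.seed(1)

  Pi <- matrix(c(0.9, 0.1,
                 0.2, 0.8), byrow = TRUE, nrow = 2)
  delta <- c(1, 0)

  # Build the model object and simulate 500 observations from it.
  x <- dthmm(NULL, Pi, delta, "norm", list(mean = c(0, 5), sd = c(1, 1)))
  x <- simulate(x, nsim = 500)

  # Perturb the stored means: these become the initial values for EM.
  x$pm$mean <- c(1, 4)

  # prt = FALSE suppresses the per-iteration printout.
  y <- BaumWelch(x, control = bwcontrol(maxiter = 100, tol = 1e-5, prt = FALSE))

  print(y$pm$mean)   # estimated state means
  print(y$LL)        # log-likelihood at the end
  print(y$iter)      # number of iterations performed
}
```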

The code for the methods "dthmm", "mmglm0", "mmglm1", "mmglmlong1" and "mmpp" can be viewed by appending BaumWelch.dthmm, BaumWelch.mmglm0, BaumWelch.mmglm1, BaumWelch.mmglmlong1 or BaumWelch.mmpp, respectively, to HiddenMarkov::: on the R command line; e.g. HiddenMarkov:::BaumWelch.dthmm. The three colons are needed because these method functions are not exported from the package NAMESPACE.

Value

The output object (a list) will have the same class as the input, and will have the same components. The parameter values will be replaced by those estimated by this function. The object will also contain additional components.

An object of class "dthmm" will also contain

u

an n × m matrix containing estimates of the conditional expectations. See “Details” in Estep.

v

an n × m × m array containing estimates of the conditional expectations. See “Details” in Estep.

LL

value of log-likelihood at the end.

iter

number of iterations performed.

diff

difference between final and previous log-likelihood.
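
To make the roles of u, v, LL, iter and diff concrete, here is a self-contained base R sketch of Baum-Welch iterations for a Gaussian hidden Markov model. It is not the package's implementation: the function name bw_sketch, its arguments, and the convergence rule (stop when the log-likelihood increase falls below tol) are assumptions made for illustration, and the first slice of v is simply left at zero.

```r
# Hypothetical helper, not part of HiddenMarkov.
bw_sketch <- function(x, Pi, delta, mu, sigma, maxiter = 200, tol = 1e-6) {
  n <- length(x); m <- length(delta)
  LL_old <- -Inf
  for (iter in seq_len(maxiter)) {
    # n x m matrix of state-dependent densities
    dens <- sapply(seq_len(m), function(j) dnorm(x, mu[j], sigma[j]))

    # Scaled forward pass; sc holds the scaling constants.
    alpha <- matrix(0, n, m); sc <- numeric(n)
    a <- delta * dens[1, ]
    sc[1] <- sum(a); alpha[1, ] <- a / sc[1]
    for (i in 2:n) {
      a <- as.vector(alpha[i - 1, ] %*% Pi) * dens[i, ]
      sc[i] <- sum(a); alpha[i, ] <- a / sc[i]
    }
    LL <- sum(log(sc))

    # Scaled backward pass.
    beta <- matrix(0, n, m); beta[n, ] <- 1
    for (i in (n - 1):1) {
      beta[i, ] <- as.vector(Pi %*% (dens[i + 1, ] * beta[i + 1, ])) / sc[i + 1]
    }

    # E-step: u[i, j] = P(state j at time i | data),
    # v[i, j, k] = P(state j at time i-1, state k at time i | data).
    u <- alpha * beta
    v <- array(0, c(n, m, m))
    for (i in 2:n) {
      v[i, , ] <- outer(alpha[i - 1, ], dens[i, ] * beta[i, ]) * Pi / sc[i]
    }

    # Convergence check before the M-step, so the returned parameters
    # are the ones that produced LL, u and v.
    diff <- LL - LL_old
    if (diff < tol) break
    LL_old <- LL

    # M-step: re-estimate parameters from the conditional expectations.
    A <- apply(v, c(2, 3), sum)
    Pi <- A / rowSums(A)
    delta <- u[1, ]
    mu <- colSums(u * x) / colSums(u)
    sigma <- sqrt(colSums(u * outer(x, mu, "-")^2) / colSums(u))
  }
  list(Pi = Pi, delta = delta, mu = mu, sigma = sigma,
       u = u, v = v, LL = LL, iter = iter, diff = diff)
}

# Simulate a two-state chain with state means 0 and 5, then fit it.
set.seed(1)
n <- 500
Pi_true <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE)
state <- numeric(n); state[1] <- 1
for (i in 2:n) state[i] <- sample(1:2, 1, prob = Pi_true[state[i - 1], ])
x <- rnorm(n, mean = c(0, 5)[state], sd = 1)

fit <- bw_sketch(x, Pi = matrix(c(0.7, 0.3, 0.3, 0.7), 2, byrow = TRUE),
                 delta = c(0.5, 0.5), mu = c(-1, 4), sigma = c(2, 2))
fit$mu; fit$LL; fit$iter; fit$diff
```

The forward and backward probabilities are rescaled at every time point so that long series do not underflow; the log-likelihood is then recovered as the sum of the log scaling constants.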

Parallel Processing

In longitudinal models, the forward and backward equations need to be calculated for each individual subject. These calculations can be done independently, with the results concatenated for use in the E-step. If the argument PSOCKcluster is set, subjects are divided equally among the nodes in the cluster for the calculation of the forward and backward equations. This division is very basic, and assumes that all nodes run at a roughly comparable speed.

If the communication between nodes is slow and the dataset is small, then allocating the work to the various nodes may in fact take more time than simply using one processor to perform all of the calculations.
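
The equal division of subjects described above can be sketched with splitIndices from the parallel package. Whether HiddenMarkov uses this particular function internally is an assumption; the data and the per_subject function are hypothetical placeholders for the per-subject forward and backward calculations.

```r
library(parallel)

# Hypothetical longitudinal data: one short numeric series per subject.
set.seed(42)
subjects <- lapply(1:10, function(i) rnorm(5))

# Placeholder for the per-subject forward/backward computation.
per_subject <- function(y) sum(y)

# Divide the subject indices as evenly as possible among 3 nodes.
n_nodes <- 3
chunks <- splitIndices(length(subjects), n_nodes)
lengths(chunks)

# Each chunk is processed independently (here serially with lapply;
# on a real cluster, parLapply(cl, chunks, ...) could take its place),
# and the results are concatenated in subject order.
res_by_chunk <- lapply(chunks, function(idx) sapply(subjects[idx], per_subject))
res <- unlist(res_by_chunk)

# The chunked result matches a plain serial pass over all subjects.
serial <- sapply(subjects, per_subject)
all.equal(res, serial)
```

Because each chunk receives about the same number of subjects regardless of series length or node speed, one slow node or one unusually long subject can hold up the whole E-step.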

The required steps in initiating parallel processing are as follows.

#   load the "parallel" package
library(parallel)

#   define the SNOW cluster object, e.g. a SOCK cluster
#   where each node has the same R installation.
cl <- makePSOCKcluster(c("localhost", "horoeka.localdomain", 
                         "horoeka.localdomain", "localhost"))

#   A more general setup: Totara is Fedora, Rimu is Debian:
#   Use 2 processors on Totara, 1 on Rimu:
totara  <- list(host="localhost",
                rscript="/usr/lib/R/bin/Rscript",
                snowlib="/usr/lib/R/library")
rimu    <- list(host="rimu.localdomain",
                rscript="/usr/lib/R/bin/Rscript",
                snowlib="/usr/local/lib/R/site-library")
cl <- makeCluster(list(totara, totara, rimu), type="SOCK")

#   then define the required model object
#   say the model object is called x
BaumWelch(x, PSOCKcluster=cl)

#   stop the R jobs on the slave machines
stopCluster(cl)

Note that the communication method does not need to be SOCK; see the parallel package documentation, topic makeCluster, for other options. Further, if some nodes are on other machines, the firewalls may need to be adjusted. The master machine initiates the R jobs on the slave machines by communicating through port 22 (security keys are needed rather than passwords), and subsequent communications go through port 10187. Again, these details can be changed in the options settings within the parallel package.

References

Cited references are listed on the HiddenMarkov manual page.

See Also

logLik, residuals, simulate, summary, neglogLik


[Package HiddenMarkov version 1.8-13 Index]