BaumWelch {HiddenMarkov} | R Documentation |
Estimation Using Baum-Welch Algorithm
Description
Estimates the parameters of a hidden Markov model. The Baum-Welch algorithm (Baum et al., 1970) referred to in the HMM literature is a version of the EM algorithm (Dempster et al., 1977). See Hartley (1958) for an earlier application of the EM methodology, although it was not referred to as such at the time.
Usage
BaumWelch(object, control, ...)
## S3 method for class 'dthmm'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglm0'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglm1'
BaumWelch(object, control = bwcontrol(), ...)
## S3 method for class 'mmglmlong1'
BaumWelch(object, control = bwcontrol(), PSOCKcluster=NULL,
tmpfile=NULL, ...)
## S3 method for class 'mmpp'
BaumWelch(object, control = bwcontrol(), ...)
Arguments
object |
an object of class |
control |
a list of control settings for the iterative process. These can be changed by using the function |
PSOCKcluster |
see section below called “Parallel Processing”. |
tmpfile |
name of a file (.Rda) into which estimates are written at every 10th iteration. The model object is called |
... |
other arguments. |
Details
The initial parameter values used by the EM algorithm are those that are contained within the input object.
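As a minimal sketch of how the starting values enter the fit, the example below simulates data from a two-state Gaussian model and then fits a second model object whose parameters have been deliberately perturbed. The numerical values, the seed, and the object names `truth`, `start` and `fit` are illustrative assumptions, not recommendations.

```r
library(HiddenMarkov)

set.seed(42)

# "True" model used only to simulate some data (values are illustrative)
Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7), byrow = TRUE, nrow = 2)
delta <- c(0, 1)
truth <- dthmm(NULL, Pi, delta, "norm",
               list(mean = c(1, 4), sd = c(0.5, 1)))
truth <- simulate(truth, nsim = 500)

# Starting model: same data, but perturbed initial parameter values.
# BaumWelch() begins the EM iterations from whatever this object contains.
Pi0 <- matrix(c(0.6, 0.4,
                0.4, 0.6), byrow = TRUE, nrow = 2)
start <- dthmm(truth$x, Pi0, delta, "norm",
               list(mean = c(0, 5), sd = c(1, 1)))

# Control settings are supplied through bwcontrol(); prt = FALSE
# suppresses the per-iteration printout
fit <- BaumWelch(start, control = bwcontrol(maxiter = 200, tol = 1e-6,
                                            prt = FALSE))

# Compare the starting and the estimated transition matrices
print(start$Pi)
print(fit$Pi)
```

Because EM converges to a local maximum, different starting values in the input object can lead to different estimates; refitting from a few starting points is a common precaution.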
The code for the methods "dthmm", "mmglm0", "mmglm1", "mmglmlong1" and "mmpp" can be viewed by appending BaumWelch.dthmm, BaumWelch.mmglm0, BaumWelch.mmglm1, BaumWelch.mmglmlong1 or BaumWelch.mmpp, respectively, to HiddenMarkov::: on the R command line; e.g. HiddenMarkov:::BaumWelch.dthmm. The three colons are needed because these method functions are not exported from the package NAMESPACE.
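For instance, the "dthmm" method can be inspected as sketched below. The call to `getS3method()` (from the utils package) is an alternative route added here for illustration; it assumes the method is registered as an S3 method, as the Usage section indicates.

```r
library(HiddenMarkov)

# Print the source of the unexported method using the triple-colon operator
HiddenMarkov:::BaumWelch.dthmm

# Alternative: look the method up by generic and class name
f <- getS3method("BaumWelch", "dthmm")
print(is.function(f))
```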
Value
The output object (a list) will have the same class as the input, and will have the same components. The parameter values will be replaced by those estimated by this function. The object will also contain additional components.
An object of class "dthmm" will also contain
u |
an |
v |
an |
LL |
value of log-likelihood at the end. |
iter |
number of iterations performed. |
diff |
difference between final and previous log-likelihood. |
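The additional components can be read directly from the returned list. The sketch below fits a small two-state Poisson model (all parameter values are illustrative assumptions) and lists the components that the fit adds to the input object.

```r
library(HiddenMarkov)

set.seed(1)

# Illustrative two-state Poisson HMM
Pi <- matrix(c(0.9, 0.1,
               0.2, 0.8), byrow = TRUE, nrow = 2)
x <- dthmm(NULL, Pi, c(0, 1), "pois", list(lambda = c(2, 10)),
           discrete = TRUE)
x <- simulate(x, nsim = 300)

y <- BaumWelch(x, control = bwcontrol(prt = FALSE))

# Same class as the input ...
print(class(y))

# ... plus the components added by the fit
print(setdiff(names(y), names(x)))

# Convergence diagnostics
print(y$LL)    # log-likelihood at the end
print(y$iter)  # number of iterations performed
print(y$diff)  # difference between final and previous log-likelihood
```

Checking `y$iter` against the `maxiter` setting in `bwcontrol()` is a quick way to see whether the iterations stopped because of convergence or because the iteration limit was reached.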
Parallel Processing
In longitudinal models, the forward and backward equations need to be calculated for each individual subject. These calculations can be done independently, with the results concatenated for use in the E-step. If the argument PSOCKcluster is set, subjects are divided equally among the nodes in the cluster for the calculation of the forward and backward equations. This division is very basic, and assumes that all nodes run at a roughly comparable speed.
If the communication between nodes is slow and the dataset is small, then allocating the work to the various nodes may in fact take more time than simply using one processor to perform all of the calculations.
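The idea of independent per-subject calculations can be illustrated with the parallel package alone. The sketch below is not the package's internal code: `forward_loglik` is a hypothetical helper that runs a scaled forward recursion for a Gaussian HMM, the subject data are random numbers, and the two-worker local cluster is an assumption made so that the example runs on a single machine.

```r
library(parallel)

# Hypothetical helper (not part of HiddenMarkov): scaled forward recursion
# returning the log-likelihood of one subject's series under a Gaussian HMM
forward_loglik <- function(obs, Pi, delta, mu, sigma) {
  phi <- delta
  ll <- 0
  for (t in seq_along(obs)) {
    if (t > 1) phi <- as.vector(phi %*% Pi)  # propagate one time step
    phi <- phi * dnorm(obs[t], mu, sigma)    # weight by state densities
    s <- sum(phi)
    ll <- ll + log(s)                        # accumulate the log scale factor
    phi <- phi / s                           # rescale to avoid underflow
  }
  ll
}

set.seed(1)
Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7), byrow = TRUE, nrow = 2)
delta <- c(0.5, 0.5)
mu <- c(1, 4)
sigma <- c(0.5, 1)

# Ten illustrative "subjects", each with 50 observations
subjects <- lapply(1:10, function(i) rnorm(50, mean = 2.5, sd = 2))

# Two workers on the local machine; parLapply() splits the list of
# subjects into one chunk per node and concatenates the results
cl <- makePSOCKcluster(2)
ll_parallel <- unlist(parLapply(cl, subjects, forward_loglik,
                                Pi = Pi, delta = delta,
                                mu = mu, sigma = sigma))
stopCluster(cl)

# Serial reference calculation on a single processor
ll_serial <- vapply(subjects, forward_loglik, numeric(1),
                    Pi = Pi, delta = delta, mu = mu, sigma = sigma)

print(all.equal(ll_parallel, ll_serial))
```

Wrapping each of the two calculations in `system.time()` on a small dataset such as this one gives a rough sense of how much of the elapsed time is communication overhead.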
The required steps in initiating parallel processing are as follows.
    # load the "parallel" package
    library(parallel)

    # define the SNOW cluster object, e.g. a SOCK cluster
    # where each node has the same R installation.
    cl <- makePSOCKcluster(c("localhost", "horoeka.localdomain",
                             "horoeka.localdomain", "localhost"))

    # A more general setup: Totara is Fedora, Rimu is Debian:
    # Use 2 processors on Totara, 1 on Rimu:
    totara <- list(host="localhost",
                   rscript="/usr/lib/R/bin/Rscript",
                   snowlib="/usr/lib/R/library")
    rimu <- list(host="rimu.localdomain",
                 rscript="/usr/lib/R/bin/Rscript",
                 snowlib="/usr/local/lib/R/site-library")
    cl <- makeCluster(list(totara, totara, rimu), type="SOCK")

    # then define the required model object
    # say the model object is called x
    BaumWelch(x, PSOCKcluster=cl)

    # stop the R jobs on the slave machines
    stopCluster(cl)
Note that the communication method does not need to be SOCK; see the parallel package documentation, topic makeCluster, for other options. Further, if some nodes are on other machines, the firewalls may need to be tweaked. The master machine initiates the R jobs on the slave machines by communicating through port 22 (security keys, rather than passwords, are needed), and subsequent communications go through port 10187. Again, these details can be changed in the options settings within the parallel package.
References
Cited references are listed on the HiddenMarkov manual page.
See Also
logLik, residuals, simulate, summary, neglogLik