Mstep {HiddenMarkov} | R Documentation |
M-Step of EM Algorithm for DTHMM
Description
Performs the maximisation step of the EM algorithm for a dthmm
process. This function is called by the BaumWelch
function. The Baum-Welch algorithm used in the HMM literature is a version of the EM algorithm.
Usage
Mstep.beta(x, cond, pm, pn, maxiter = 200)
Mstep.binom(x, cond, pm, pn)
Mstep.exp(x, cond, pm, pn)
Mstep.gamma(x, cond, pm, pn, maxiter = 200)
Mstep.glm(x, cond, pm, pn, family, link)
Mstep.lnorm(x, cond, pm, pn)
Mstep.logis(x, cond, pm, pn, maxiter = 200)
Mstep.norm(x, cond, pm, pn)
Mstep.pois(x, cond, pm, pn)
Arguments
x |
is a vector of length |
cond |
is an object created by |
family |
character string, the GLM family, one of |
link |
character string, the link function. If |
pm |
is a list object containing the current (Markov dependent) parameter estimates associated with the distribution of the observed process (see |
pn |
is a list object containing the observation dependent parameter values associated with the distribution of the observed process (see |
maxiter |
maximum number of Newton-Raphson iterations. |
Details
The functions Mstep.beta
, Mstep.binom
, Mstep.exp
, Mstep.gamma
, Mstep.lnorm
, Mstep.logis
, Mstep.norm
and Mstep.pois
perform the maximisation step for the Beta, Binomial, Exponential, Gamma, Log Normal, Logistic, Normal and Poisson distributions, respectively. Each function has the same argument list, even if specific arguments are redundant, because the functions are called from within other functions in a generic like manner. Specific notes for some follow.
Mstep.beta
The R functions for the
Beta
Distribution have argumentsshape1
,shape2
andncp
. We only useshape1
andshape2
, i.e.ncp
is assumed to be zero. Different combinations of"shape1"
and"shape2"
can be “time” dependent (specified inpn
) and Markov dependent (specified inpm
). However, each should only be specified in one (see topicdthmm
).Mstep.binom
The R functions for the
Binomial
Distribution have argumentssize
andprob
. Thesize
argument of theBinomial
Distribution should always be specified in thepn
argument (see topicdthmm
).Mstep.gamma
The R functions for the
GammaDist
have argumentsshape
,rate
andscale
. Sincescale
is redundant, we only useshape
andrate
. Different combinations of"shape"
and"rate"
can be “time” dependent (specified inpn
) and Markov dependent (specified inpm
). However, each should only be specified in one (see topicdthmm
).Mstep.lnorm
The R functions for the
Lognormal
Distribution have argumentsmeanlog
andsdlog
. Different combinations of"meanlog"
and"sdlog"
can be “time” dependent (specified inpn
) and Markov dependent (specified inpm
). However, each should only be specified in one (see topicdthmm
).Mstep.logis
The R functions for the
Logistic
Distribution have argumentslocation
andscale
. Different combinations of"location"
and"scale"
can be “time” dependent (specified inpn
) and Markov dependent (specified inpm
). However, each should only be specified in one (see topicdthmm
).Mstep.norm
The R functions for the
Normal
Distribution have argumentsmean
andsd
. Different combinations of"mean"
and"sd"
can be “time” dependent (specified inpn
) and Markov dependent (specified inpm
). However, each should only be specified in one (see topicdthmm
).
Value
A list object with the same structure as pm
(see topic dthmm
).
Modifications and Extensions
The HiddenMarkov package calls the associated functions belonging to the specified probability distribution in a generic way. For example, if the argument distn
in dthmm
is "xyz"
, it will expect to find functions pxyz
, dxyz
, and Mstep.xyz
. And if simulations are to be performed, it will require rxyz
. In this section we describe the required format for the distribution related functions pxyz
, dxyz
, and rxyz
; and for the function Mstep.xyz
required for the M-step in the EM algorithm.
Consider the examples below of distribution related functions and their arguments. Note that the probability functions all have a first argument of q
, and the last two arguments are all the same, with the same default values. Similarly, the density functions have a first argument of x
, and the last argument is the same, with the same defaults. The arguments in the middle are peculiar to the given distribution, one argument for each distribution parameter. Note that the observed process x
is univariate.
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE) ppois(q, lambda, lower.tail = TRUE, log.p = FALSE) pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE) dnorm(x, mean = 0, sd = 1, log = FALSE) dbeta(x, shape1, shape2, ncp = 0, log = FALSE) dpois(x, lambda, log = FALSE) dbinom(x, size, prob, log = FALSE) rnorm(n, mean = 0, sd = 1) rbeta(n, shape1, shape2, ncp = 0) rpois(n, lambda) rbinom(n, size, prob)
The functions pxyz
(distribution function), dxyz
(density) and rxyz
(random number generator) must be consistent with the conventions used in the above examples. The software will deduce the distribution argument names from what is specified in pm
and pn
, and it will call these functions assuming that their argument list is consistent with those described above. The functions pxyz
and dxyz
are used in the forward and backward equations.
The functions dxyz
, pxyz
and rxyz
must also behave in the same vectorised way as dnorm
. For example, if x
is a vector, and mean
and sd
are scalars, then dnorm(x, mean, sd)
calculates the density for each element in x
using the scalar values of mean
and sd
; thus the returned value is the same length as x
. Alternatively, if x
is a scalar and mean
and sd
are vectors, both of the same length, then the returned value is the same length as mean
and is the density of x
evaluated at the corresponding pairs of values of mean
and sd
. The third possibility is that x
and one of the distribution parameters, say sd
, are vectors of the same length, and mu
is a scalar. Then the returned vector will be of the same length as x
, where the i
th value is the density at x[i]
with mean mean
and standard deviation sd[i]
. Note that the functions for the Multinomial
distribution do not have this behaviour. Here the vector x
contains the counts for one multinomial experiment, so the vector is used to characterise the multivariate character of the random variable rather than multiple univariate realisations. Further, the distribution parameters (i.e. category probabilities) are characterised as one vector rather than a sequence of separate function arguments.
The other calculation, that is specific to the chosen distribution, is the maximisation in the M-step. If we have distn="xyz"
, then there should be a function called Mstep.xyz
. Further, it should have arguments (x, cond, pm, pn)
; see for example Mstep.norm
. The parameters that are estimated within this function are named in a consistent way with those that are defined within the dthmm
arguments pm
and pn
. Notice that the estimates of mean
and sd
in Mstep.norm
are weighted by cond$u
. The calculations for cond$u
are performed in the E-step, and utilise the distribution related functions "dxyz"
and "pxyz"
. The values of cond$u
are essentially probabilities that the process belongs to the given Markov state, hence, when we calculate the distributional parameters (like mu
and sd
in Mstep.norm
) we calculate weighted sums using the probabilities cond$u
. This procedure can be shown to give the maximum likelihood estimates of mu
and sd
, and hence a similar weighting should be used for the distribution "xyz"
(see Harte, 2006, for further mathematical detail). One needs to take a little more care when dealing with a distributions like the beta, where the cross derivatives of the log likelihood between the parameters, i.e. \partial^2 \log L /(\partial \alpha_1 \partial \alpha_2)
are non-zero. See Mstep.beta
for further details.
Now consider a situation where we want to modify the way in which a normal distribution is fitted. Say we know the Markov dependent means, and we only want to estimate the standard deviations. Since both parameters are Markov dependent, they both need to be specified in the pm
argument of dthmm
. The estimation of the distribution specific parameters takes place in the M-step, in this case Mstep.norm
. To achieve what we want, we need to modify this function. In this case it is relatively easy (see code in “Examples” below). From the function Mstep.norm
, take the code under the section if (all(nms==c("mean", "sd")))
, i.e. both of the parameters are Markov dependent. However, replace the line where the mean is estimated to mean <- pm$mean
, i.e. leave it as was initially specified. Unfortunately, one cannot easily modify the functions in a package namespace. The simple work-around here is to define a new distribution, say "xyz"
, then define a new function with the above changes called Mstep.xyz
. However, the distribution related functions are just the same as those for the normal distribution, hence, define them as follows:
rxyz <- rnorm dxyz <- dnorm pxyz <- pnorm qxyz <- qnorm
See the 2nd example below for full details.
Note
The Mstep functions can be used to estimate the maximum likelihood parameters from a simple sample. See the example below where this is done for the logistic distribution.
See Also
Examples
# Fit logistic distribution to a simple single sample
# Simulate data
n <- 20000
location <- -2
scale <- 1.5
x <- rlogis(n, location, scale)
# give each datum equal weight
cond <- NULL
cond$u <- matrix(rep(1/n, n), ncol=1)
# calculate maximum likelihood parameter estimates
# start iterations at values used to simulate
print(Mstep.logis(x, cond,
pm=list(location=location,
scale=scale)))
#-----------------------------------------------------
# Example with Gaussian Observations
# Assume that both mean and sd are Markov dependent, but the means
# are known and sd requires estimation (See "Modifications" above).
# One way is to define a new distribution called "xyz", say.
Mstep.xyz <- function(x, cond, pm, pn){
# this function is a modified version of Mstep.norm
# here the mean is fixed to the values specified in pm$mean
nms <- sort(names(pm))
n <- length(x)
m <- ncol(cond$u)
if (all(nms==c("mean", "sd"))){
mean <- pm$mean
sd <- sqrt(apply((matrix(x, nrow=n, ncol=m) -
matrix(mean,
nrow=n, ncol=m, byrow=TRUE))^2 * cond$u, MARGIN=2,
FUN=sum)/apply(cond$u, MARGIN=2, FUN=sum))
return(list(mean=mean, sd=sd))
}
}
# define the distribution related functions for "xyz"
# they are the same as those for the Normal distribution
rxyz <- rnorm
dxyz <- dnorm
pxyz <- pnorm
qxyz <- qnorm
Pi <- matrix(c(1/2, 1/2, 0,
1/3, 1/3, 1/3,
0, 1/2, 1/2),
byrow=TRUE, nrow=3)
p1 <- c(1, 6, 3)
p2 <- c(0.5, 1, 0.5)
n <- 1000
pm <- list(mean=p1, sd=p2)
x <- dthmm(NULL, Pi, c(0, 1, 0), "xyz", pm, discrete=FALSE)
x <- simulate(x, n, seed=5)
# use above parameter values as initial values
y <- BaumWelch(x)
print(y$delta)
print(y$pm)
print(y$Pi)