dthmm {HiddenMarkov} | R Documentation |
Discrete Time HMM Object (DTHMM)
Description
Creates a discrete time hidden Markov model object with class "dthmm"
. The observed process is univariate.
Usage
dthmm(x, Pi, delta, distn, pm, pn = NULL, discrete = NULL,
nonstat = TRUE)
Arguments
x |
is a vector of length |
Pi |
is the |
delta |
is the marginal probability distribution of the |
distn |
is a character string with the abbreviated distribution name. Distributions provided by the package are |
pm |
is a list object containing the (Markov dependent) parameter values associated with the distribution of the observed process (see below). |
pn |
is a list object containing the observation dependent parameter values associated with the distribution of the observed process (see below). |
discrete |
is logical, and is |
nonstat |
is logical, |
Value
A list
object with class "dthmm"
, containing the above arguments as named components.
Notation
MacDonald & Zucchini (1997) use
t
to denote the time, wheret = 1, \cdots, T
. To avoid confusion with other uses oft
andT
in R, we usei = 1, \cdots, n
.We denote the observed sequence as
\{X_i\},\ i = 1, \cdots, n
; and the hidden Markov chain as\{C_i\},\ i = 1, \cdots, n
.The history of the observed process up to time
i
is denoted byX^{(i)}
, i.e.X^{(i)} = (X_1, \cdots, X_i)
where
i = 1, \cdots, n
. Similarly forC^{(i)}
.The hidden Markov chain has
m
states denoted by1, \cdots, m
.The Markov chain transition probability matrix is denoted by
\Pi
, where the(j, k)
th element is\pi_{jk} = \Pr\{ C_{i+1}=k \, | \, C_i=j \}
for all
i
(i.e. all time points), andj,k = 1, \cdots, m
.The Markov chain is assumed to be homogeneous, i.e. for each
j
andk
,\pi_{jk}
is constant over time.The Markov chain is said to be stationary if the marginal distribution is the same over time, i.e. for each
j
,\delta_j = \Pr\{ C_i = j \}
is constant for alli
. The marginal distribution is denoted by\delta = (\delta_1, \cdots, \delta_m)
.
List Object pm
The list object pm
contains parameter values for the probability distribution of the observed process that are dependent on the hidden Markov state. These parameters are generally required to be estimated. See “Modifications” in topic Mstep
when some do not require estimation.
Assume that the hidden Markov chain has m
states, and that there are \ell
parameters that are dependent on the hidden Markov state. Then the list object pm
should contain \ell
named vector components each of length m
. The names are determined by the required probability distribution.
For example, if distn == "norm"
, the arguments names must coincide with those used by the functions dnorm
or rnorm
, which are mean
and sd
. Each must be specified in either pm
or pn
. If they both vary according to the hidden Markov state then pm
should have the named components mean
and sd
. These are both vectors of length m
containing the means and standard deviations of the observed process when the hidden Markov chain is in each of the m
states. If, for example, sd
was “time” dependent, then sd
would be contained in pn
(see below).
If distn == "pois"
, then pm
should have one component named lambda
, being the parameter name in the function dpois
. Even if there is only one parameter, the vector component should still be within a list and named.
List Object pn
The list object pn
contains parameter values of the probability distribution for the observed process that are dependent on the observation number or “time”. These parameters are assumed to be known.
Assume that the observed process is of length n
, and that there are \ell
parameters that are dependent on the observation number or time. Then the list object pn
should contain \ell
named vector components each of length n
. The names, as in pm
, are determined by the required probability distribution.
For example, in the observed process we may count the number of successes in a known number of Bernoulli trials, i.e. the number of Bernoulli trials is known at each time point, but the probability of success varies according to a hidden Markov state. The prob
parameter of rbinom
(or dbinom
) would be specified in pm
and the size
parameter would specified in pn
.
One could also have a situation where the observed process was Gaussian, with the means varying according to the hidden Markov state, but the variances varying non-randomly according to the observation number (or vice versa). Here mean
would be specified within pm
and sd
within pn
. Note that a given parameter can only occur within one of pm
or pn
.
Complete Data Likelihood
The “complete data likelihood”, L_c
, is
L_c = \Pr\{ X_1=x_1, \cdots, X_n=x_n, C_1=c_1, \cdots, C_n=c_n \}\,.
This can be shown to be
\Pr\{ X_1=x_1 \,|\, C_1=c_1 \} \Pr\{ C_1=c_1 \} \prod_{i=2}^n \Pr\{ X_i=x_i \,|\, C_i=c_i \} \Pr\{ C_i=c_i \,|\, C_{i-1}=c_{i-1} \}\,,
and hence, substituting model parameters, we get
L_c = \delta_{c_1} \pi_{c_1c_2} \pi_{c_2c_3} \cdots \pi_{c_{n-1}c_n} \prod_{i=1}^n \Pr\{ X_i=x_i \,|\, C_i=c_i \}\,,
and so
\log L_c = \log \delta_{c_1} + \sum_{i=2}^n \log \pi_{c_{i-1}c_i} + \sum_{i=1}^n \log \Pr\{ X_i=x_i \,|\, C_i=c_i \}\,.
Hence the “complete data likelihood” is split into three terms: the first relates to parameters of the marginal distribution (Markov chain), the second to the transition probabilities, and the third to the distribution parameters of the observed random variable. When the Markov chain is non-stationary, each term can be maximised separately.
Stationarity
When the hidden Markov chain is assumed to be non-stationary, the complete data likelihood has a neat structure, in that \delta
only occurs in the first term, \Pi
only occurs in the second term, and the parameters associated with the observed probabilities only occur in the third term. Hence, the likelihood can easily be maximised by maximising each term individually. In this situation, the estimated parameters using BaumWelch
will be the “exact” maximum likelihood estimates.
When the hidden Markov chain is assumed to be stationary, \delta = \Pi^\prime \delta
(see topic compdelta
), and then the first two terms of the complete data likelihood determine the transition probabilities \Pi
. This raises more complicated numerical problems, as the first term is effectively a constraint. In our implementation of the EM algorithm, we deal with this in a slightly ad-hoc manner by effectively disregarding the first term, which is assumed to be relatively small. In the M-step, the transition matrix is determined by the second term, then \delta
is estimated using the relation \delta = \delta \Pi
. Hence, using the BaumWelch
function will only provide approximate maximum likelihood estimates. Exact solutions can be calculated by directly maximising the likelihood function, see first example in neglogLik
.
References
Cited references are listed on the HiddenMarkov manual page.
Examples
#----- Test Gaussian Distribution -----
Pi <- matrix(c(1/2, 1/2, 0,
1/3, 1/3, 1/3,
0, 1/2, 1/2),
byrow=TRUE, nrow=3)
delta <- c(0, 1, 0)
x <- dthmm(NULL, Pi, delta, "norm",
list(mean=c(1, 6, 3), sd=c(0.5, 1, 0.5)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Poisson Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "pois", list(lambda=c(4, 0.1)),
discrete = TRUE)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Exponential Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "exp", list(rate=c(2, 0.1)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Beta Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
x <- dthmm(NULL, Pi, delta, "beta", list(shape1=c(2, 6), shape2=c(6, 2)))
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Binomial Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
# vector of "fixed & known" number of Bernoulli trials
pn <- list(size=rpois(1000, 10)+1)
x <- dthmm(NULL, Pi, delta, "binom", list(prob=c(0.2, 0.8)), pn,
discrete=TRUE)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
#----- Test Gamma Distribution -----
Pi <- matrix(c(0.8, 0.2,
0.3, 0.7),
byrow=TRUE, nrow=2)
delta <- c(0, 1)
pm <- list(rate=c(4, 0.5), shape=c(3, 3))
x <- seq(0.01, 10, 0.01)
plot(x, dgamma(x, rate=pm$rate[1], shape=pm$shape[1]),
type="l", col="blue", ylab="Density")
points(x, dgamma(x, rate=pm$rate[2], shape=pm$shape[2]),
type="l", col="red")
x <- dthmm(NULL, Pi, delta, "gamma", pm)
x <- simulate(x, nsim=1000)
# use above parameter values as initial values
y <- BaumWelch(x)
print(summary(y))
print(logLik(y))
hist(residuals(y))
# check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))