fitmm {smmR} | R Documentation |
Maximum Likelihood Estimation (MLE) of a k-th order Markov chain
Description
Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.
Usage
fitmm(sequences, states, k = 1, init.estim = "mle")
Arguments
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
k |
Order of the Markov chain. |
init.estim |
Optional.
|
Details
Let be a trajectory of length
of
the Markov chain
of order
with
transition matrix
. The
maximum likelihood estimation of the transition matrix is
, where
is the number of transitions from state
to state
and
is the number of transition from state
to any state.
For
we have similar expressions.
The initial distribution of a k-th order Markov chain is defined as
. Five methods are proposed for the estimation
of the latter :
- Maximum Likelihood Estimator:
The Maximum Likelihood Estimator for the initial distribution. The formula is:
, where
is the number of occurences of the word
(of length
) at the beginning of each sequence and
is the number of sequences. This estimator is reliable when the number of sequences
is high.
- Stationary distribution:
The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).
- Frequencies of each word:
The initial distribution is estimated by taking the frequencies of the words of length
k
for all sequences. The formula is, where
is the number of occurences of the word
(of length
) in the sequences and
is the sum of the lengths of the sequences.
- Product of the frequencies of each state:
The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length
.
- Uniform distribution:
The initial probability of each state is equal to
, with
, the number of states.
Value
An object of class S3 mmfit
(inheriting from the S3 class mm).
The S3 class mmfit
contains:
All the attributes of the S3 class mm;
An attribute
M
which is an integer giving the total length of the set of sequencessequences
(sum of all the lengths of the listsequences
);An attribute
loglik
which is a numeric value giving the value of the log-likelihood of the specified Markov model based on thesequences
;An attribute
sequences
which is equal to the parametersequences
of the functionfitmm
(i.e. a list of sequences used to estimate the Markov model).
See Also
Examples
states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)
# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)
seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)
est <- fitmm(sequences = seqs, states = states, k = 2)