| fitmm {smmR} | R Documentation |
Maximum Likelihood Estimation (MLE) of a k-th order Markov chain
Description
Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.
Usage
fitmm(sequences, states, k = 1, init.estim = "mle")
Arguments
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
k |
Order of the Markov chain. |
init.estim |
Optional.
|
Details
Let X_1, X_2, ..., X_n be a trajectory of length n of
the Markov chain X = (X_m)_{m \in N} of order k = 1 with
transition matrix p_{trans}(i,j) = P(X_{m+1} = j | X_m = i). The
maximum likelihood estimation of the transition matrix is
\widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}, where N_{ij}
is the number of transitions from state i to state j and
N_{i.} is the number of transition from state i to any state.
For k > 1 we have similar expressions.
The initial distribution of a k-th order Markov chain is defined as
\mu_i = P(X_1 = i). Five methods are proposed for the estimation
of the latter :
- Maximum Likelihood Estimator:
The Maximum Likelihood Estimator for the initial distribution. The formula is:
\widehat{\mu_i} = \frac{Nstart_i}{L}, whereNstart_iis the number of occurences of the wordi(of lengthk) at the beginning of each sequence andLis the number of sequences. This estimator is reliable when the number of sequencesLis high.- Stationary distribution:
The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).
- Frequencies of each word:
The initial distribution is estimated by taking the frequencies of the words of length
kfor all sequences. The formula is\widehat{\mu_i} = \frac{N_i}{N}, whereN_iis the number of occurences of the wordi(of lengthk) in the sequences andNis the sum of the lengths of the sequences.- Product of the frequencies of each state:
The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length
k.- Uniform distribution:
The initial probability of each state is equal to
1 / s, withs, the number of states.
Value
An object of class S3 mmfit (inheriting from the S3 class mm).
The S3 class mmfit contains:
All the attributes of the S3 class mm;
An attribute
Mwhich is an integer giving the total length of the set of sequencessequences(sum of all the lengths of the listsequences);An attribute
loglikwhich is a numeric value giving the value of the log-likelihood of the specified Markov model based on thesequences;An attribute
sequenceswhich is equal to the parametersequencesof the functionfitmm(i.e. a list of sequences used to estimate the Markov model).
See Also
Examples
states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)
# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)
seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)
est <- fitmm(sequences = seqs, states = states, k = 2)