R: Maximum Likelihood Estimation (MLE) of a k-th order Markov...

fitmm {smmR}

R Documentation

Maximum Likelihood Estimation (MLE) of a k-th order Markov chain

Description

Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.

Usage

fitmm(sequences, states, k = 1, init.estim = "mle")

Arguments

`sequences`	A list of vectors representing the sequences.
`states`	Vector of state space (of length s).
`k`	Order of the Markov chain.
`init.estim`	Optional. `init.estim` gives the method used to estimate the initial distribution. The following methods are proposed: `init.estim = "mle"`: the classical Maximum Likelihood Estimator is used to estimate the initial distribution `init`; `init.estim = "stationary"`: the initial distribution is replaced by the stationary distribution of the Markov chain (if the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution); `init.estim = "freq"`: the initial distribution is estimated by taking the frequencies of the words of length `k` for all sequences; `init.estim = "prod"`: `init` is estimated by using the product of the frequencies of each letter (for all the sequences) in the word of length `k`; `init.estim = "unif"`: the initial probability of each state is equal to `1 / s`, with `s` the number of states.

Details

Let X_1, X_2, ..., X_n be a trajectory of length n of the Markov chain X = (X_m)_{m \in N} of order k = 1 with transition matrix p_{trans}(i,j) = P(X_{m+1} = j | X_m = i). The maximum likelihood estimation of the transition matrix is \widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}, where N_{ij} is the number of transitions from state i to state j and N_{i.} is the number of transition from state i to any state. For k > 1 we have similar expressions.

The initial distribution of a k-th order Markov chain is defined as \mu_i = P(X_1 = i). Five methods are proposed for the estimation of the latter :

Maximum Likelihood Estimator:: The Maximum Likelihood Estimator for the initial distribution. The formula is: \widehat{\mu_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
Stationary distribution:: The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).
Frequencies of each word:: The initial distribution is estimated by taking the frequencies of the words of length k for all sequences. The formula is \widehat{\mu_i} = \frac{N_i}{N}, where N_i is the number of occurences of the word i (of length k) in the sequences and N is the sum of the lengths of the sequences.
Product of the frequencies of each state:: The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length k.
Uniform distribution:: The initial probability of each state is equal to 1 / s, with s, the number of states.

Value

An object of class S3 mmfit (inheriting from the S3 class mm). The S3 class mmfit contains:

All the attributes of the S3 class mm;
An attribute M which is an integer giving the total length of the set of sequences sequences (sum of all the lengths of the list sequences);
An attribute loglik which is a numeric value giving the value of the log-likelihood of the specified Markov model based on the sequences;
An attribute sequences which is equal to the parameter sequences of the function fitmm (i.e. a list of sequences used to estimate the Markov model).

Examples

states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)

# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)

seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)

est <- fitmm(sequences = seqs, states = states, k = 2)

[Package smmR version 1.0.3 Index]