fitmm {smmR}R Documentation

Maximum Likelihood Estimation (MLE) of a k-th order Markov chain

Description

Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.

Usage

fitmm(sequences, states, k = 1, init.estim = "mle")

Arguments

sequences

A list of vectors representing the sequences.

states

Vector of state space (of length s).

k

Order of the Markov chain.

init.estim

Optional. init.estim gives the method used to estimate the initial distribution. The following methods are proposed:

  • init.estim = "mle": the classical Maximum Likelihood Estimator is used to estimate the initial distribution init;

  • init.estim = "stationary": the initial distribution is replaced by the stationary distribution of the Markov chain (if the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution);

  • init.estim = "freq": the initial distribution is estimated by taking the frequencies of the words of length k for all sequences;

  • init.estim = "prod": init is estimated by using the product of the frequencies of each letter (for all the sequences) in the word of length k;

  • init.estim = "unif": the initial probability of each state is equal to 1/s1 / s, with ss the number of states.

Details

Let X1,X2,...,XnX_1, X_2, ..., X_n be a trajectory of length nn of the Markov chain X=(Xm)mNX = (X_m)_{m \in N} of order k=1k = 1 with transition matrix ptrans(i,j)=P(Xm+1=jXm=i)p_{trans}(i,j) = P(X_{m+1} = j | X_m = i). The maximum likelihood estimation of the transition matrix is ptrans^(i,j)=NijNi.\widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}, where NijN_{ij} is the number of transitions from state ii to state jj and Ni.N_{i.} is the number of transition from state ii to any state. For k>1k > 1 we have similar expressions.

The initial distribution of a k-th order Markov chain is defined as μi=P(X1=i)\mu_i = P(X_1 = i). Five methods are proposed for the estimation of the latter :

Maximum Likelihood Estimator:

The Maximum Likelihood Estimator for the initial distribution. The formula is: μi^=NstartiL\widehat{\mu_i} = \frac{Nstart_i}{L}, where NstartiNstart_i is the number of occurences of the word ii (of length kk) at the beginning of each sequence and LL is the number of sequences. This estimator is reliable when the number of sequences LL is high.

Stationary distribution:

The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).

Frequencies of each word:

The initial distribution is estimated by taking the frequencies of the words of length k for all sequences. The formula is μi^=NiN\widehat{\mu_i} = \frac{N_i}{N}, where NiN_i is the number of occurences of the word ii (of length kk) in the sequences and NN is the sum of the lengths of the sequences.

Product of the frequencies of each state:

The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length kk.

Uniform distribution:

The initial probability of each state is equal to 1/s1 / s, with ss, the number of states.

Value

An object of class S3 mmfit (inheriting from the S3 class mm). The S3 class mmfit contains:

See Also

mm, simulate.mm

Examples

states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)

# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)

seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)

est <- fitmm(sequences = seqs, states = states, k = 2)


[Package smmR version 1.0.3 Index]