| fitdmm {drimmR} | R Documentation |
Point by point estimates of a k-th order drifting Markov Model
Description
Estimation of the d+1 points of support transition matrices and of the initial law (over the |E|^{k} words of length k) of a k-th
order drifting Markov Model, starting from one or several sequences.
Usage
fitdmm(
sequences,
order,
degree,
states,
init.estim = c("mle", "freq", "prod", "stationary", "unif"),
fit.method = c("sum"),
ncpu = 2
)
Arguments
sequences |
A list of character vector(s) representing one (several) sequence(s) |
order |
Order of the Markov chain |
degree |
Degree of the polynomials (e.g., linear drifting if degree=1) |
states |
Vector of states space of length s > 1 |
init.estim |
Default="mle". Method used to estimate the initial law.
See Details for the five proposed methods. |
fit.method |
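Method used to fit the model; "sum" is the only value listed in Usage. For several sequences of the same length, a usual LSE over the sample paths is used (see Details). |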
ncpu |
Default=2. Number of cores used to parallelize the computation. If ncpu=-1, all available cores are used. |
Details
The fitdmm function creates a drifting Markov model object dmm.
Consider a random system with finite state space E={1,\ldots, s}, s < \infty,
whose time evolution is governed by a discrete-time stochastic process with values in E.
A sequence X_0, X_1, \ldots, X_n with state space E= {1, 2, \ldots, s} is said to be a
linear drifting Markov chain (of order 1) of length n between the Markov transition matrices
\Pi_0 and \Pi_1 if the distribution of X_t, t = 1, \ldots, n, is defined by
P(X_t=v \mid X_{t-1} = u, X_{t-2}, \ldots ) = \Pi_{\frac{t}{n}}(u, v), \; u, v \in E, where
\Pi_{\frac{t}{n}}(u, v) = ( 1 - \frac{t}{n}) \Pi_0(u, v) + \frac{t}{n} \Pi_1(u, v), \; u, v \in E.
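As a minimal sketch of the linear drift defined above, the following base R snippet builds \Pi_{\frac{t}{n}} as a convex combination of two transition matrices. The helper name drift_linear and the 2-state matrices Pi0 and Pi1 are illustrative assumptions; they are not part of drimmR.

```r
# Hypothetical helper (not part of drimmR): transition matrix at position t
# of a linear drifting Markov chain of length n.
drift_linear <- function(Pi0, Pi1, t, n) {
  (1 - t / n) * Pi0 + (t / n) * Pi1
}

# Illustrative 2-state support matrices (each row sums to 1).
Pi0 <- matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE)
Pi1 <- matrix(c(0.5, 0.5,
                0.6, 0.4), nrow = 2, byrow = TRUE)

# Halfway along a chain of length 10, the matrix is the average of Pi0 and Pi1.
Pi_mid <- drift_linear(Pi0, Pi1, t = 5, n = 10)
rowSums(Pi_mid)  # rows still sum to 1, since the two weights sum to 1
```

At t = 0 the helper returns Pi0 and at t = n it returns Pi1, which matches the two points of support named in the definition.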
The linear drifting Markov model of order 1 can be generalized to a polynomial drifting Markov model of
order k and degree d. Let \Pi_{\frac{i}{d}} = (\Pi_{\frac{i}{d}}(u_1, \dots, u_k, v))_{u_1, \dots, u_k, v \in E}, i = 0, \ldots, d,
be d+1 Markov transition matrices (of order k) over a state space E.
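This page does not spell out how the points of support \Pi_0, \Pi_{\frac{1}{d}}, \ldots, \Pi_1 are combined into \Pi_{\frac{t}{n}}. The sketch below assumes Lagrange interpolation, \Pi_{\frac{t}{n}} = \sum_{i=0}^{d} A_i(\frac{t}{n}) \Pi_{\frac{i}{d}}, with A_i the Lagrange basis polynomial attached to the node i/d. That choice, the function names, and the restriction to order 1 and d >= 1 are all assumptions made for illustration; none of this is drimmR's own code.

```r
# Lagrange basis polynomial A_i evaluated at x, for nodes 0, 1/d, ..., 1 (d >= 1).
lagrange_basis <- function(i, d, x) {
  nodes <- (0:d) / d
  others <- nodes[-(i + 1)]
  prod((x - others) / (nodes[i + 1] - others))
}

# Hypothetical helper: combine a list of d + 1 support matrices at position t.
drift_poly <- function(support, t, n) {
  d <- length(support) - 1
  weights <- vapply(0:d, function(i) lagrange_basis(i, d, t / n), numeric(1))
  Reduce(`+`, Map(`*`, weights, support))
}

# Illustrative 2-state support matrices for degree d = 2.
Pi0 <- matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2, byrow = TRUE)
Pih <- matrix(c(0.6, 0.4, 0.5, 0.5), nrow = 2, byrow = TRUE)
Pi1 <- matrix(c(0.5, 0.5, 0.6, 0.4), nrow = 2, byrow = TRUE)

Pi_t <- drift_poly(list(Pi0, Pih, Pi1), t = 3, n = 10)
rowSums(Pi_t)  # rows sum to 1 because the Lagrange weights sum to 1
```

With d = 1 the two weights reduce to 1 - t/n and t/n, so the sketch recovers the linear case. For d >= 2 some Lagrange weights can be negative, so the interpolated entries are not guaranteed to lie in [0, 1] for arbitrary support matrices.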
The estimation of DMMs is carried out for 4 different types of data:
- One can observe one sample path: it is denoted by
  H(m,n):= (X_0,X_1, \ldots,X_{m}), where m denotes the length of the sample path and n the length of the drifting Markov chain. Two cases can be considered:
  - m=n (a complete sample path),
  - m < n (an incomplete sample path).
- One can also observe H i.i.d. sample paths: they are denoted by
  H_i(m_i,n_i), i=1, \ldots, H. Two cases are considered:
  - m_i=n_i=n \forall i=1, \ldots, H (complete sample paths of drifting Markov chains of the same length),
  - n_i=n \forall i=1, \ldots, H (incomplete sample paths of drifting Markov chains of the same length). In this case, a usual LSE over the sample paths is used.
The initial distribution of a k-th order drifting Markov Model is defined as
\mu_i = P(X_1 = i). The initial distribution of the first k letters is freely
customisable by the user, but five methods are proposed for its estimation:
- Estimation based on the Maximum Likelihood Estimator:
  The Maximum Likelihood Estimator for the initial distribution. The formula is
  \widehat{\mu_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurrences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
- Estimation based on the frequency:
  The initial distribution is estimated by taking the frequencies of the words of length k for all sequences. The formula is
  \widehat{\mu_i} = \frac{N_i}{N}, where N_i is the number of occurrences of the word i (of length k) in the sequences and N is the sum of the lengths of the sequences.
- Estimation based on the product of the frequencies of each state:
  The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length k.
- Estimation based on the stationary law of the point of support transition matrix for a word of length k:
  The initial distribution is estimated using \mu(\Pi_{\frac{k-1}{n}}).
- Estimation based on the uniform law:
  \frac{1}{s}
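The first three estimators above can be illustrated on toy data in base R. This is a sketch of the formulas as stated in this section, not drimmR's internal code; the sequences, the helper kmers, and all variable names are made up for the example.

```r
# Toy data: three sequences over the state space {a, c, g, t}, words of length k = 2.
seqs <- list(c("a", "c", "g", "t", "a", "c"),
             c("a", "c", "a", "c"),
             c("g", "t", "a", "c", "g"))
states <- c("a", "c", "g", "t")
k <- 2

# All |E|^k words of length k, as strings.
words <- unname(apply(expand.grid(rep(list(states), k)), 1, paste, collapse = ""))

# Hypothetical helper: words of length k found in one sequence (overlapping windows).
kmers <- function(x, k) {
  vapply(seq_len(length(x) - k + 1),
         function(j) paste(x[j:(j + k - 1)], collapse = ""), character(1))
}

# "mle": Nstart_i / L, the share of sequences that start with word i.
starts <- vapply(seqs, function(x) paste(x[1:k], collapse = ""), character(1))
mu_mle <- table(factor(starts, levels = words)) / length(seqs)

# "freq": N_i / N, with N the sum of the sequence lengths, as stated above.
# A sequence of length m holds only m - k + 1 words, so for k > 1 these
# values sum to slightly less than 1 under this normalisation.
counts <- table(factor(unlist(lapply(seqs, kmers, k = k)), levels = words))
mu_freq <- counts / sum(lengths(seqs))

# "prod": product of the single-state frequencies over the letters of each word.
p <- table(factor(unlist(seqs), levels = states)) / sum(lengths(seqs))
mu_prod <- vapply(strsplit(words, ""), function(w) prod(p[w]), numeric(1))
names(mu_prod) <- words
```

On this toy data two of the three sequences start with "ac", so the "mle" estimate puts mass 2/3 on that word, while "freq" and "prod" spread the mass according to counts over the whole sequences.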
Value
An object of class dmm
Author(s)
Geoffray Brelurut, Alexandre Seiller
References
Barbu VS, Vergne N (2018). “Reliability and survival analysis for drifting Markov models: modelling and estimation.” Methodology and Computing in Applied Probability, 1–33. doi: 10.1007/s11009-018-9682-8, https://doi.org/10.1007/s11009-018-9682-8.
Vergne N (2008). “Drifting Markov models with polynomial drift and applications to DNA sequences.” Statistical Applications in Genetics and Molecular Biology, 7(1). doi: 10.2202/1544-6115.1326, https://doi.org/10.2202/1544-6115.1326.
Examples
data(lambda, package = "drimmR")
states <- c("a","c","g","t")
order <- 1
degree <- 1
fitdmm(lambda, order, degree, states, init.estim = "freq", fit.method = "sum")