fitdmm {drimmR} | R Documentation |
Point by point estimates of a k-th order drifting Markov Model
Description
Estimation of d+1 points of support transition matrices and |E|^{k} initial law of a k-th order drifting Markov Model starting from one or several sequences.
Usage
fitdmm(
sequences,
order,
degree,
states,
init.estim = c("mle", "freq", "prod", "stationary", "unif"),
fit.method = c("sum"),
ncpu = 2
)
Arguments
sequences |
A list of character vector(s) representing one (several) sequence(s) |
order |
Order of the Markov chain |
degree |
Degree of the polynomials (e.g., linear drifting if |
states |
Vector of states space of length s > 1 |
init.estim |
Default="mle". Method used to estimate the initial law.
If |
fit.method |
If |
ncpu |
Default=2. Number of cores used to parallelize the computation. If ncpu=-1, all available cores are used. |
Details
The fitdmm function creates a drifting Markov model object dmm.
Let E = \{1, \ldots, s\}, s < \infty, be the finite state space of a random system whose time evolution is governed by a discrete-time stochastic process with values in E.
A sequence X_0, X_1, \ldots, X_n with state space E = \{1, 2, \ldots, s\} is said to be a linear drifting Markov chain (of order 1) of length n between the Markov transition matrices \Pi_0 and \Pi_1 if the distribution of X_t, t = 1, \ldots, n, is defined by
P(X_t = v \mid X_{t-1} = u, X_{t-2}, \ldots) = \Pi_{\frac{t}{n}}(u, v), \; u, v \in E,
where
\Pi_{\frac{t}{n}}(u, v) = \left(1 - \frac{t}{n}\right) \Pi_0(u, v) + \frac{t}{n} \Pi_1(u, v), \; u, v \in E.
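The linear drift formula can be illustrated with a minimal base-R sketch. The matrices `Pi0` and `Pi1` and the helper `linear_drift_matrix` are hypothetical names chosen for this illustration; they are not part of drimmR.

```r
# Hypothetical 2-state transition matrices (each row sums to 1);
# these values are illustrative and not taken from drimmR.
Pi0 <- matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE)
Pi1 <- matrix(c(0.5, 0.5,
                0.6, 0.4), nrow = 2, byrow = TRUE)

# Transition matrix at position t of a linear drifting chain of length n:
# Pi_{t/n} = (1 - t/n) * Pi0 + (t/n) * Pi1
linear_drift_matrix <- function(t, n, Pi0, Pi1) {
  stopifnot(t >= 0, t <= n)
  w <- t / n
  (1 - w) * Pi0 + w * Pi1
}

# Halfway through the chain the matrix is the average of Pi0 and Pi1.
Pi_mid <- linear_drift_matrix(t = 50, n = 100, Pi0 = Pi0, Pi1 = Pi1)
Pi_mid
```

Because each \Pi_{\frac{t}{n}} is a convex combination of two stochastic matrices, its rows still sum to 1 for every t.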
The linear drifting Markov model of order 1 can be generalized to a polynomial drifting Markov model of order k and degree d. Let \Pi_{\frac{i}{d}} = (\Pi_{\frac{i}{d}}(u_1, \dots, u_k, v))_{u_1, \dots, u_k, v \in E}, i = 0, \ldots, d, be d+1 Markov transition matrices (of order k) over a state space E; these are the points of support estimated by fitdmm.
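This page does not spell out how the support matrices \Pi_{\frac{i}{d}}, i = 0, \ldots, d, are combined. The sketch below assumes the Lagrange-interpolation form of polynomial drift described in Vergne (2008), with support points placed at positions i n / d, and covers the order-1 case only. The helpers `lagrange_weight` and `poly_drift_matrix` and the matrices in `support` are hypothetical and are not drimmR functions or data.

```r
# Lagrange basis weight A_i(t) for support points located at i * n / d, i = 0..d.
# Assumption: polynomial drift is Lagrange interpolation between support matrices.
lagrange_weight <- function(i, t, n, d) {
  nodes <- (0:d) * n / d
  others <- nodes[-(i + 1)]
  prod((t - others) / (nodes[i + 1] - others))
}

# Pi_{t/n} = sum_i A_i(t) * Pi_{i/d}; `support` is a list of d + 1 matrices.
poly_drift_matrix <- function(t, n, support) {
  d <- length(support) - 1
  terms <- lapply(0:d, function(i) lagrange_weight(i, t, n, d) * support[[i + 1]])
  Reduce(`+`, terms)
}

# Hypothetical degree-2 example with three 2-state support matrices.
support <- list(
  matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2, byrow = TRUE),
  matrix(c(0.6, 0.4, 0.5, 0.5), nrow = 2, byrow = TRUE),
  matrix(c(0.3, 0.7, 0.7, 0.3), nrow = 2, byrow = TRUE)
)
Pi_t <- poly_drift_matrix(t = 25, n = 100, support = support)
rowSums(Pi_t)
```

The Lagrange weights sum to 1 at every t, so rows of the interpolated matrix sum to 1. For d >= 2 some weights are negative, so individual entries are not guaranteed to stay in [0, 1] for arbitrary support matrices.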
The estimation of DMMs is carried out for 4 different types of data:
- One can observe one sample path: it is denoted by H(m,n) := (X_0, X_1, \ldots, X_m), where m denotes the length of the sample path and n the length of the drifting Markov chain. Two cases can be considered:
  - m = n (a complete sample path),
  - m < n (an incomplete sample path).
- One can also observe H i.i.d. sample paths: they are denoted by H_i(m_i, n_i), i = 1, \ldots, H. Two cases are considered:
  - m_i = n_i = n \forall i = 1, \ldots, H (complete sample paths of drifting Markov chains of the same length),
  - n_i = n \forall i = 1, \ldots, H (incomplete sample paths of drifting Markov chains of the same length). In this case, a usual least-squares estimate (LSE) over the sample paths is used.
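To make the distinction between complete and incomplete sample paths concrete, the following base-R sketch simulates a linear drifting chain of order 1 and then truncates it. The function `simulate_linear_dmc`, the support matrices and the uniform draw of X_0 are assumptions made for this illustration only; none of them comes from drimmR.

```r
set.seed(1)
states <- c("a", "c", "g", "t")
s <- length(states)

# Hypothetical support matrices: Pi0 favours staying in place, Pi1 is uniform.
Pi0 <- matrix(0.1, s, s)
diag(Pi0) <- 0.7
Pi1 <- matrix(1 / s, s, s)

# Simulate X_0, ..., X_n for a linear drifting Markov chain of order 1.
simulate_linear_dmc <- function(n, Pi0, Pi1, states) {
  x <- integer(n + 1)
  x[1] <- sample.int(length(states), 1)  # X_0 drawn uniformly (assumption)
  for (t in seq_len(n)) {
    Pi_t <- (1 - t / n) * Pi0 + (t / n) * Pi1
    x[t + 1] <- sample.int(length(states), 1, prob = Pi_t[x[t], ])
  }
  states[x]
}

n <- 200
complete <- simulate_linear_dmc(n, Pi0, Pi1, states)  # H(n, n): X_0, ..., X_n
m <- 120
incomplete <- complete[seq_len(m + 1)]                # H(m, n): X_0, ..., X_m with m < n
```

An incomplete path covers only the first part of the drift, so the observed transitions mostly reflect the early transition matrices.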
The initial distribution of a k-th order drifting Markov Model is defined as \mu_i = P(X_1 = i). The initial distribution of the first k letters is freely customisable by the user, but five methods are proposed for its estimation:
- Estimation based on the Maximum Likelihood Estimator: the Maximum Likelihood Estimator for the initial distribution. The formula is \widehat{\mu_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurrences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
- Estimation based on the frequency: the initial distribution is estimated by taking the frequencies of the words of length k for all sequences. The formula is \widehat{\mu_i} = \frac{N_i}{N}, where N_i is the number of occurrences of the word i (of length k) in the sequences and N is the sum of the lengths of the sequences.
- Estimation based on the product of the frequencies of each state: the initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length k.
- Estimation based on the stationary law of the point of support transition matrix for a word of length k: the initial distribution is estimated using \mu(\Pi_{\frac{k-1}{n}}).
- Estimation based on the uniform law: \frac{1}{s}.
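The three count-based estimators (Maximum Likelihood, frequency, product of state frequencies) can be worked through by hand in base R. This is a sketch of the formulas as stated on this page, not the code drimmR runs internally; the toy sequences `seqs` and the helpers `kwords` and `count_words` are hypothetical, and single-character states are assumed.

```r
# Hypothetical toy sequences over {a, c, g, t}; k is the word length (order).
seqs <- list(c("a", "c", "g", "t", "a", "c"),
             c("a", "c", "a", "c"),
             c("g", "t", "a", "c", "g"))
states <- c("a", "c", "g", "t")
k <- 2

# All s^k words of length k, in alphabetical order.
grid <- expand.grid(rep(list(states), k), stringsAsFactors = FALSE)
words <- sort(do.call(paste0, grid))

# Every overlapping word of length k found in one sequence.
kwords <- function(x, k) {
  if (length(x) < k) return(character(0))
  starts <- seq_len(length(x) - k + 1)
  vapply(starts, function(i) paste0(x[i:(i + k - 1)], collapse = ""), character(1))
}

# Count occurrences of each possible word, keeping zero counts.
count_words <- function(w) {
  setNames(as.numeric(table(factor(w, levels = words))), words)
}

# Maximum Likelihood: Nstart_i / L, using only the first word of each sequence.
first_words <- vapply(seqs, function(x) paste0(x[1:k], collapse = ""), character(1))
mu_mle <- count_words(first_words) / length(seqs)

# Frequency: N_i / N, with N the sum of the sequence lengths as in the formula above
# (for k > 1 the number of overlapping words is slightly smaller than N).
mu_freq <- count_words(unlist(lapply(seqs, kwords, k = k))) / sum(lengths(seqs))

# Product: multiply the overall frequencies of the states that make up each word.
letters_all <- unlist(seqs)
state_freq <- setNames(as.numeric(table(factor(letters_all, levels = states))) /
                         length(letters_all), states)
mu_prod <- vapply(words, function(w) prod(state_freq[strsplit(w, "")[[1]]]), numeric(1))

round(rbind(mle = mu_mle, freq = mu_freq, prod = mu_prod)[, c("ac", "gt", "ta")], 3)
```

With only three sequences the Maximum Likelihood estimate puts all its mass on two words, which illustrates why that estimator is described as reliable only when the number of sequences L is high.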
Value
An object of class dmm
Author(s)
Geoffray Brelurut, Alexandre Seiller
References
Barbu VS, Vergne N (2018). “Reliability and survival analysis for drifting Markov models: modelling and estimation.” Methodology and Computing in Applied Probability, 1–33. doi: 10.1007/s11009-018-9682-8, https://doi.org/10.1007/s11009-018-9682-8.
Vergne N (2008). “Drifting Markov models with polynomial drift and applications to DNA sequences.” Statistical Applications in Genetics and Molecular Biology, 7(1). doi: 10.2202/1544-6115.1326, https://doi.org/10.2202/1544-6115.1326.
Examples
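library(drimmR)
# Load the lambda example dataset shipped with drimmR
data(lambda, package = "drimmR")
states <- c("a", "c", "g", "t")
order <- 1   # first-order Markov chain
degree <- 1  # linear drift
# Fit the model; the initial law is estimated from word frequencies
fitdmm(lambda, order, degree, states, init.estim = "freq", fit.method = "sum")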