fitdmm {drimmR} | R Documentation |
Point by point estimates of a k-th order drifting Markov Model
Description
Estimation of d+1 points of support transition matrices and initial law of a k-th
order drifting Markov Model starting from one or several sequences.
Usage
fitdmm(
sequences,
order,
degree,
states,
init.estim = c("mle", "freq", "prod", "stationary", "unif"),
fit.method = c("sum"),
ncpu = 2
)
Arguments
sequences |
A list of character vector(s) representing one (several) sequence(s) |
order |
Order of the Markov chain |
degree |
Degree of the polynomials (e.g., linear drifting if degree = 1) |
states |
Vector of states space of length s > 1 |
init.estim |
Default="mle". Method used to estimate the initial law: one of "mle", "freq", "prod", "stationary" or "unif". See Details for the corresponding estimators. |
fit.method |
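Default="sum". If several sequences are supplied, the usual LSE over the sample paths is used (see Details). |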
ncpu |
Default=2. Number of cores used to parallelize the computation. If ncpu=-1, all available cores are used. |
Details
The fitdmm function creates a drifting Markov model object of class dmm.
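Let $E = \{1, \ldots, s\}$, $s < \infty$, be the finite state space of a random system whose time evolution is governed by a discrete-time stochastic process with values in $E$.
A sequence $X_0, X_1, \ldots, X_n$ with state space $E$ is said to be a linear drifting Markov chain (of order 1) of length $n$ between the Markov transition matrices $\Pi_0$ and $\Pi_1$ if the distribution of $X_t$, $t = 1, \ldots, n$, is defined by
$P(X_t = v \mid X_{t-1} = u, X_{t-2}, \ldots) = \Pi_{t/n}(u, v)$, $u, v \in E$, where
$\Pi_{t/n}(u, v) = (1 - t/n)\,\Pi_0(u, v) + (t/n)\,\Pi_1(u, v)$, $u, v \in E$.
The linear drifting Markov model of order 1 can be generalized to a polynomial drifting Markov model of order $k$ and degree $d$. Let $\Pi_{i/d} = (\Pi_{i/d}(u_1, \ldots, u_k, v))_{u_1, \ldots, u_k, v \in E}$, $i = 0, \ldots, d$, be $d + 1$ Markov transition matrices (of order $k$) over a state space $E$.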
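The linear drift can be illustrated with a short base R sketch. It assumes the convex-combination form $(1 - t/n)\,\Pi_0 + (t/n)\,\Pi_1$ for the transition matrix at position $t$; the helper drift_matrix and the two matrices are hypothetical and made up for illustration, they are not part of drimmR.

```r
# Hypothetical helper (not part of drimmR): transition matrix at position t
# of a linear drifting Markov chain of length n, assuming the form
# (1 - t/n) * Pi0 + (t/n) * Pi1.
drift_matrix <- function(Pi0, Pi1, t, n) {
  stopifnot(t >= 0, t <= n, all(dim(Pi0) == dim(Pi1)))
  w <- t / n
  (1 - w) * Pi0 + w * Pi1
}

states <- c("a", "c", "g", "t")

# Two made-up order-1 transition matrices; every row sums to 1.
Pi0 <- matrix(0.25, 4, 4, dimnames = list(states, states))
Pi1 <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1,
                0.1, 0.1, 0.1, 0.7),
              nrow = 4, byrow = TRUE, dimnames = list(states, states))

# Transition matrix halfway along a chain of length 100.
Pi_mid <- drift_matrix(Pi0, Pi1, t = 50, n = 100)
rowSums(Pi_mid)  # a convex combination of stochastic matrices is stochastic
```

At t = 0 the helper returns Pi0 and at t = n it returns Pi1, so the chain drifts from one matrix to the other along the sequence.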
The estimation of DMMs is carried out for four different types of data:
- One can observe one sample path:
It is denoted by $H(m, n) := (X_0, X_1, \ldots, X_m)$, where $m$ denotes the length of the sample path and $n$ the length of the drifting Markov chain. Two cases can be considered:
- $m = n$ (a complete sample path),
- $m < n$ (an incomplete sample path).
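- One can also observe $H$ i.i.d. sample paths:
They are denoted by $H_i(m_i, n_i)$, $i = 1, \ldots, H$. Two cases are considered:
- $m_i = n_i = n$ for all $i = 1, \ldots, H$ (complete sample paths of drifting Markov chains of the same length),
- $n_i = n$ for all $i = 1, \ldots, H$, with $m_i \le n$ (incomplete sample paths of drifting Markov chains of the same length). In this case, the usual least squares estimator (LSE) over the sample paths is used.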
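The data layouts above map onto the sequences argument, which is a list of character vectors. The following base R sketch uses made-up toy data; the object names are illustrative only.

```r
# One sample path: a list holding a single character vector.
one_path <- list(c("a", "c", "g", "t", "a", "a", "c"))

# Several sample paths (made-up toy data) over the same state space.
several_paths <- list(
  c("a", "c", "g", "t", "a"),
  c("c", "c", "a", "t", "g"),
  c("g", "a", "t", "t", "a")
)

# Number of observed symbols in each path, and whether all paths
# have the same observed length.
path_lengths <- lengths(several_paths)
same_length <- length(unique(path_lengths)) == 1
```

Checking the observed lengths before fitting makes it easy to see which of the cases above a data set falls into.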
The initial distribution of a k-th order drifting Markov Model is defined as $\mu_i = P(X_1 = i)$. The initial distribution of the first k letters is freely customisable by the user, but five methods are proposed for its estimation:
- Estimation based on the Maximum Likelihood Estimator:
The Maximum Likelihood Estimator of the initial distribution is $\hat{\mu}_i = N^{start}_i / \sum_j N^{start}_j$, where $N^{start}_i$ is the number of occurrences of the word $i$ (of length $k$) at the beginning of each sequence and $\sum_j N^{start}_j$ is the number of sequences. This estimator is reliable when the number of sequences is high.
- Estimation based on the frequency:
The initial distribution is estimated by taking the frequencies of the words of length $k$ over all sequences. The formula is $\hat{\mu}_i = N_i / \sum_j N_j$, where $N_i$ is the number of occurrences of the word $i$ (of length $k$) in the sequences and $\sum_j N_j$ is the sum of the lengths of the sequences.
- Estimation based on the product of the frequencies of each state:
The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length $k$.
- Estimation based on the stationary law of the point of support transition matrix for a word of length k:
The initial distribution is estimated using the stationary law of the point of support transition matrix.
- Estimation based on the uniform law:
Each state is assigned the same initial probability, $1/s$.
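The five estimators can be sketched in base R on toy data. This is an illustrative sketch, not the drimmR implementation: the data, the helper names words_in and stationary_law, and the matrix P are all made up. Two assumptions are worth flagging: the frequency estimate is normalised by the number of words counted (so that it sums to 1), and the stationary law is shown only for an order-1 transition matrix.

```r
# Toy data (made up): three sequences over the states a, c, g, t.
seqs <- list(
  c("a", "c", "g", "t", "a", "c"),
  c("a", "c", "c", "a", "t", "g"),
  c("g", "a", "a", "c", "t", "t")
)
states <- c("a", "c", "g", "t")
k <- 2  # word length

# All s^k words of length k, in a fixed order.
grid <- expand.grid(rep(list(states), k), stringsAsFactors = FALSE)
words <- apply(grid, 1, paste, collapse = "")

# Hypothetical helper: all overlapping words of length k in one sequence.
words_in <- function(x, k) {
  starts <- seq_len(length(x) - k + 1)
  vapply(starts, function(i) paste(x[i:(i + k - 1)], collapse = ""),
         character(1))
}

# "mle": share of sequences that start with each word.
first_words <- vapply(seqs, function(x) paste(x[1:k], collapse = ""),
                      character(1))
mu_mle <- table(factor(first_words, levels = words)) / length(seqs)

# "freq": relative frequency of each word over all positions
# (assumption: normalised by the number of words counted).
all_words <- unlist(lapply(seqs, words_in, k = k))
mu_freq <- table(factor(all_words, levels = words)) / length(all_words)

# "prod": product of the single-state frequencies of the letters of each word.
letters_all <- unlist(seqs)
state_freq <- table(factor(letters_all, levels = states)) / length(letters_all)
mu_prod <- vapply(strsplit(words, ""),
                  function(w) prod(state_freq[w]), numeric(1))
names(mu_prod) <- words

# "unif": the same probability for every word.
mu_unif <- setNames(rep(1 / length(words), length(words)), words)

# "stationary": stationary law of an order-1 transition matrix P,
# i.e. the probability vector mu with mu %*% P == mu.
stationary_law <- function(P) {
  e <- eigen(t(P))
  v <- Re(e$vectors[, which.min(abs(e$values - 1))])
  v / sum(v)
}
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)  # made-up example matrix
mu_stat <- stationary_law(P)
```

With only three sequences, mu_mle puts all its mass on the two observed starting words, which illustrates why that estimator is recommended when the number of sequences is high.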
Value
An object of class dmm
Author(s)
Geoffray Brelurut, Alexandre Seiller
References
Barbu VS, Vergne N (2018). “Reliability and survival analysis for drifting Markov models: modelling and estimation.” Methodology and Computing in Applied Probability, 1–33. doi: 10.1007/s11009-018-9682-8, https://doi.org/10.1007/s11009-018-9682-8.
Vergne N (2008). “Drifting Markov models with polynomial drift and applications to DNA sequences.” Statistical Applications in Genetics and Molecular Biology, 7(1). doi: 10.2202/1544-6115.1326, https://doi.org/10.2202/1544-6115.1326.
Examples
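# Load the example dataset lambda shipped with drimmR
data(lambda, package = "drimmR")
states <- c("a", "c", "g", "t")
order <- 1
degree <- 1
# Fit an order-1, degree-1 (linear) drifting Markov model
fitdmm(lambda, order, degree, states, init.estim = "freq", fit.method = "sum")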