dsmmR-package {dsmmR} | R Documentation |
dsmmR: Estimation and Simulation of Drifting Semi-Markov Models
Description
Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel.
Details
Introduction
Markov models and semi-Markov models differ in how they model sojourn times.
In a discrete-time Markov model, the sojourn time in each state follows a Geometric distribution, whereas a semi-Markov model allows sojourn time distributions of arbitrary shape.
A drifting semi-Markov model goes one step further: it uses $d + 1$ (arbitrary) sojourn time distributions and $d + 1$ transition matrices (Model 1), where $d$ is the polynomial degree. From these, we compute $d + 1$ semi-Markov kernels.
In this work, we also consider obtaining these semi-Markov kernels from $d + 1$ transition matrices and 1 sojourn time distribution (Model 2), or from $d + 1$ sojourn time distributions and 1 transition matrix (Model 3).
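To make the remark about Geometric sojourn times concrete, here is a minimal base R sketch. It assumes a discrete-time Markov chain that stays in its current state with a hypothetical probability `p_stay`; none of the names below come from the package.

```r
# Sojourn time in one state of a discrete-time Markov chain.
# p_stay is a hypothetical probability of remaining in the current state.
p_stay <- 0.7

# P(X = l): stay l - 1 times, then leave. This is Geometric on {1, 2, ...}.
l <- 1:200
sojourn_pmf <- p_stay^(l - 1) * (1 - p_stay)

# Base R's dgeom() counts failures before the first success (support 0, 1, ...),
# so a sojourn time of l corresponds to l - 1 "failures" (stays).
same_as_dgeom <- all.equal(sojourn_pmf, dgeom(l - 1, prob = 1 - p_stay))

total_mass <- sum(sojourn_pmf)        # close to 1 (truncated at l = 200)
mean_sojourn <- sum(l * sojourn_pmf)  # close to 1 / (1 - p_stay)
```

A semi-Markov model drops this constraint: `sojourn_pmf` could be replaced by any probability vector on the positive integers.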
Definition
Drifting semi-Markov processes are particular non-homogeneous semi-Markov chains for which the drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$ is defined as the probability that, given that the previous state at the instant $t$ is $u$, the next state $v$ will be reached with a sojourn time of $l$:
$$q_{\frac{t}{n}}(u,v,l) = P(J_{t}=v,X_{t}=l|J_{t-1}=u),$$
where $n$ is the model size, defined as the length of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$ minus the last state, $J_{t}$ is the state at the instant $t$, and $X_{t}=S_{t}-S_{t-1}$ is the sojourn time of the state $J_{t-1}$, with $S_{t}$ denoting the $t$-th jump time.
The drifting semi-Markov kernel $q_{\frac{t}{n}}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}$, where every semi-Markov kernel is the product of a transition matrix $p$ and a sojourn time distribution $f$. We define the situation when both $p$ and $f$ are "drifting" between $d + 1$ fixed points of the model as Model 1, and we use the superscript $(1)$ to refer to the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ and the corresponding semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$ in this case.
For Model 2, we allow the transition matrix $p$ to drift but not the sojourn time distribution $f$, and for Model 3 we allow the sojourn time distribution $f$ to drift but not the transition matrix $p$.
The superscripts $(2)$ and $(3)$ denote Model 2 and Model 3, respectively.
In the general case, no superscript is used.
Model 1
Both $p$ and $f$ are drifting in this case.
Thus, the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$, which are given by:
$$q_{\frac{i}{d}}^{\ (1)}(u,v,l) = p_{\frac{i}{d}}(u,v) f_{\frac{i}{d}}(u,v,l),$$
where for $i = 0,\dots,d$ we have $d + 1$ Markov transition matrices $p_{\frac{i}{d}}(u,v)$ of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, and $d + 1$ sojourn time distributions $f_{\frac{i}{d}}(u,v,l)$. Therefore, the drifting semi-Markov kernel is described as:
$$q_{\frac{t}{n}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l),$$
where $A_i$, $i = 0, \dots, d$, are $d + 1$ polynomials of degree $d$, which satisfy the conditions:
$$\sum_{i=0}^{d}A_{i}(t) = 1,$$
$$A_i \left(\frac{nj}{d} \right)= 1_{\{i=j\}},$$
where the indicator function $1_{\{i=j\}}$ equals $1$ if $i = j$, and $0$ otherwise.
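Polynomials of degree at most $d$ that satisfy the second condition coincide with the Lagrange basis polynomials on the nodes $nj/d$, and the first condition then follows. The base R sketch below checks both conditions numerically. The function name `lagrange_A` and the values of `n` and `d` are illustrative assumptions, and the code is not taken from the package internals.

```r
# Lagrange basis polynomial A_i(t) on the d + 1 nodes n * j / d, j = 0, ..., d.
# Standalone sketch; not the package's internal implementation.
lagrange_A <- function(i, t, n, d) {
  nodes <- n * (0:d) / d
  others <- nodes[-(i + 1)]            # every node except the i-th one
  prod((t - others) / (nodes[i + 1] - others))
}

n <- 100   # hypothetical model size
d <- 2     # hypothetical polynomial degree

# Condition 1: the polynomials sum to 1 at every instant t = 0, ..., n.
sums <- sapply(0:n, function(t) {
  sum(sapply(0:d, function(i) lagrange_A(i, t, n, d)))
})

# Condition 2: A_i(n * j / d) is 1 when i = j and 0 otherwise,
# so this (d + 1) x (d + 1) matrix is the identity.
at_nodes <- outer(0:d, 0:d, Vectorize(function(i, j) {
  lagrange_A(i, n * j / d, n, d)
}))
```

For $d = 1$ the two polynomials reduce to $A_0(t) = 1 - t/n$ and $A_1(t) = t/n$, a linear drift between the two fixed points.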
Model 2
In this case, $p$ is drifting and $f$ is not drifting.
Therefore, the drifting semi-Markov kernel is now described as:
$$q_{\frac{t}{n}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f(u,v,l).$$
Model 3
In this case, $f$ is drifting and $p$ is not drifting.
Therefore, the drifting semi-Markov kernel is now described as:
$$q_{\frac{t}{n}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p(u,v)f_{\frac{i}{d}}(u,v,l).$$
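The three kernels above differ only in which of $p$ and $f$ is repeated across the $d + 1$ fixed points. The following base R sketch computes them for $d = 1$ on a toy example. All matrices, the helper names `make_f` and `drift_kernel`, and the truncation of sojourn times at 3 are assumptions made for illustration; the sketch does not call the package.

```r
n <- 10   # hypothetical model size; degree d = 1 throughout

# Two transition matrices p_0 and p_1 (3 states, zero diagonals).
p0 <- matrix(c(0,   0.5, 0.5,
               0.3, 0,   0.7,
               0.6, 0.4, 0), nrow = 3, byrow = TRUE)
p1 <- matrix(c(0,   0.2, 0.8,
               0.9, 0,   0.1,
               0.5, 0.5, 0), nrow = 3, byrow = TRUE)

# f[u, v, l]: for brevity, the same distribution w over l for every pair (u, v).
make_f <- function(w) array(rep(w, each = 9), dim = c(3, 3, length(w)))
f0 <- make_f(c(0.5, 0.3, 0.2))
f1 <- make_f(c(0.1, 0.2, 0.7))

# Drifting kernel at instant t: A_0(t) * p_0 * f_0 + A_1(t) * p_1 * f_1.
drift_kernel <- function(t, n, p_list, f_list) {
  A <- c(1 - t / n, t / n)                      # A_0(t), A_1(t) for d = 1
  q_0 <- f_list[[1]] * as.vector(p_list[[1]])   # p recycled over every l
  q_1 <- f_list[[2]] * as.vector(p_list[[2]])
  A[1] * q_0 + A[2] * q_1
}

q1 <- drift_kernel(5, n, list(p0, p1), list(f0, f1))  # Model 1: both drift
q2 <- drift_kernel(5, n, list(p0, p1), list(f0, f0))  # Model 2: only p drifts
q3 <- drift_kernel(5, n, list(p0, p0), list(f0, f1))  # Model 3: only f drifts

# For each model and each u, the kernel sums to 1 over (v, l).
row_totals <- sapply(list(q1, q2, q3), function(q) apply(q, 1, sum))
```

Passing the same matrix or array twice is what makes Models 2 and 3 special cases of the Model 1 formula in this sketch.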
Parametric and non-parametric model specifications
In this package, we can define parametric and non-parametric drifting semi-Markov models.
For the parametric case, several discrete distributions are considered for the modelling of the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial.
This is done through the function parametric_dsmm(), which returns an object of the S3 class (dsmm_parametric, dsmm).
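As an illustration of one of these sojourn time distributions, the sketch below evaluates a discrete Weibull probability mass function in base R. It assumes the type 1 form of Nakagawa and Osaki (1975), shifted so that its support starts at 1. The function name `ddweibull_shifted` and the parameter values are hypothetical, and the parameterization used inside the package may differ.

```r
# Discrete Weibull (type 1, Nakagawa and Osaki 1975), shifted to {1, 2, ...}:
# P(X = l) = q^((l - 1)^beta) - q^(l^beta), with 0 < q < 1 and beta > 0.
# The shift to a support starting at 1 is an assumption of this sketch.
ddweibull_shifted <- function(l, q, beta) {
  q^((l - 1)^beta) - q^(l^beta)
}

l <- 1:50
pmf <- ddweibull_shifted(l, q = 0.6, beta = 0.9)   # hypothetical parameters
total_mass <- sum(pmf)   # telescopes to 1 - q^(50^beta)

# With beta = 1 the distribution reduces to the Geometric on {1, 2, ...}
# with success probability 1 - q.
geom_case <- ddweibull_shifted(l, q = 0.6, beta = 1)
```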
The non-parametric model specification concerns the sojourn time distributions when no assumptions are made about their shape. This is done through the function nonparametric_dsmm(), which returns an object of the S3 class (dsmm_nonparametric, dsmm).
It is also possible to proceed with a parametric or non-parametric estimation of a model from an existing sequence through the function fit_dsmm(), which returns an object of the S3 class (dsmm_fit_parametric, dsmm) or (dsmm_fit_nonparametric, dsmm), depending on whether the argument estimation = "parametric" or estimation = "nonparametric" is given.
Therefore, the dsmm class acts as a wrapper class for drifting semi-Markov model specifications, while the classes dsmm_fit_parametric, dsmm_fit_nonparametric, dsmm_parametric and dsmm_nonparametric are exclusive to the functions that create the corresponding models, and inherit methods from the dsmm class.
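The inheritance described here is ordinary S3 dispatch on a two-element class vector. The base R sketch below shows the mechanism on a mock object built by hand. The generic `describe()` and its method are hypothetical and are not part of the package.

```r
# A mock object carrying the same class vector as a parametric specification.
# It is built by hand for illustration, not by parametric_dsmm().
mock <- structure(list(degree = 1), class = c("dsmm_parametric", "dsmm"))

# A hypothetical generic with a method defined only for the parent class.
describe <- function(x, ...) UseMethod("describe")
describe.dsmm <- function(x, ...) {
  paste("drifting semi-Markov model of degree", x$degree)
}

is_dsmm <- inherits(mock, "dsmm")   # the wrapper class is present
msg <- describe(mock)               # no dsmm_parametric method, so
                                    # dispatch falls back to describe.dsmm()
```

Because every model object lists dsmm last in its class vector, a method written once for dsmm applies to all four specific classes.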
In summary, based on a dsmm object it is possible to use the following methods:
- Simulate a sequence through the function simulate.dsmm().
- Get the drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$, for any choice of $u$, $v$, $l$ or $t$, through the function get_kernel().
Restrictions
The following restrictions must be satisfied for every drifting semi-Markov model:
- The drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$, for every $t \in \{ 0, \dots, n \}$ and $u \in E$, has its sums over $v$ and $l$ equal to $1$:
$$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{t}{n}}(u,v,l) = \sum_{v \in E}\sum_{l = 1}^{+\infty}\sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}(u,v,l) = 1.$$
- Therefore, we also get that for every $i \in \{0, \dots, d\}$ and $u \in E$, the semi-Markov kernel $q_{\frac{i}{d}}(u,v,l)$ has its sums over $v$ and $l$ equal to $1$:
$$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}(u,v,l) = 1.$$
- Lastly, as in semi-Markov models, we do not allow sojourn times equal to $0$ or transitions into the same state:
$$q_{\frac{t}{n}}(u,v,0) = 0, \forall u,v \in E,$$
$$q_{\frac{t}{n}}(u,u,l) = 0, \forall u\in E,l\in\{1,\dots,+\infty\}.$$
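The link between the first two restrictions can be written out in one line. The derivation below is a sketch that uses only the linear-combination form of the drifting kernel and the condition $\sum_{i=0}^{d}A_{i}(t) = 1$. It shows that if every semi-Markov kernel $q_{\frac{i}{d}}$ sums to 1 over $v$ and $l$, then so does $q_{\frac{t}{n}}$:

```latex
\sum_{v \in E}\sum_{l = 1}^{+\infty} q_{\frac{t}{n}}(u,v,l)
  = \sum_{i = 0}^{d} A_{i}(t) \sum_{v \in E}\sum_{l = 1}^{+\infty} q_{\frac{i}{d}}(u,v,l)
  = \sum_{i = 0}^{d} A_{i}(t) \cdot 1
  = 1.
```

In the other direction, whenever $t = nj/d$ is one of the instants $0, \dots, n$, the condition $A_{i}(nj/d) = 1_{\{i=j\}}$ gives $q_{\frac{t}{n}} = q_{\frac{j}{d}}$, so the restriction on the drifting kernel at that instant is exactly the restriction on $q_{\frac{j}{d}}$.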
Model specification restrictions
When we define a drifting semi-Markov model specification through the functions parametric_dsmm() or nonparametric_dsmm(), the following restrictions need to be satisfied.
Model 1
The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (1)}(u,v,l) = p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1:
$$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$
$$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
Model 2
The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (2)}(u,v,l) = p_{\frac{i}{d}}(u,v)f(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f(u,v,l)$ over $l$ must be equal to 1:
$$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$
$$\sum_{l = 1}^{+\infty }f(u,v,l) = 1.$$
Model 3
The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (3)}(u,v,l) = p(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1:
$$\sum_{v \in E}p(u,v) = 1,$$
$$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
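These restrictions are easy to check numerically before building a specification. The base R sketch below is one possible check, assuming that $p$ is stored as a states-by-states matrix and $f$ as an array indexed by $(u, v, l)$ with sojourn times truncated at a hypothetical `k_max`. The helper `check_spec` is illustrative and not a package function. The zero-diagonal test on $p$ is this sketch's way of enforcing $q(u,u,l) = 0$.

```r
# Hypothetical inputs: 3 states, sojourn times 1, ..., k_max.
states <- c("a", "b", "c")
k_max <- 4
p <- matrix(c(0,   0.4, 0.6,
              0.5, 0,   0.5,
              0.2, 0.8, 0), nrow = 3, byrow = TRUE,
            dimnames = list(states, states))
# f[u, v, l]: for brevity, the same distribution w over l for every pair (u, v).
w <- c(0.4, 0.3, 0.2, 0.1)
f <- array(rep(w, each = 9), dim = c(3, 3, k_max))

check_spec <- function(p, f, tol = 1e-8) {
  off_diag <- row(p) != col(p)   # pairs (u, v) with u != v
  c(p_rows_sum_to_1 = all(abs(rowSums(p) - 1) < tol),
    p_zero_diagonal = all(diag(p) == 0),
    f_sums_to_1 = all(abs(apply(f, c(1, 2), sum)[off_diag] - 1) < tol))
}
checks <- check_spec(p, f)

# Breaking one row of p is detected by the first check only.
p_bad <- p
p_bad["a", "b"] <- 0.5
checks_bad <- check_spec(p_bad, f)
```

For a drifting $p$ or $f$ (Models 1 to 3), the same check would be repeated for each of the $d + 1$ matrices or arrays.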
Community Guidelines
Third parties wishing to contribute to the software, or to report issues or problems with it, can do so directly through the package's development GitHub page.
Notes
Automated tests are in place to help the user catch invalid input and, furthermore, to ensure that the functions used return the expected output.
Moreover, these strict automated tests make it possible for users to properly define their own dsmm objects and use them with the generic functions of the package.
Author(s)
Maintainer: Ioannis Mavrogiannis mavrogiannis.ioa@gmail.com
Authors:
Vlad Stefan Barbu
Ioannis Mavrogiannis
Nicolas Vergne
References
Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).
Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
Nakagawa, T., Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.
Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., Petersen, G. B. (1982). Nucleotide sequence of bacteriophage $\lambda$ DNA. Journal of Molecular Biology, 162(4), 729-773.
See Also
For the estimation of a drifting semi-Markov model given a sequence: fit_dsmm.
For drifting semi-Markov model specifications: parametric_dsmm, nonparametric_dsmm.
For the simulation of sequences: simulate.dsmm, create_sequence.
For the retrieval of the drifting semi-Markov kernel through a dsmm object: get_kernel.