dsmmR-package {dsmmR}    R Documentation

dsmmR : Estimation and Simulation of Drifting Semi-Markov Models

Description

Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel.

Details

Introduction

Markov models and semi-Markov models differ in how they model the sojourn time distributions. In a discrete-time Markov model, the sojourn time necessarily follows a Geometric distribution, whereas a semi-Markov model allows a sojourn time distribution of arbitrary shape. A drifting semi-Markov model goes one step further: it has $d + 1$ (arbitrary) sojourn time distributions and $d + 1$ transition matrices (Model 1), where $d$ is the polynomial degree. From these we compute $d + 1$ semi-Markov kernels. In this work, we also consider obtaining these semi-Markov kernels from $d + 1$ transition matrices and $1$ sojourn time distribution (Model 2), or from $d + 1$ sojourn time distributions and $1$ transition matrix (Model 3).
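The Geometric sojourn time of a discrete-time Markov chain can be checked numerically. The following is a minimal sketch, independent of the package: the names `sojourn_pmf_markov`, `simulate_sojourns` and the self-transition probability `p_stay` are illustrative assumptions, not part of dsmmR.

```python
import random


def sojourn_pmf_markov(p_stay, l):
    """P(sojourn time = l) when the chain stays in a state with probability p_stay."""
    return p_stay ** (l - 1) * (1.0 - p_stay)


def simulate_sojourns(p_stay, n_visits, seed=0):
    """Simulate how long the chain stays in the state on each of n_visits visits."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_visits):
        l = 1
        # Each extra step in the same state happens with probability p_stay.
        while rng.random() < p_stay:
            l += 1
        times.append(l)
    return times


p_stay = 0.7  # hypothetical self-transition probability
times = simulate_sojourns(p_stay, 20000)
empirical = [times.count(l) / len(times) for l in range(1, 6)]
theoretical = [sojourn_pmf_markov(p_stay, l) for l in range(1, 6)]
```

The empirical frequencies should be close to the Geometric probabilities. A semi-Markov model removes this constraint by letting the sojourn time distribution be specified freely.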

Definition

Drifting semi-Markov processes are particular non-homogeneous semi-Markov chains for which the drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$ is defined as the probability that, given that at the instant $t$ the previous state is $u$, the next state $v$ will be reached with a sojourn time of $l$:

$$q_{\frac{t}{n}}(u,v,l) = P(J_{t}=v, X_{t}=l \mid J_{t-1}=u),$$

where $n$ is the model size, defined as the length of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$ minus the last state, $J_{t}$ is the state at the instant $t$, and $X_{t}=S_{t}-S_{t-1}$ is the sojourn time of the state $J_{t-1}$.
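To make the notation concrete, the embedded chain and the sojourn times can be read off a raw sequence by collapsing runs of identical states. This is only an illustrative sketch: the helper `embedded_chain` and the example sequence are assumptions and do not reproduce how the package processes sequences internally.

```python
from itertools import groupby


def embedded_chain(sequence):
    """Collapse runs of equal states into visited states and run lengths.

    sojourns[k] is the sojourn time of states[k], which corresponds to
    X_{k+1} (the sojourn time of J_k) in the notation of the text.
    """
    states, sojourns = [], []
    for state, run in groupby(sequence):
        states.append(state)
        sojourns.append(len(list(run)))
    return states, sojourns


seq = list("aaaccgttta")  # hypothetical observed sequence
J, X = embedded_chain(seq)
n = len(J) - 1  # model size: length of the embedded chain minus the last state

print(J)  # ['a', 'c', 'g', 't', 'a']
print(X)  # [3, 2, 1, 3, 1]
print(n)  # 4
```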

The drifting semi-Markov kernel $q_{\frac{t}{n}}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}$, where every semi-Markov kernel is the product of a transition matrix $p$ and a sojourn time distribution $f$. We define the situation when both $p$ and $f$ are "drifting" between $d + 1$ fixed points of the model as Model 1, and we use the superscript $(1)$ to refer to the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ and the corresponding semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$ in this case. For Model 2, we allow the transition matrix $p$ to drift but not the sojourn time distributions $f$, and for Model 3 we allow the sojourn time distributions $f$ to drift but not the transition matrix $p$. The superscript $(2)$ or $(3)$ signifies Model 2 or Model 3, respectively. In the general case no superscript is used.

Model 1

Both $p$ and $f$ are drifting in this case. Thus, the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$, which are given by:

$$q_{\frac{i}{d}}^{\ (1)}(u,v,l) = {p_{\frac{i}{d}}(u,v)}{f_{\frac{i}{d}}(u,v,l)},$$

where for $i = 0,\dots,d$ we have $d + 1$ Markov transition matrices $p_{\frac{i}{d}}(u,v)$ of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, and $d + 1$ sojourn time distributions $f_{\frac{i}{d}}(u,v,l)$. Therefore, the drifting semi-Markov kernel is described as:

$$q_{\frac{t}{n}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l),$$

where $A_i, i = 0, \dots, d$ are $d + 1$ polynomials of degree $d$, which satisfy the conditions:

$$\sum_{i=0}^{d}A_{i}(t) = 1,$$

$$A_i \left(\frac{nj}{d} \right) = 1_{\{i=j\}},$$

where the indicator function $1_{\{i=j\}} = 1$ if $i = j$, and $0$ otherwise.
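The second condition prescribes the value of each $A_i$ at the $d + 1$ nodes $nj/d$, which determines a polynomial of degree at most $d$ uniquely: the $A_i$ are the Lagrange basis polynomials on those nodes, and the first condition then holds automatically. A minimal sketch that evaluates them follows; the function name `drift_polynomials` is an assumption for illustration and is not a dsmmR function.

```python
def drift_polynomials(t, n, d):
    """Evaluate A_0(t), ..., A_d(t) for model size n and degree d >= 1.

    A_i(t) = prod_{j != i} (t - n*j/d) / (n*i/d - n*j/d)
    """
    nodes = [n * j / d for j in range(d + 1)]
    values = []
    for i in range(d + 1):
        a = 1.0
        for j in range(d + 1):
            if j != i:
                a *= (t - nodes[j]) / (nodes[i] - nodes[j])
        values.append(a)
    return values


# Degree 1: the weights are linear, A_0(t) = 1 - t/n and A_1(t) = t/n.
print(drift_polynomials(25, 100, 1))  # [0.75, 0.25]

# Degree 2: at the middle node t = n/2 only A_1 is non-zero.
weights_mid = drift_polynomials(50, 100, 2)
```

For $d = 1$ the drifting kernel is therefore a plain linear interpolation between $q_{\frac{0}{1}}$ at $t = 0$ and $q_{\frac{1}{1}}$ at $t = n$.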

Model 2

In this case, $p$ is drifting and $f$ is not drifting. Therefore, the drifting semi-Markov kernel is now described as:

$$q_{\frac{t}{n}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f(u,v,l).$$

Model 3

In this case, $f$ is drifting and $p$ is not drifting. Therefore, the drifting semi-Markov kernel is now described as:

$$q_{\frac{t}{n}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p(u,v)f_{\frac{i}{d}}(u,v,l).$$
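The three kernels differ only in which component carries the index $i$. The sketch below computes all three from nested lists, under stated assumptions: the names `drift_weights`, `drifting_kernel` and `constant_f`, the numerical values, and the convention that a list of length 1 marks a non-drifting component are all illustrative and not taken from the package. The sojourn times are truncated to a finite support.

```python
def drift_weights(t, n, d):
    """A_0(t), ..., A_d(t): Lagrange basis on the nodes n*j/d (d = 0 gives [1.0])."""
    if d == 0:
        return [1.0]
    nodes = [n * j / d for j in range(d + 1)]
    weights = []
    for i in range(d + 1):
        a = 1.0
        for j in range(d + 1):
            if j != i:
                a *= (t - nodes[j]) / (nodes[i] - nodes[j])
        weights.append(a)
    return weights


def drifting_kernel(t, n, p_list, f_list):
    """q_{t/n}(u, v, l) = sum_i A_i(t) * p_i(u, v) * f_i(u, v, l).

    p_list[i][u][v] and f_list[i][u][v][l - 1] hold p_{i/d} and f_{i/d}.
    A list of length 1 means that component does not drift.
    """
    d = max(len(p_list), len(f_list)) - 1
    weights = drift_weights(t, n, d)
    n_states = len(p_list[0])
    k_max = len(f_list[0][0][0])
    q = [[[0.0] * k_max for _ in range(n_states)] for _ in range(n_states)]
    for i, a in enumerate(weights):
        p = p_list[i] if len(p_list) > 1 else p_list[0]
        f = f_list[i] if len(f_list) > 1 else f_list[0]
        for u in range(n_states):
            for v in range(n_states):
                for l in range(k_max):
                    q[u][v][l] += a * p[u][v] * f[u][v][l]
    return q


def constant_f(pmf, n_states):
    """Sojourn time distribution that is the same for every pair (u, v)."""
    return [[list(pmf) for _ in range(n_states)] for _ in range(n_states)]


# Hypothetical example with 3 states, degree d = 1 and sojourn times l in {1, 2}.
p0 = [[0.0, 0.5, 0.5], [0.2, 0.0, 0.8], [0.6, 0.4, 0.0]]
p1 = [[0.0, 0.9, 0.1], [0.7, 0.0, 0.3], [0.5, 0.5, 0.0]]
f0 = constant_f([0.5, 0.5], 3)
f1 = constant_f([0.2, 0.8], 3)

n = 100
q_model1 = drifting_kernel(30, n, [p0, p1], [f0, f1])  # p and f drift
q_model2 = drifting_kernel(30, n, [p0, p1], [f0])      # only p drifts
q_model3 = drifting_kernel(30, n, [p0], [f0, f1])      # only f drifts
```

Because the weights sum to 1 and each $q_{\frac{i}{d}}(u,\cdot,\cdot)$ sums to 1, every row of the resulting kernel also sums to 1 in all three cases.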

Parametric and non-parametric model specifications

In this package, we can define parametric and non-parametric drifting semi-Markov models.

For the parametric case, several discrete distributions are considered for the modelling of the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. This is done through the function parametric_dsmm(), which returns an object of the S3 class (dsmm_parametric, dsmm).
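As an illustration of two of these families, the sketch below evaluates a Geometric and a Discrete Weibull probability mass function on the support $l = 1, 2, \dots$ used for sojourn times. It assumes the Nakagawa and Osaki (1975) form shifted to start at 1, with parameter names `p`, `q` and `beta` chosen here for illustration; the exact parameterisation used by parametric_dsmm may differ and should be checked in its own help page.

```python
def geometric_pmf(l, p):
    """P(X = l) = p * (1 - p)^(l - 1) for l = 1, 2, ..."""
    return p * (1.0 - p) ** (l - 1)


def discrete_weibull_pmf(l, q, beta):
    """P(X = l) = q^((l - 1)^beta) - q^(l^beta) for l = 1, 2, ...

    Assumed form: Nakagawa and Osaki (1975), shifted so the support starts at 1.
    """
    return q ** ((l - 1) ** beta) - q ** (l ** beta)


# Hypothetical parameter values.
weibull_probs = [discrete_weibull_pmf(l, 0.6, 1.5) for l in range(1, 51)]
geometric_probs = [geometric_pmf(l, 0.4) for l in range(1, 51)]
```

With `beta = 1` the Discrete Weibull form reduces to the Geometric distribution with `p = 1 - q`, which gives a quick sanity check of the formula.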

The non-parametric model specification concerns the sojourn time distributions when no assumptions are made about the shape of the distributions. This is done through the function nonparametric_dsmm(), which returns an object of the S3 class (dsmm_nonparametric, dsmm).

It is also possible to perform a parametric or non-parametric estimation of a model on an existing sequence through the function fit_dsmm(), which returns an object of the S3 class (dsmm_fit_parametric, dsmm) or (dsmm_fit_nonparametric, dsmm), depending on whether the argument estimation = "parametric" or estimation = "nonparametric" is given.

Therefore, the dsmm class acts like a wrapper class for drifting semi-Markov model specifications, while the classes dsmm_fit_parametric, dsmm_fit_nonparametric, dsmm_parametric and dsmm_nonparametric are exclusive to the functions that create the corresponding models, and inherit methods from the dsmm class.

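In summary, a dsmm object can be used with the generic functions of the package, for example simulate.dsmm for the simulation of sequences and get_kernel for the retrieval of the drifting semi-Markov kernel (see See Also).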

Restrictions

The following restrictions must be satisfied for every drifting semi-Markov model:

Model specification restrictions

When we define a drifting semi-Markov model specification through the functions parametric_dsmm or nonparametric_dsmm, the following restrictions need to be satisfied.

Model 1

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (1)}(u,v,l) = p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1:

$$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$

$$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$

Model 2

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (2)}(u,v,l) = p_{\frac{i}{d}}(u,v)f(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f(u,v,l)$ over $l$ must be equal to 1:

$$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$

$$\sum_{l = 1}^{+\infty }f(u,v,l) = 1.$$

Model 3

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (3)}(u,v,l) = p(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1:

$$\sum_{v \in E}p(u,v) = 1,$$

$$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
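The restrictions for the three models amount to the same two sum checks, applied to however many matrices and distributions the model has. The following is a hedged sketch of such a check on nested lists, with the infinite sum over $l$ replaced by a finite support; `check_restrictions` and its data layout are illustrative assumptions and do not describe the validation performed inside parametric_dsmm or nonparametric_dsmm.

```python
def check_restrictions(p_list, f_list, tol=1e-9):
    """Return a list of messages describing violated sum-to-one restrictions.

    p_list[i][u][v] holds a transition matrix, f_list[i][u][v][l - 1] a
    sojourn time distribution on a finite support. Pass one element for a
    component that does not drift (Model 2: one f, Model 3: one p).
    """
    problems = []
    for i, p in enumerate(p_list):
        for u, row in enumerate(p):
            if abs(sum(row) - 1.0) > tol:
                problems.append(f"p[{i}]: row {u} sums to {sum(row):.6g}")
    for i, f in enumerate(f_list):
        for u, f_u in enumerate(f):
            for v, f_uv in enumerate(f_u):
                if abs(sum(f_uv) - 1.0) > tol:
                    problems.append(f"f[{i}]: pair ({u}, {v}) sums to {sum(f_uv):.6g}")
    return problems


# Hypothetical two-state specification with one deliberate mistake in p_bad.
p_good = [[0.3, 0.7], [0.6, 0.4]]
p_bad = [[0.3, 0.6], [0.6, 0.4]]
f_any = [[[0.5, 0.5], [0.1, 0.9]], [[1.0, 0.0], [0.25, 0.75]]]

ok = check_restrictions([p_good], [f_any])
bad = check_restrictions([p_good, p_bad], [f_any])
```

Collecting all violations in a list, rather than stopping at the first one, makes it easier to correct a specification in one pass.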

Community Guidelines

Third parties wishing to contribute to the software, or to report issues or problems with it, can do so directly through the development GitHub page of the package.

Notes

Automated tests are in place to help the user identify invalid input and to ensure that the functions return the expected output. Moreover, these strict automated tests make it possible for users to properly define their own dsmm objects and use them with the generic functions of the package.

Author(s)

Maintainer: Ioannis Mavrogiannis mavrogiannis.ioa@gmail.com

Authors:

References

Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).

Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

Nakagawa, T., Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., Petersen, G. B. (1982). Nucleotide sequence of bacteriophage λ DNA. Journal of Molecular Biology, 162(4), 729-773.

See Also

For the estimation of a drifting semi-Markov model given a sequence: fit_dsmm.

For drifting semi-Markov model specifications: parametric_dsmm, nonparametric_dsmm.

For the simulation of sequences: simulate.dsmm, create_sequence.

For the retrieval of the drifting semi-Markov kernel through a dsmm object: get_kernel.


[Package dsmmR version 1.0.5 Index]