R: Parametric Drifting semi-Markov model specification

parametric_dsmm {dsmmR}

R Documentation

Parametric Drifting semi-Markov model specification

Description

Creates a parametric model specification for a drifting semi-Markov model. Returns an object of class (dsmm_parametric, dsmm).

Usage

parametric_dsmm(
  model_size,
  states,
  initial_dist,
  degree,
  f_is_drifting,
  p_is_drifting,
  p_dist,
  f_dist,
  f_dist_pars
)

Arguments

`model_size`	Positive integer that represents the size of the drifting semi-Markov model `n`. It is equal to the length of a theoretical embedded Markov chain `(J_{t})_{t\in \{0,\dots,n\}}`, without the last state.
`states`	Character vector that represents the state space `E`. It has length equal to `s = \|E\|`.
`initial_dist`	Numerical vector of `s` probabilities, that represents the initial distribution for each state in the state space `E`.
`degree`	Positive integer that represents the polynomial degree `d` for the drifting semi-Markov model.
`f_is_drifting`	Logical. Specifies if `f` is drifting or not.
`p_is_drifting`	Logical. Specifies if `p` is drifting or not.
`p_dist`	Numerical array, that represents the probabilities of the transition matrix `p` of the embedded Markov chain `(J_{t})_{t\in \{0,\dots,n\}}` (it is defined the same way in the nonparametric_dsmm function). It can be defined in two ways: If `p` is not drifting, it has dimensions of `s \times s`. If `p` is drifting, it has dimensions of `s \times s \times (d+1)` (see more in Details, Defined Arguments.)
`f_dist`	Character array, that represents the discrete sojourn time distribution `f` of our choice. `NA` is allowed for state transitions that we do not wish to have a sojourn time distribution (e.g. all state transition to the same state should have `NA` as their value). The list of possible values is: `["unif", "geom", "pois", "dweibull", "nbinom", NA]`. It can be defined in two ways: If `f` is not drifting, it has dimensions of `s \times s`. If `f` is drifting, it has dimensions of `s \times s \times (d+1)` (see more in Details, Defined Arguments.)
`f_dist_pars`	Numerical array, that represents the parameters of the sojourn time distributions given in `f_dist`. `NA` is allowed, in the case that the distribution of our choice does not require a parameter. It can be defined in two ways: If `f` is not drifting, it has dimensions of `s \times s \times 2`, specifying two possible parameters required for the discrete distributions. If `f` is drifting, it has dimensions of `s \times s \times 2 \times (d+1)`, specifying two possible parameters required for the discrete distributions, but for every single one of the `i = 0, \dots, d` sojourn time distributions `f_{\frac{i}{d}}` that are required. (see more in Details, Defined Arguments.)

Details

Defined Arguments

For the parametric case, we explicitly define:

The transition matrix of the embedded Markov chain (J_{t})_{t\in \{0,\dots,n\}}, given in the attribute p_dist:
- If p is not drifting, it contains the values:
  
  p(u,v), \forall u, v \in E,
  
  given in an array with dimensions of s \times s, where the first dimension corresponds to the previous state u and the second dimension corresponds to the current state v.
- If p is drifting, for i \in \{ 0,\dots,d \}, it contains the values:
  
  p_{\frac{i}{d}}(u,v), \forall u, v \in E,
  
  given in an array with dimensions of s \times s \times (d + 1), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the d+1 different matrices p_{\frac{i}{d}}.
The conditional sojourn time distribution, given in the attribute f_dist:
- If f is not drifting, it contains the discrete distribution names (as characters or NA), given in an array with dimensions of s \times s, where the first dimension corresponds to the previous state u, the second dimension corresponds to the current state v.
- If f is drifting, it contains the discrete distribution names (as characters or NA) given in an array with dimensions of s \times s \times (d + 1), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the d+1 different arrays f_{\frac{i}{d}}.
The conditional sojourn time distribution parameters, given in the attribute f_dist_pars:
- If f is not drifting, it contains the numerical values (or NA) of the corresponding distributions defined in f_dist, given in an array with dimensions of s \times s, where the first dimension corresponds to the previous state u, the second dimension corresponds to the current state v.
- If f is drifting, it contains the numerical values (or NA) of the corresponding distributions defined in f_dist, given in an array with dimensions of s \times s \times (d + 1), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the d+1 different arrays f_{\frac{i}{d}}.

Sojourn time distributions

In this package, the available distributions for the modeling of the conditional sojourn times, of the drifting semi-Markov model, used through the argument f_dist, are the following:

Uniform (n):

f(x) = 1/n, for x = 1, 2, \dots, n, where n is a positive integer. This can be specified through the following:
- f_dist = "unif"
- f_dist_pars = (n, NA) (n as defined here).
Geometric (p):

f(x) = p (1-p)^{x-1}, for x = 1, 2, \dots, where p \in (0, 1) is the probability of success. This can be specified through the following:
- f_dist = "geom"
- f_dist_pars = (p, NA) (p as defined here).
Poisson (\lambda):

f(x) = \frac{\lambda^{x-1} exp(-\lambda)}{(x-1)!}, for x = 1, 2, \dots, where \lambda > 0. This can be specified through the following:
- f_dist = "pois"
- f_dist_pars = (\lambda, NA)
Negative binomial (\alpha, p):

f(x)=\frac{\Gamma(x+\alpha-1)}{\Gamma(\alpha)(x-1)!} p^{\alpha}(1-p)^{x-1}, for x = 1, 2,\dots, where \Gamma is the Gamma function, \alpha \in (0, +\infty) is the parameter describing the target for number of successful trials, or the dispersion parameter (the shape parameter of the gamma mixing distribution). p is the probability of success, 0 < p < 1.
- f_dist = "nbinom"
- f_dist_pars = (\alpha, p) (p as defined here)
Discrete Weibull of type 1 (q, \beta):

f(x)=q^{(x-1)^{\beta}}-q^{x^{\beta}}, for x=1,2,\dots, with q \in (0, 1) is the first parameter (probability) and \beta \in (0, +\infty) is the second parameter. This can be specified through the following:
- f_dist = "dweibull"
- f_dist_pars = (q, \beta) (q as defined here)

From these discrete distributions, by using "dweibull", "nbinom" we require two parameters. It's for this reason that the attribute f_dist_pars is an array of dimensions s \times s \times 2 if f is not drifting or s \times s \times 2 \times (d+1) if f is drifting.

Value

Returns an object of the S3 class dsmm_parametric, dsmm. It has the following attributes:

dist : List. Contains 3 arrays, passing down from the arguments:
- p_drift or p_notdrift, corresponding to whether the defined p transition matrix is drifting or not.
- f_drift_parametric or f_notdrift_parametric, corresponding to whether the defined f sojourn time distribution is drifting or not.
- f_drift_parameters or f_notdrift_parameters, which are the defined f sojourn time distribution parameters, depending on whether f is drifting or not.
initial_dist : Numerical vector. Passing down from the arguments. It contains the initial distribution of the drifting semi-Markov model.
states : Character vector. Passing down from the arguments. It contains the state space E.
s : Positive integer. It contains the number of states in the state space, s = |E|, which is given in the attribute states.
degree : Positive integer. Passing down from the arguments. It contains the polynomial degree d considered for the drifting of the model.
model_size : Positive integer. Passing down from the arguments. It contains the size of the drifting semi-Markov model n, which represents the length of the embedded Markov chain (J_{t})_{t\in \{0,\dots,n\}}, without the last state.
f_is_drifting : Logical. Passing down from the arguments. Specifies if f is drifting or not.
p_is_drifting : Logical. Passing down from the arguments. Specifies if p is drifting or not.
Model : Character. Possible values:
- "Model_1" : Both p and f are drifting.
- "Model_2" : p is drifting and f is not drifting.
- "Model_3" : f is drifting and p is not drifting.
A_i : Numerical matrix. Represents the polynomials A_i(t) with degree d that are used for solving the system MJ = P. Used for the methods defined for the object. Not printed when viewing the object.

References

V. S. Barbu, N. Limnios. (2008). semi-Markov Chains and Hidden semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).

Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

T. Nakagawa and S. Osaki. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.

Examples

# We can also define states in a flexible way, including spaces.
states <- c("Dollar $", " /1'2'3/ ", " Z E T A ", "O_M_E_G_A")
s <- length(states)
d <- 1

# ===========================================================================
# Defining parametric drifting semi-Markov models.
# ===========================================================================

# ---------------------------------------------------------------------------
# Defining the drifting distributions for Model 1.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# Sums over v must be 1 for all u and i = 0, ..., d.

# First matrix.
p_dist_1 <- matrix(c(0,   0.1, 0.4, 0.5,
                     0.5, 0,   0.3, 0.2,
                     0.3, 0.4, 0,   0.3,
                     0.8, 0.1, 0.1, 0),
                   ncol = s, byrow = TRUE)

# Second matrix.
p_dist_2 <- matrix(c(0,   0.3, 0.6, 0.1,
                     0.3, 0,   0.4, 0.3,
                     0.5, 0.3, 0,   0.2,
                     0.2, 0.3, 0.5, 0),
                   ncol = s, byrow = TRUE)

# get `p_dist` as an array of p_dist_1 and p_dist_2.
p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, d + 1).
# First matrix.
f_dist_1 <- matrix(c(NA,         "unif", "dweibull", "nbinom",
                     "geom",      NA,    "pois",     "dweibull",
                     "dweibull", "pois",  NA,        "geom",
                     "pois",      NA,    "geom",      NA),
                   nrow = s, ncol = s, byrow = TRUE)


# Second matrix.
f_dist_2 <- matrix(c(NA,     "pois", "geom", "nbinom",
                     "geom",  NA,    "pois", "dweibull",
                     "unif", "geom",  NA,    "geom",
                     "pois", "pois", "geom",  NA),
                   nrow = s, ncol = s, byrow = TRUE)


# get `f_dist` as an array of `f_dist_1` and `f_dist_2`
f_dist_model_1 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1))


# `f_dist_pars` has dimensions of: (s, s, 2, d + 1).
# First array of coefficients, corresponding to `f_dist_1`.
# First matrix.
f_dist_1_pars_1 <- matrix(c(NA,  5,  0.4, 4,
                            0.7, NA, 5,   0.6,
                            0.2, 3,  NA,  0.6,
                            4,   NA, 0.4, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_1_pars_2 <- matrix(c(NA,  NA, 0.2, 0.6,
                            NA,  NA, NA,  0.8,
                            0.6, NA, NA,  NA,
                            NA,  NA, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Second array of coefficients, corresponding to `f_dist_2`.
# First matrix.
f_dist_2_pars_1 <- matrix(c(NA,  6,   0.4, 3,
                            0.7, NA,  2,   0.5,
                            3,   0.6, NA,  0.7,
                            6,   0.2, 0.7, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_2_pars_2 <- matrix(c(NA, NA, NA, 0.6,
                            NA, NA, NA, 0.8,
                            NA, NA, NA, NA,
                            NA, NA, NA, NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Get `f_dist_pars`.
f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_par_model_1 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_1,
    f_dist = f_dist_model_1,
    f_dist_pars = f_dist_pars_model_1,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

# p drifting array.
p_drift <- obj_par_model_1$dist$p_drift
p_drift

# f distribution.
f_dist_drift <- obj_par_model_1$dist$f_drift_parametric
f_dist_drift

# parameters for the f distribution.
f_dist_pars_drift <- obj_par_model_1$dist$f_drift_parameters
f_dist_pars_drift

# ---------------------------------------------------------------------------
# Defining Model 2 - p is drifting, f is not drifting.
# ---------------------------------------------------------------------------

# `p_dist` has the same dimensions as in Model 1: (s, s, d + 1).
p_dist_model_2 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s).
f_dist_model_2 <- matrix(c( NA,       "pois",  NA,       "nbinom",
                            "geom",    NA,    "geom",    "dweibull",
                            "unif",   "geom",  NA,       "geom",
                            "nbinom", "unif", "dweibull", NA),
                         nrow = s, ncol = s, byrow = TRUE)

# `f_dist_pars` has dimensions of: (s, s, 2),
#  corresponding to `f_dist_model_2`.

# First matrix.
f_dist_pars_1_model_2 <- matrix(c(NA,  0.2, NA,  3,
                                  0.2, NA,  0.2, 0.5,
                                  3,   0.4, NA,  0.7,
                                  2,   3,   0.7, NA),
                                nrow = s, ncol = s, byrow = TRUE)

# Second matrix.
f_dist_pars_2_model_2 <- matrix(c(NA,  NA, NA,  0.6,
                                  NA,  NA, NA,  0.8,
                                  NA,  NA, NA,  NA,
                                  0.2, NA, 0.3, NA),
                                nrow = s, ncol = s, byrow = TRUE)


# Get `f_dist_pars`.
f_dist_pars_model_2 <- array(c(f_dist_pars_1_model_2,
                               f_dist_pars_2_model_2),
                             dim = c(s, s, 2))


# ---------------------------------------------------------------------------
# Parametric object for Model 2.
# ---------------------------------------------------------------------------

obj_par_model_2 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_2,
    f_dist = f_dist_model_2,
    f_dist_pars = f_dist_pars_model_2,
    p_is_drifting = TRUE,
    f_is_drifting = FALSE
)

# p drifting array.
p_drift <- obj_par_model_2$dist$p_drift
p_drift

# f distribution.
f_dist_notdrift <- obj_par_model_2$dist$f_notdrift_parametric
f_dist_notdrift

# parameters for the f distribution.
f_dist_pars_notdrift <- obj_par_model_2$dist$f_notdrift_parameters
f_dist_pars_notdrift

# ---------------------------------------------------------------------------
# Defining Model 3 - f is drifting, p is not drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s).
p_dist_model_3 <- matrix(c(0,   0.1,  0.3,  0.6,
                           0.4, 0,    0.1,  0.5,
                           0.4, 0.3,  0,    0.3,
                           0.9, 0.01, 0.09, 0),
                         ncol = s, byrow = TRUE)

# `f_dist` has the same dimensions as in Model 1: (s, s, d + 1).
f_dist_model_3 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1))


# `f_dist_pars` has the same dimensions as in Model 1: (s, s, 2, d + 1).
f_dist_pars_model_3 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Parametric object for Model 3.
# ---------------------------------------------------------------------------

obj_par_model_3 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.3, 0.2, 0.2, 0.3),
    degree = d,
    p_dist = p_dist_model_3,
    f_dist = f_dist_model_3,
    f_dist_pars = f_dist_pars_model_3,
    p_is_drifting = FALSE,
    f_is_drifting = TRUE
)

# p drifting array.
p_notdrift <- obj_par_model_3$dist$p_notdrift
p_notdrift

# f distribution.
f_dist_drift <- obj_par_model_3$dist$f_drift_parametric
f_dist_drift

# parameters for the f distribution.
f_dist_pars_drift <- obj_par_model_3$dist$f_drift_parameters
f_dist_pars_drift

# ===========================================================================
# Parametric estimation using methods corresponding to an object
#     which inherits from the class `dsmm_parametric`.
# ===========================================================================

### Comments
### 1.  Using a larger `klim` and a larger `model_size` will increase the
###     accuracy of the model, with the need of larger memory requirements
###     and computational cost.
### 2.  For the parametric estimation it is recommended to use a common set
###     of distributions while only the parameters are drifting. This results
###     in higher accuracy.


# ---------------------------------------------------------------------------
# Defining the distributions for Model 1 - both p and f are drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# First matrix.
p_dist_1 <- matrix(c(0,   0.2, 0.4, 0.4,
                     0.5, 0,   0.3, 0.2,
                     0.3, 0.4, 0,   0.3,
                     0.5, 0.3, 0.2, 0),
                   ncol = s, byrow = TRUE)

# Second matrix.
p_dist_2 <- matrix(c(0,   0.3, 0.5, 0.2,
                     0.3, 0,   0.4, 0.3,
                     0.5, 0.3, 0,   0.2,
                     0.2, 0.4, 0.4, 0),
                   ncol = s, byrow = TRUE)

# get `p_dist` as an array of p_dist_1 and p_dist_2.
p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, d + 1).
# We will use the same sojourn time distributions.
f_dist_1 <- matrix(c( NA,        "unif",   "dweibull", "nbinom",
                     "geom",      NA,      "pois",     "dweibull",
                     "dweibull", "pois",    NA,        "geom",
                     "pois",     'nbinom', "geom",      NA),
                   nrow = s, ncol = s, byrow = TRUE)

# get `f_dist`
f_dist_model_1 <- array(f_dist_1, dim = c(s, s, d + 1))

# `f_dist_pars` has dimensions of: (s, s, 2, d + 1).
# First array of coefficients, corresponding to `f_dist_1`.
# First matrix.
f_dist_1_pars_1 <- matrix(c(NA,  7,  0.4, 4,
                            0.7, NA, 5,   0.6,
                            0.2, 3,  NA,  0.6,
                            4,   4,  0.4, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_1_pars_2 <- matrix(c(NA,  NA,  0.2, 0.6,
                            NA,  NA,  NA,  0.8,
                            0.6, NA,  NA,  NA,
                            NA,  0.3, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Second array of coefficients, corresponding to `f_dist_2`.
# First matrix.
f_dist_2_pars_1 <- matrix(c(NA,  6,  0.5, 3,
                            0.5, NA, 4,   0.5,
                            0.4, 5,  NA,  0.7,
                            6,   5,  0.7, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_2_pars_2 <- matrix(c(NA,  NA,  0.4, 0.5,
                            NA,  NA,  NA,  0.6,
                            0.5, NA,  NA,  NA,
                            NA,  0.4, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Get `f_dist_pars`.
f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Defining the parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_par_model_1 <- parametric_dsmm(
    model_size = 4000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_1,
    f_dist = f_dist_model_1,
    f_dist_pars = f_dist_pars_model_1,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

cat("The object has class of (",
    paste0(class(obj_par_model_1),
           collapse = ', '), ").")


# ---------------------------------------------------------------------------
# Generating a sequence from the parametric object.
# ---------------------------------------------------------------------------

# A larger klim will lead to an increase in accuracy.
klim <- 20
sim_seq <- simulate(obj_par_model_1, klim = klim, seed = 1)


# ---------------------------------------------------------------------------
# Fitting the generated sequence under the same distributions.
# ---------------------------------------------------------------------------

fit_par_model1 <- fit_dsmm(sequence = sim_seq,
                           states = states,
                           degree = d,
                           f_is_drifting = TRUE,
                           p_is_drifting = TRUE,
                           estimation = 'parametric',
                           f_dist = f_dist_model_1)

cat("The object has class of (",
    paste0(class(fit_par_model1),
           collapse = ', '), ").")

cat("\nThe estimated parameters are:\n")
fit_par_model1$dist$f_drift_parameters

[Package dsmmR version 1.0.5 Index]