nonparametric_dsmm {dsmmR}    R Documentation

Non-parametric Drifting semi-Markov model specification

Description

Creates a non-parametric model specification for a drifting semi-Markov model. Returns an object of class (dsmm_nonparametric, dsmm).

Usage

nonparametric_dsmm(
  model_size,
  states,
  initial_dist,
  degree,
  k_max,
  f_is_drifting,
  p_is_drifting,
  p_dist,
  f_dist
)

Arguments

model_size

Positive integer that represents the size $n$ of the drifting semi-Markov model. It is equal to the length of a theoretical embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, without the last state.

states

Character vector that represents the state space $E$. It has length equal to $s = |E|$.

initial_dist

Numerical vector of $s$ probabilities that represents the initial distribution for each state in the state space $E$.

degree

Positive integer that represents the polynomial degree $d$ for the drifting semi-Markov model.

k_max

Positive integer that represents the chosen maximum sojourn time for the drifting semi-Markov model.

f_is_drifting

Logical. Specifies if $f$ is drifting or not.

p_is_drifting

Logical. Specifies if $p$ is drifting or not.

p_dist

Numerical array that represents the probabilities of the transition matrix $p$ of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$ (it is defined the same way in the parametric_dsmm function). It can be defined in two ways:

  • If $p$ is not drifting, it has dimensions of $s \times s$.

  • If $p$ is drifting, it has dimensions of $s \times s \times (d+1)$ (see more in Details, Defined Arguments).

f_dist

Numerical array that represents the probabilities of the conditional sojourn time distributions $f$. A value of $0$ is allowed for state transitions that should not have a sojourn time distribution (e.g. all transitions from a state to itself should have $0$ as their value). It can be defined in two ways:

  • If $f$ is not drifting, it has dimensions of $s \times s \times k_{max}$.

  • If $f$ is drifting, it has dimensions of $s \times s \times k_{max} \times (d+1)$ (see more in Details, Defined Arguments).
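
The dimension rules for p_dist and f_dist can be restated as a short base-R sketch. The helper expected_dims() below is hypothetical and is not part of dsmmR; it only encodes the rules listed above, so that the shape of an array can be compared against them before it is passed to nonparametric_dsmm.

```r
# Hypothetical helper (NOT part of dsmmR): expected array dimensions
# for `p_dist` and `f_dist`, following the rules listed above.
expected_dims <- function(s, k_max, degree, p_is_drifting, f_is_drifting) {
  list(
    # p: (s, s) if not drifting, (s, s, d + 1) if drifting.
    p_dist = if (p_is_drifting) c(s, s, degree + 1) else c(s, s),
    # f: (s, s, k_max) if not drifting, (s, s, k_max, d + 1) if drifting.
    f_dist = if (f_is_drifting) c(s, s, k_max, degree + 1) else c(s, s, k_max)
  )
}

# Three states, maximum sojourn time 3, degree 2;
# p is drifting, f is not drifting.
dims <- expected_dims(s = 3, k_max = 3, degree = 2,
                      p_is_drifting = TRUE, f_is_drifting = FALSE)
str(dims)
```

A candidate array can then be compared with, for example, identical(as.numeric(dim(p_dist)), dims$p_dist).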

Details

Defined Arguments

For the non-parametric case, we explicitly define:

  1. The transition matrix of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, given in the argument p_dist:

    • If $p$ is not drifting, it contains the values:

      $$p(u, v), \quad \forall u, v \in E,$$

      given in an array with dimensions of $s \times s$, where the first dimension corresponds to the previous state $u$ and the second dimension corresponds to the current state $v$.

    • If $p$ is drifting then, for $i \in \{0,\dots,d\}$, it contains the values:

      $$p_{\frac{i}{d}}(u, v), \quad \forall u, v \in E,$$

      given in an array with dimensions of $s \times s \times (d + 1)$, where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the $d + 1$ different matrices $p_{\frac{i}{d}}$.

  2. The conditional sojourn time distribution, given in the argument f_dist:

    • If $f$ is not drifting, it contains the values:

      $$f(u, v, l), \quad \forall u, v \in E, \; \forall l \in \{1,\dots,k_{max}\},$$

      given in an array with dimensions of $s \times s \times k_{max}$, where the first dimension corresponds to the previous state $u$, the second dimension corresponds to the current state $v$, and the third dimension corresponds to the sojourn time $l$.

    • If $f$ is drifting then, for $i \in \{0,\dots,d\}$, it contains the values:

      $$f_{\frac{i}{d}}(u, v, l), \quad \forall u, v \in E, \; \forall l \in \{1,\dots,k_{max}\},$$

      given in an array with dimensions of $s \times s \times k_{max} \times (d + 1)$, where the first, second and third dimensions are defined as in the non-drifting case, and the fourth dimension corresponds to the $d + 1$ different arrays $f_{\frac{i}{d}}$.
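
The comments in the Examples section state two constraints on these arrays: the values of p summed over the current state v must equal 1, and the values of f summed over the sojourn time l must equal 1 (with 0 allowed where no sojourn time distribution is defined). A minimal base-R sketch of such checks follows. The helper names and the tolerance are assumptions made for illustration; they are not part of dsmmR, and the checks that nonparametric_dsmm performs internally may differ.

```r
# Hypothetical helpers (NOT part of dsmmR).

# Sums of p over the current state v (second dimension) should be 1
# for every previous state u and, if p is drifting, every i = 0, ..., d.
rows_sum_to_one <- function(p_dist, tol = 1e-8) {
  margins <- if (length(dim(p_dist)) == 2) 1 else c(1, 3)
  all(abs(apply(p_dist, margins, sum) - 1) < tol)
}

# Sums of f over the sojourn time l (third dimension) should be 1,
# or 0 for transitions without a sojourn time distribution.
sojourn_sums_valid <- function(f_dist, tol = 1e-8) {
  margins <- if (length(dim(f_dist)) == 3) c(1, 2) else c(1, 2, 4)
  sums <- apply(f_dist, margins, sum)
  all(abs(sums - 1) < tol | abs(sums) < tol)
}

# Small non-drifting illustration with s = 2 states and k_max = 2.
p <- matrix(c(0, 1,
              1, 0), ncol = 2, byrow = TRUE)
# Values are filled column-major: f[1,1,l], f[2,1,l], f[1,2,l], f[2,2,l].
f <- array(c(0, 0.4, 0.7, 0,    # l = 1
             0, 0.6, 0.3, 0),   # l = 2
           dim = c(2, 2, 2))
rows_sum_to_one(p)
sojourn_sums_valid(f)
```

The same helpers accept the drifting arrays, since the margins passed to apply() are chosen from the number of dimensions.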

Value

Returns an object of the S3 class (dsmm_nonparametric, dsmm).

References

Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications: Their Use in Reliability and DNA Analysis. Lecture Notes in Statistics, vol. 191. New York: Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).

Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

See Also

Methods applied to this object: simulate.dsmm, get_kernel.

For the parametric drifting semi-Markov model specification: parametric_dsmm.

For the theoretical background of drifting semi-Markov models: dsmmR.

Examples

# Setup.
states <- c("AA", "AC", "CC")
s <- length(states)
d <- 2
k_max <- 3

# ===========================================================================
# Defining non-parametric drifting semi-Markov models.
# ===========================================================================

# ---------------------------------------------------------------------------
# Defining distributions for Model 1 - both p and f are drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# Sums over v must be 1 for all u and i = 0, ..., d.
p_dist_1 <- matrix(c(0,   0.1, 0.9,
                     0.5, 0,   0.5,
                     0.3, 0.7, 0),
                   ncol = s, byrow = TRUE)

p_dist_2 <- matrix(c(0,   0.6, 0.4,
                     0.7, 0,   0.3,
                     0.6, 0.4, 0),
                   ncol = s, byrow = TRUE)

p_dist_3 <- matrix(c(0,   0.2, 0.8,
                     0.6, 0,   0.4,
                     0.7, 0.3, 0),
                   ncol = s, byrow = TRUE)

# Get `p_dist` as an array of p_dist_1, p_dist_2 and p_dist_3.
p_dist <- array(c(p_dist_1, p_dist_2, p_dist_3),
                dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, k_max, d + 1).
# First f distribution. Dimensions: (s, s, k_max).
# Sums over l must be 1, for every u, v and i = 0, ..., d.
f_dist_1_l_1 <- matrix(c(0,   0.2, 0.7,
                         0.3, 0,   0.4,
                         0.2, 0.8, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_2 <- matrix(c(0,   0.3,  0.2,
                         0.2, 0,    0.5,
                         0.1, 0.15, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_3 <- matrix(c(0,   0.5,  0.1,
                         0.5, 0,    0.1,
                         0.7, 0.05, 0),
                       ncol = s, byrow = TRUE)
# Get f_dist_1
f_dist_1 <- array(c(f_dist_1_l_1, f_dist_1_l_2, f_dist_1_l_3),
                  dim = c(s, s, k_max))

# Second f distribution. Dimensions: (s, s, k_max)
f_dist_2_l_1 <- matrix(c(0,   1/3, 0.4,
                         0.3, 0,   0.4,
                         0.2, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_2 <- matrix(c(0,   1/3, 0.4,
                         0.4, 0,   0.2,
                         0.3, 0.4, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_3 <- matrix(c(0,   1/3, 0.2,
                         0.3, 0,   0.4,
                         0.5, 0.5, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_2
f_dist_2 <- array(c(f_dist_2_l_1, f_dist_2_l_2, f_dist_2_l_3),
                  dim = c(s, s, k_max))

# Third f distribution. Dimensions: (s, s, k_max)
f_dist_3_l_1 <- matrix(c(0,    0.3, 0.3,
                         0.3,  0,   0.5,
                         0.05, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_2 <- matrix(c(0,   0.2, 0.6,
                         0.3, 0,   0.35,
                         0.9, 0.2, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_3 <- matrix(c(0,    0.5, 0.1,
                         0.4,  0,   0.15,
                         0.05, 0.7, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_3
f_dist_3 <- array(c(f_dist_3_l_1, f_dist_3_l_2, f_dist_3_l_3),
                  dim = c(s, s, k_max))

# Get f_dist as an array of f_dist_1, f_dist_2 and f_dist_3.
f_dist <- array(c(f_dist_1, f_dist_2, f_dist_3),
                dim = c(s, s, k_max, d + 1))

# ---------------------------------------------------------------------------
# Non-Parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_nonpar_model_1 <- nonparametric_dsmm(
    model_size = 8000,
    states = states,
    initial_dist = c(0.3, 0.5, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist,
    f_dist = f_dist,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

# p drifting array.
p_drift <- obj_nonpar_model_1$dist$p_drift
p_drift

# f distribution.
f_drift <- obj_nonpar_model_1$dist$f_drift
f_drift

# ---------------------------------------------------------------------------
# Defining Model 2 - p is drifting, f is not drifting.
# ---------------------------------------------------------------------------

# p_dist has the same dimensions as in Model 1: (s, s, d + 1).
p_dist_model_2 <- array(c(p_dist_1, p_dist_2, p_dist_3),
                        dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, k_max), since f is not drifting.
f_dist_model_2 <- f_dist_2


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 2.
# ---------------------------------------------------------------------------

obj_nonpar_model_2 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.7, 0.1, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_2,
    f_dist = f_dist_model_2,
    p_is_drifting = TRUE,
    f_is_drifting = FALSE
)

# p drifting array.
p_drift <- obj_nonpar_model_2$dist$p_drift
p_drift

# f distribution array.
f_notdrift <- obj_nonpar_model_2$dist$f_notdrift
f_notdrift


# ---------------------------------------------------------------------------
# Defining Model 3 - f is drifting, p is not drifting.
# ---------------------------------------------------------------------------


# `p_dist` has dimensions of: (s, s), since p is not drifting.
p_dist_model_3 <- p_dist_3

# `f_dist` has the same dimensions as in Model 1: (s, s, k_max, d + 1).
f_dist_model_3 <- array(c(f_dist_1, f_dist_2, f_dist_3),
                        dim = c(s, s, k_max, d + 1))


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 3.
# ---------------------------------------------------------------------------

obj_nonpar_model_3 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.3, 0.4, 0.3),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_3,
    f_dist = f_dist_model_3,
    p_is_drifting = FALSE,
    f_is_drifting = TRUE
)

# p distribution matrix.
p_notdrift <- obj_nonpar_model_3$dist$p_notdrift
p_notdrift

# f distribution array.
f_drift <- obj_nonpar_model_3$dist$f_drift
f_drift

# ===========================================================================
# Using methods for non-parametric objects.
# ===========================================================================

# The object is non-parametric, so the results are named accordingly.
kernel_nonparametric <- get_kernel(obj_nonpar_model_3)
str(kernel_nonparametric)

sim_seq_nonpar <- simulate(obj_nonpar_model_3, nsim = 50)
str(sim_seq_nonpar)

[Package dsmmR version 1.0.5 Index]