R: Non-parametric Drifting semi-Markov model specification

nonparametric_dsmm {dsmmR}

R Documentation

Non-parametric Drifting semi-Markov model specification

Description

Creates a non-parametric model specification for a drifting semi-Markov model. Returns an object of class (dsmm_nonparametric, dsmm).

Usage

nonparametric_dsmm(
  model_size,
  states,
  initial_dist,
  degree,
  k_max,
  f_is_drifting,
  p_is_drifting,
  p_dist,
  f_dist
)

Arguments

`model_size`	Positive integer that represents the size of the drifting semi-Markov model `n`. It is equal to the length of a theoretical embedded Markov chain `(J_{t})_{t\in \{0,\dots,n\}}`, without the last state.
`states`	Character vector that represents the state space `E`. It has length equal to `s = |E|`.
`initial_dist`	Numerical vector of `s` probabilities, that represents the initial distribution for each state in the state space `E`.
`degree`	Positive integer that represents the polynomial degree `d` for the drifting semi-Markov model.
`k_max`	Positive integer that represents the maximum sojourn time of choice, for the drifting semi-Markov model.
`f_is_drifting`	Logical. Specifies if `f` is drifting or not.
`p_is_drifting`	Logical. Specifies if `p` is drifting or not.
`p_dist`	Numerical array that represents the probabilities of the transition matrix `p` of the embedded Markov chain `(J_{t})_{t\in \{0,\dots,n\}}` (it is defined in the same way as in the parametric_dsmm function). It can be defined in two ways: if `p` is not drifting, it has dimensions of `s \times s`; if `p` is drifting, it has dimensions of `s \times s \times (d+1)` (see Details, Defined Arguments).
`f_dist`	Numerical array that represents the probabilities of the conditional sojourn time distributions `f`. A value of `0` is allowed for state transitions that should not have a sojourn time distribution (e.g. all transitions from a state to itself should have `0` as their value). It can be defined in two ways: if `f` is not drifting, it has dimensions of `s \times s \times k_{max}`; if `f` is drifting, it has dimensions of `s \times s \times k_{max} \times (d+1)` (see Details, Defined Arguments).
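
The dimension rules for `p_dist` and `f_dist` can be summarised in a few lines of base R. The sketch below is only an illustration: `expected_dims()` is a hypothetical helper written for this page and is not a function of dsmmR.

```r
# Hypothetical helper (not part of dsmmR): expected dimensions of `p_dist`
# and `f_dist`, following the rules stated in the Arguments above.
expected_dims <- function(s, k_max, degree, p_is_drifting, f_is_drifting) {
  list(
    p_dist = if (p_is_drifting) c(s, s, degree + 1) else c(s, s),
    f_dist = if (f_is_drifting) c(s, s, k_max, degree + 1) else c(s, s, k_max)
  )
}

# Example: 3 states, k_max = 3, d = 2, p drifting, f not drifting.
dims <- expected_dims(s = 3, k_max = 3, degree = 2,
                      p_is_drifting = TRUE, f_is_drifting = FALSE)
str(dims)

# Compare against an actual array before building the model.
# The values here are placeholders; only the shape is of interest.
p_dist <- array(1 / 3, dim = c(3, 3, 3))
identical(dim(p_dist), as.integer(dims$p_dist))
```

A check of this kind may help catch a mismatch between the flags and the arrays before calling `nonparametric_dsmm`.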

Details

Defined Arguments

For the non-parametric case, we explicitly define:

The transition matrix of the embedded Markov chain (J_{t})_{t\in \{0,\dots,n\}}, given in the attribute p_dist:
- If p is not drifting, it contains the values:
  
  p(u, v), \forall u, v \in E,
  
  given in an array with dimensions of s \times s, where the first dimension corresponds to the previous state u and the second dimension corresponds to the current state v.
- If p is drifting then, for i \in\{0,\dots,d\}, it contains the values:
  
  p_{\frac{i}{d}}(u,v), \forall u, v \in E,
  
  given in an array with dimensions of s \times s \times (d + 1), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the d+1 different matrices p_{\frac{i}{d}}.
The conditional sojourn time distribution, given in the attribute f_dist:
- If f is not drifting, it contains the values:
  
  f(u,v,l), \forall u,v\in E,\forall l\in \{1,\dots,k_{max}\},
  
  given in an array with dimensions of s \times s \times k_{max}, where the first dimension corresponds to the previous state u, the second dimension corresponds to the current state v, and the third dimension corresponds to the sojourn time l.
- If f is drifting then, for i\in \{0,\dots,d\}, it contains the values:
  
  f_{\frac{i}{d}}(u,v,l),\forall u,v\in E, \forall l\in \{1,\dots,k_{max}\},
  
  given in an array with dimensions of s \times s \times k_{max} \times (d + 1), where the first, second and third dimensions are defined as in the non-drifting case, and the fourth dimension corresponds to the d+1 different arrays f_{\frac{i}{d}}.
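
To illustrate how the d + 1 matrices p_{\frac{i}{d}} can act as anchor points of a drifting transition matrix, the following base R sketch interpolates between them. It rests on an assumption that this page does not state: that the polynomials A_i are the Lagrange basis polynomials on the nodes i n / d, as in a polynomial drift of degree d. The helpers `lagrange_A()` and `p_at()` are hypothetical and are not functions of dsmmR.

```r
# Assumption (not stated on this page): A_i are the Lagrange basis polynomials
# on the nodes i * n / d, i = 0, ..., d, so that A_i(j * n / d) = 1 if i == j
# and 0 otherwise.
lagrange_A <- function(t, n, d) {
  nodes <- (0:d) * n / d
  sapply(seq_along(nodes), function(i) {
    prod((t - nodes[-i]) / (nodes[i] - nodes[-i]))
  })
}

# Hypothetical helper: combine a drifting p array (s x s x (d + 1)) at time t
# as the weighted sum over i of A_i(t) * p_{i/d}.
p_at <- function(p_dist, t, n) {
  d <- dim(p_dist)[3] - 1
  A <- lagrange_A(t, n, d)
  Reduce(`+`, lapply(seq_len(d + 1), function(i) A[i] * p_dist[, , i]))
}

s <- 3; d <- 2; n <- 8000
p_0 <- matrix(c(0, 0.1, 0.9,  0.5, 0, 0.5,  0.3, 0.7, 0), ncol = s, byrow = TRUE)
p_1 <- matrix(c(0, 0.6, 0.4,  0.7, 0, 0.3,  0.6, 0.4, 0), ncol = s, byrow = TRUE)
p_2 <- matrix(c(0, 0.2, 0.8,  0.6, 0, 0.4,  0.7, 0.3, 0), ncol = s, byrow = TRUE)

# p_dist[u, v, i + 1] holds p_{i/d}(u, v).
p_dist <- array(c(p_0, p_1, p_2), dim = c(s, s, d + 1))

# At the nodes the anchor matrices are recovered.
all.equal(p_at(p_dist, t = 0, n = n), p_dist[, , 1])
all.equal(p_at(p_dist, t = n / 2, n = n), p_dist[, , 2])

# In between, rows still sum to 1 because the A_i(t) sum to 1.
rowSums(p_at(p_dist, t = 2000, n = n))
```

Under the same assumption, the arrays f_{\frac{i}{d}} would be combined in the same way along the fourth dimension of `f_dist`.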

Value

Returns an object of the S3 class (dsmm_nonparametric, dsmm). It has the following attributes:

dist : List. Contains 2 arrays, passing down from the arguments:
- p_drift or p_notdrift, corresponding to whether the defined p transition matrix is drifting or not.
- f_drift or f_notdrift, corresponding to whether the defined f sojourn time distribution is drifting or not.
initial_dist : Numerical vector. Passing down from the arguments. It contains the initial distribution of the drifting semi-Markov model.
states : Character vector. Passing down from the arguments. It contains the state space E.
s : Positive integer. It contains the number of states in the state space, s = |E|, which is given in the attribute states.
degree : Positive integer. Passing down from the arguments. It contains the polynomial degree d considered for the drifting of the model.
k_max : Positive integer. Passing down from the arguments. It contains the maximum sojourn time for the drifting semi-Markov model.
model_size : Positive integer. Passing down from the arguments. It contains the size of the drifting semi-Markov model n, which represents the length of the embedded Markov chain (J_{t})_{t\in \{0,\dots,n\}}, without the last state.
f_is_drifting : Logical. Passing down from the arguments. Specifies if f is drifting or not.
p_is_drifting : Logical. Passing down from the arguments. Specifies if p is drifting or not.
Model : Character. Possible values:
- "Model_1" : Both p and f are drifting.
- "Model_2" : p is drifting and f is not drifting.
- "Model_3" : f is drifting and p is not drifting.
A_i : Numerical Matrix. Represents the polynomials A_i(t) with degree d that are used for solving the system MJ = P. Used for the methods defined for the object. Not printed when viewing the object.
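
The relation between the two logical flags and the `Model` attribute can be written as a small lookup. This is a sketch only: `model_label()` is a hypothetical helper, not a function of dsmmR, and raising an error when neither flag is `TRUE` is a choice made for this illustration, since no label is listed above for that case.

```r
# Hypothetical helper (not part of dsmmR): map the two flags to the labels
# listed under `Model` above.
model_label <- function(p_is_drifting, f_is_drifting) {
  if (p_is_drifting && f_is_drifting) {
    "Model_1"   # both p and f are drifting
  } else if (p_is_drifting) {
    "Model_2"   # only p is drifting
  } else if (f_is_drifting) {
    "Model_3"   # only f is drifting
  } else {
    # Illustrative choice: no label is listed for this combination.
    stop("Neither p nor f is drifting: not one of the three listed models.")
  }
}

model_label(p_is_drifting = TRUE,  f_is_drifting = TRUE)
model_label(p_is_drifting = TRUE,  f_is_drifting = FALSE)
model_label(p_is_drifting = FALSE, f_is_drifting = TRUE)
```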

References

Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications: Their Use in Reliability and DNA Analysis. Lecture Notes in Statistics, vol. 191. New York: Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).

Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

Examples

# Setup.
states <- c("AA", "AC", "CC")
s <- length(states)
d <- 2
k_max <- 3

# ===========================================================================
# Defining non-parametric drifting semi-Markov models.
# ===========================================================================

# ---------------------------------------------------------------------------
# Defining distributions for Model 1 - both p and f are drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# Sums over v must be 1 for all u and i = 0, ..., d.
p_dist_1 <- matrix(c(0,   0.1, 0.9,
                     0.5, 0,   0.5,
                     0.3, 0.7, 0),
                   ncol = s, byrow = TRUE)

p_dist_2 <- matrix(c(0,   0.6, 0.4,
                     0.7, 0,   0.3,
                     0.6, 0.4, 0),
                   ncol = s, byrow = TRUE)

p_dist_3 <- matrix(c(0,   0.2, 0.8,
                     0.6, 0,   0.4,
                     0.7, 0.3, 0),
                   ncol = s, byrow = TRUE)

# Get `p_dist` as an array of p_dist_1, p_dist_2 and p_dist_3.
p_dist <- array(c(p_dist_1, p_dist_2, p_dist_3),
                dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, k_max, d + 1).
# First f distribution. Dimensions: (s, s, k_max).
# Sums over l must be 1, for every u, v and i = 0, ..., d.
f_dist_1_l_1 <- matrix(c(0,   0.2, 0.7,
                         0.3, 0,   0.4,
                         0.2, 0.8, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_2 <- matrix(c(0,   0.3,  0.2,
                         0.2, 0,    0.5,
                         0.1, 0.15, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_3 <- matrix(c(0,   0.5,  0.1,
                         0.5, 0,    0.1,
                         0.7, 0.05, 0),
                       ncol = s, byrow = TRUE)
# Get f_dist_1
f_dist_1 <- array(c(f_dist_1_l_1, f_dist_1_l_2, f_dist_1_l_3),
                  dim = c(s, s, k_max))

# Second f distribution. Dimensions: (s, s, k_max)
f_dist_2_l_1 <- matrix(c(0,   1/3, 0.4,
                         0.3, 0,   0.4,
                         0.2, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_2 <- matrix(c(0,   1/3, 0.4,
                         0.4, 0,   0.2,
                         0.3, 0.4, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_3 <- matrix(c(0,   1/3, 0.2,
                         0.3, 0,   0.4,
                         0.5, 0.5, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_2
f_dist_2 <- array(c(f_dist_2_l_1, f_dist_2_l_2, f_dist_2_l_3),
                  dim = c(s, s, k_max))

# Third f distribution. Dimensions: (s, s, k_max)
f_dist_3_l_1 <- matrix(c(0,    0.3, 0.3,
                         0.3,  0,   0.5,
                         0.05, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_2 <- matrix(c(0,   0.2, 0.6,
                         0.3, 0,   0.35,
                         0.9, 0.2, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_3 <- matrix(c(0,    0.5, 0.1,
                         0.4,  0,   0.15,
                         0.05, 0.7, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_3
f_dist_3 <- array(c(f_dist_3_l_1, f_dist_3_l_2, f_dist_3_l_3),
                  dim = c(s, s, k_max))

# Get f_dist as an array of f_dist_1, f_dist_2 and f_dist_3.
f_dist <- array(c(f_dist_1, f_dist_2, f_dist_3),
                dim = c(s, s, k_max, d + 1))

# ---------------------------------------------------------------------------
# Non-Parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_nonpar_model_1 <- nonparametric_dsmm(
    model_size = 8000,
    states = states,
    initial_dist = c(0.3, 0.5, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist,
    f_dist = f_dist,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

# p drifting array.
p_drift <- obj_nonpar_model_1$dist$p_drift
p_drift

# f drifting array.
f_drift <- obj_nonpar_model_1$dist$f_drift
f_drift

# ---------------------------------------------------------------------------
# Defining Model 2 - p is drifting, f is not drifting.
# ---------------------------------------------------------------------------

# p_dist has the same dimensions as in Model 1: (s, s, d + 1).
p_dist_model_2 <- array(c(p_dist_1, p_dist_2, p_dist_3),
                        dim = c(s, s, d + 1))

# f_dist has dimensions of: (s, s, k_max).
f_dist_model_2 <- f_dist_2


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 2.
# ---------------------------------------------------------------------------

obj_nonpar_model_2 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.7, 0.1, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_2,
    f_dist = f_dist_model_2,
    p_is_drifting = TRUE,
    f_is_drifting = FALSE
)

# p drifting array.
p_drift <- obj_nonpar_model_2$dist$p_drift
p_drift

# f distribution array.
f_notdrift <- obj_nonpar_model_2$dist$f_notdrift
f_notdrift


# ---------------------------------------------------------------------------
# Defining Model 3 - f is drifting, p is not drifting.
# ---------------------------------------------------------------------------


# `p_dist` has dimensions of: (s, s), since p is not drifting.
p_dist_model_3 <- p_dist_3


# `f_dist` has the same dimensions as in Model 1: (s, s, k_max, d + 1).
f_dist_model_3 <- array(c(f_dist_1, f_dist_2, f_dist_3),
                        dim = c(s, s, k_max, d + 1))


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 3.
# ---------------------------------------------------------------------------

obj_nonpar_model_3 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.3, 0.4, 0.3),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_3,
    f_dist = f_dist_model_3,
    p_is_drifting = FALSE,
    f_is_drifting = TRUE
)

# p distribution matrix.
p_notdrift <- obj_nonpar_model_3$dist$p_notdrift
p_notdrift

# f distribution array.
f_drift <- obj_nonpar_model_3$dist$f_drift
f_drift

# ===========================================================================
# Using methods for non-parametric objects.
# ===========================================================================

# The object is non-parametric, so the result is named accordingly.
kernel_nonparametric <- get_kernel(obj_nonpar_model_3)
str(kernel_nonparametric)

sim_seq_nonpar <- simulate(obj_nonpar_model_3, nsim = 50)
str(sim_seq_nonpar)
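
The comments in the examples note that the sums of `p_dist` over v must be 1 and that the sums of `f_dist` over l must be 1. A minimal base R sketch of such checks is given below. The helpers `rows_sum_to_one()` and `sojourn_sums_ok()` are hypothetical, are not functions of dsmmR, and make no claim about the validation that the package itself performs.

```r
# Hypothetical checks (not part of dsmmR) for the sum constraints noted in
# the comments of the examples above.
rows_sum_to_one <- function(p_dist, tol = 1e-8) {
  # Sum over v (2nd dimension) for each u and, if present, each i.
  keep <- setdiff(seq_along(dim(p_dist)), 2)
  all(abs(apply(p_dist, keep, sum) - 1) < tol)
}

sojourn_sums_ok <- function(f_dist, tol = 1e-8) {
  # Sum over l (3rd dimension) for each u, v and, if present, each i.
  keep <- setdiff(seq_along(dim(f_dist)), 3)
  sums <- apply(f_dist, keep, sum)
  # Each sum is 1, or 0 for transitions without a sojourn time distribution.
  all(abs(sums - 1) < tol | abs(sums) < tol)
}

# Small two-state illustration with k_max = 2 (values chosen for this sketch).
p <- matrix(c(0, 1,
              1, 0), ncol = 2, byrow = TRUE)
f <- array(0, dim = c(2, 2, 2))
f[1, 2, ] <- c(0.4, 0.6)
f[2, 1, ] <- c(0.5, 0.5)

rows_sum_to_one(p)
sojourn_sums_ok(f)

# A distribution that sums to 1.5 over l is flagged.
f_bad <- f
f_bad[1, 2, 1] <- 0.9
sojourn_sums_ok(f_bad)
```

Because the helpers sum over a fixed dimension and keep all others, the same calls apply to the drifting arrays of dimensions (s, s, d + 1) and (s, s, k_max, d + 1).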

[Package dsmmR version 1.0.5 Index]