fitsmm {smmR} | R Documentation |
Maximum Likelihood Estimation (MLE) of a semi-Markov chain
Description
Maximum Likelihood Estimation of a semi-Markov chain starting from one or several sequences. This estimation can be parametric or non-parametric, non-censored, censored at the beginning and/or at the end of the sequence, with one or several trajectories. Several parametric distributions are considered (Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial).
Usage
fitsmm(
sequences,
states,
type.sojourn = c("fij", "fi", "fj", "f"),
distr = "nonparametric",
init.estim = "mle",
cens.beg = FALSE,
cens.end = FALSE
)
Arguments
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
type.sojourn |
Type of sojourn time (for more explanations, see Details). |
distr |
By default If the parametric estimation case is desired,
The distributions to be used in |
init.estim |
Optional.
|
cens.beg |
Optional. A logical value indicating whether or not sequences are censored at the beginning. |
cens.end |
Optional. A logical value indicating whether or not sequences are censored at the end. |
Details
This function estimates a semi-Markov model in parametric or non-parametric case, taking into account the type of sojourn time and the censoring described in references. The non-parametric estimation concerns sojourn time distributions defined by the user. For the parametric estimation, several discrete distributions are considered (see below).
The difference between the Markov model and the semi-Markov model concerns the modeling of the sojourn time. With a Markov chain, the sojourn time distribution is modeled by a Geometric distribution (in discrete time). With a semi-Markov chain, the sojourn time can be any arbitrary distribution. In this package, the available distribution for a semi-Markov model are :
Uniform:
f(x) = \frac{1}{n}
for1 \le x \le n
.n
is the parameter;Geometric:
f(x) = \theta (1-\theta)^{x - 1}
forx = 1, 2,\ldots,n
,0 < \theta < 1
,\theta
is the probability of success.\theta
is the parameter;Poisson:
f(x) = \frac{\lambda^x exp(-\lambda)}{x!}
forx = 0, 1, 2,\ldots,n
, with\lambda > 0
.\lambda
is the parameter;Discrete Weibull of type 1:
f(x)=q^{(x-1)^{\beta}}-q^{x^{\beta}}
,x = 1, 2,\ldots,n
, with0 < q < 1
, the first parameter and\beta > 0
the second parameter.(q, \beta)
are the parameters;Negative binomial:
f(x)=\frac{\Gamma(x+\alpha)}{\Gamma(\alpha) x!} p^{\alpha} (1 - p)^x
, forx = 0, 1, 2,\ldots,n
,\Gamma
is the Gamma function,\alpha
is the parameter of overdispersion andp
is the probability of success,0 < p < 1
.(\alpha, p)
are the parameters;Non-parametric.
We define :
the semi-Markov kernel
q_{ij}(k) = P( J_{m+1} = j, T_{m+1} - T_{m} = k | J_{m} = i )
;the transition matrix
(p_{trans}(i,j))_{i,j} \in states
of the embedded Markov chainJ = (J_m)_m
,p_{trans}(i,j) = P( J_{m+1} = j | J_m = i )
;the initial distribution
\mu_i = P(J_1 = i) = P(Z_1 = i)
,i \in 1, 2, \dots, s
;the conditional sojourn time distributions
(f_{ij}(k))_{i,j} \in states,\ k \in N ,\ f_{ij}(k) = P(T_{m+1} - T_m = k | J_m = i, J_{m+1} = j )
,f
is specified by the argumentparam
in the parametric case and bydistr
in the non-parametric case.
The maximum likelihood estimation of the transition matrix of the embedded
Markov chain is \widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}
.
Five methods are proposed for the estimation of the initial distribution :
- Maximum Likelihood Estimator:
The Maximum Likelihood Estimator for the initial distribution. The formula is:
\widehat{\mu_i} = \frac{Nstart_i}{L}
, whereNstart_i
is the number of occurences of the wordi
(of lengthk
) at the beginning of each sequence andL
is the number of sequences. This estimator is reliable when the number of sequencesL
is high.- Limit (stationary) distribution:
The limit (stationary) distribution of the semi-Markov chain is used as a surrogate of the initial distribution.
- Frequencies of each state:
The initial distribution is replaced by taking the frequencies of each state in the sequences.
- Uniform distribution:
The initial probability of each state is equal to
1 / s
, withs
, the number of states.
Note that q_{ij}(k) = p_{trans}(i,j) \ f_{ij}(k)
in the general case
(depending on the present state and on the next state). For particular cases,
we replace f_{ij}(k)
by f_{i.}(k)
(depending on the present
state i
), f_{.j}(k)
(depending on the next state j
) and
f_{..}(k)
(depending neither on the present state nor on the next
state).
In this package we can choose different types of sojourn time. Four options are available for the sojourn times:
depending on the present state and on the next state (
fij
);depending only on the present state (
fi
);depending only on the next state (
fj
);depending neither on the current, nor on the next state (
f
).
If type.sojourn = "fij"
, distr
is a matrix of dimension (s, s)
(e.g., if the 1st element of the 2nd column is "pois"
, that is to say we
go from the first state to the second state following a Poisson distribution).
If type.sojourn = "fi"
or "fj"
, distr
must be a vector (e.g., if the
first element of vector is "geom"
, that is to say we go from (or to) the
first state to (or from) any state according to a Geometric distribution).
If type.sojourn = "f"
, distr
must be one of "unif"
, "geom"
, "pois"
,
"dweibull"
, "nbinom"
(e.g., if distr
is equal to "nbinom"
, that is
to say that the sojourn time when going from one state to another state
follows a Negative Binomial distribution).
For the non-parametric case, distr
is equal to "nonparametric"
whatever
type of sojourn time given.
If the sequence is censored at the beginning and/or at the end, cens.beg
must be equal to TRUE
and/or cens.end
must be equal to TRUE
.
All the sequences must be censored in the same way.
Value
Returns an object of S3 class smmfit
(inheriting from the S3
class smm
and smmnonparametric class if distr = "nonparametric"
or smmparametric otherwise). The S3 class smmfit
contains:
All the attributes of the S3 class smmnonparametric or smmparametric depending on the type of estimation;
An attribute
M
which is an integer giving the total length of the set of sequencessequences
(sum of all the lengths of the listsequences
);An attribute
loglik
which is a numeric value giving the value of the log-likelihood of the specified semi-Markov model based on thesequences
;An attribute
sequences
which is equal to the parametersequences
of the functionfitsmm
(i.e. a list of sequences used to estimate the Markov model).
References
V. S. Barbu, N. Limnios. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
See Also
smmnonparametric, smmparametric, simulate.smm, simulate.smmfit, plot.smm, plot.smmfit
Examples
states <- c("a", "c", "g", "t")
s <- length(states)
# Creation of the initial distribution
vect.init <- c(1 / 4, 1 / 4, 1 / 4, 1 / 4)
# Creation of the transition matrix
pij <- matrix(c(0, 0.2, 0.5, 0.3,
0.2, 0, 0.3, 0.5,
0.3, 0.5, 0, 0.2,
0.4, 0.2, 0.4, 0),
ncol = s, byrow = TRUE)
# Creation of the distribution matrix
distr.matrix <- matrix(c(NA, "pois", "geom", "nbinom",
"geom", NA, "pois", "dweibull",
"pois", "pois", NA, "geom",
"pois", "geom", "geom", NA),
nrow = s, ncol = s, byrow = TRUE)
# Creation of an array containing the parameters
param1.matrix <- matrix(c(NA, 2, 0.4, 4,
0.7, NA, 5, 0.6,
2, 3, NA, 0.6,
4, 0.3, 0.4, NA),
nrow = s, ncol = s, byrow = TRUE)
param2.matrix <- matrix(c(NA, NA, NA, 0.6,
NA, NA, NA, 0.8,
NA, NA, NA, NA,
NA, NA, NA, NA),
nrow = s, ncol = s, byrow = TRUE)
param.array <- array(c(param1.matrix, param2.matrix), c(s, s, 2))
# Specify the semi-Markov model
semimarkov <- smmparametric(states = states, init = vect.init, ptrans = pij,
type.sojourn = "fij", distr = distr.matrix,
param = param.array)
seqs <- simulate(object = semimarkov, nsim = c(1000, 10000, 2000), seed = 100)
# Estimation of simulated sequences
est <- fitsmm(sequences = seqs, states = states, type.sojourn = "fij",
distr = distr.matrix)