cprob {PST} | R Documentation |
Empirical conditional probability distributions of order L
Description
Compute the empirical conditional probability distributions of order L from a set of sequences.
Usage
## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE,
weighted=TRUE, with.missing=FALSE, to.list=FALSE)
Arguments
object |
a sequence object, that is, an object of class stslist as created by the TraMineR seqdef function |
L |
integer. Context length. |
cdata |
under development |
context |
character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed. |
stationary |
logical. If |
nmin |
integer. Minimal frequency of a context. See details. |
prob |
logical. If |
weighted |
logical. If |
with.missing |
logical. If |
to.list |
logical. If |
Details
The empirical conditional probability \hat{P}(\sigma | c) of observing a symbol \sigma \in A after the subsequence c=c_{1}, \ldots, c_{k} of length k=L is computed as
\hat{P}(\sigma | c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}
where
N(c)=\sum_{i=1}^{\ell} 1 \left[x_{i}, \ldots, x_{i+|c|-1}=c \right], \; x=x_{1}, \ldots, x_{\ell}, \; c=c_{1}, \ldots, c_{k}
is the number of occurrences of the subsequence c in the sequence x, and c\sigma is the concatenation of the subsequence c and the symbol \sigma.
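As an illustration of the formula above, the following base-R sketch counts occurrences in a single sequence and normalizes them over the alphabet. The helpers count_occ() and cond_prob() are hypothetical names introduced here; they are not part of PST and are not meant to reproduce how cprob is implemented (missing states, nmin and position-wise distributions are ignored).

```r
## Illustrative base-R sketch; count_occ() and cond_prob() are hypothetical
## helpers, not functions of the PST package.

## N(pat): number of occurrences of the pattern `pat` in the sequence `x`
count_occ <- function(x, pat) {
  k <- length(pat)
  stopifnot(k >= 1)
  n <- length(x)
  if (k > n) return(0)
  starts <- seq_len(n - k + 1)
  sum(vapply(starts, function(i) all(x[i:(i + k - 1)] == pat), logical(1)))
}

## P(sigma | ctx) for every symbol sigma in `alphabet`
cond_prob <- function(x, ctx, alphabet) {
  counts <- vapply(alphabet, function(s) count_occ(x, c(ctx, s)), numeric(1))
  counts / sum(counts)
}

x <- c("a", "b", "a", "a", "b", "a")
p1 <- cond_prob(x, ctx = "a", alphabet = c("a", "b"))          # order 1
p0 <- cond_prob(x, ctx = character(0), alphabet = c("a", "b")) # order 0
```

In this toy sequence the context a is followed once by a and twice by b, so p1 is (1/3, 2/3). With the empty context (order 0) the result reduces to the relative frequencies of the symbols, here (2/3, 1/3). A context that never occurs gives a zero denominator, so the sketch returns NaN in that case.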
Considering a (possibly weighted) sample of m sequences having weights w^{j}, \; j=1, \ldots, m, the function N(c) is replaced by
N(c)=\sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1 \left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, \ldots, c_{k}
where x^{j}=x_{1}^{j}, \ldots, x_{\ell}^{j} is the jth sequence in the sample. For more details, see Gabadinho and Ritschard (2016).
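The weighted count can be sketched in base R in the same spirit: each sequence contributes its number of occurrences multiplied by its weight. The names count_occ() and weighted_cond_prob() and the toy data are assumptions made for this illustration; they do not come from PST.

```r
## Illustrative base-R sketch; the helpers and the toy data are hypothetical
## and are not part of the PST package.

## Number of occurrences of the pattern `pat` in one sequence `x`
count_occ <- function(x, pat) {
  k <- length(pat)
  stopifnot(k >= 1)
  n <- length(x)
  if (k > n) return(0)
  starts <- seq_len(n - k + 1)
  sum(vapply(starts, function(i) all(x[i:(i + k - 1)] == pat), logical(1)))
}

## P(sigma | ctx) from a list of sequences `seqs` with weights `w`
weighted_cond_prob <- function(seqs, w, ctx, alphabet) {
  counts <- vapply(alphabet, function(s) {
    ## weighted N(ctx sigma): sum over sequences of weight times count
    sum(w * vapply(seqs, function(x) count_occ(x, c(ctx, s)), numeric(1)))
  }, numeric(1))
  counts / sum(counts)
}

seqs <- list(c("a", "b", "a"), c("a", "a", "b"))
w <- c(2, 0.5)
pw <- weighted_cond_prob(seqs, w, ctx = "a", alphabet = c("a", "b"))
pu <- weighted_cond_prob(seqs, rep(1, 2), ctx = "a", alphabet = c("a", "b"))
```

With unit weights the counts are N(aa)=1 and N(ab)=2, so pu is (1/3, 2/3). With weights 2 and 0.5 they become 0.5 and 2.5, so pw is (1/6, 5/6): the first sequence, which only contains the transition from a to b, dominates the estimate.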
Value
If stationary=TRUE, a matrix with one row for each subsequence of length L appearing in object with a frequency of at least nmin. If stationary=FALSE, a list where each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by the subsequence.
Author(s)
Alexis Gabadinho
References
Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.
Examples
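## Packages used below
library(PST)          # cprob and the s1 and SRH data sets
library(TraMineR)     # seqdef
library(RColorBrewer) # brewer.pal
## Example with the single sequence s1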
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)
## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## sequential color palette
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"),
labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009
## Example 1: 0th order: unweighted and weighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)
## Example 2: 2nd order: unweighted and weighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)