R: Empirical conditional probability distributions of order 'L'

cprob {PST}

R Documentation

Empirical conditional probability distributions of order `L`

Description

Compute the empirical conditional probability distributions of order `L` from a set of sequences.

Usage

## S4 method for signature 'stslist'
cprob(object, L, cdata=NULL, context, stationary=TRUE, nmin=1, prob=TRUE, 
weighted=TRUE, with.missing=FALSE, to.list=FALSE)

Arguments

`object`	a sequence object, that is, an object of class `stslist` as created by the TraMineR `seqdef` function.
`L`	integer. Context length.
`cdata`	under development
`context`	character. An optional subsequence (a character string where symbols are separated by '-') for which the conditional probability distribution is to be computed.
`stationary`	logical. If `FALSE`, probability distributions are computed separately for each sequence position L+1, ..., l, where l is the maximum sequence length. If `TRUE`, the probability distributions are stationary, that is, time-homogeneous.
`nmin`	integer. Minimal frequency of a context. Contexts occurring fewer than `nmin` times are not included in the result (see Value).
`prob`	logical. If `TRUE`, the probability distributions are returned. If `FALSE`, the function returns the empirical counts from which the probability distributions are computed.
`weighted`	logical. If `TRUE`, case weights attached to the sequence object are used in the computation of the probabilities.
`with.missing`	logical. If `FALSE`, only contexts containing no missing states are considered.
`to.list`	logical. If `TRUE` and `stationary=TRUE`, a list instead of a matrix is returned. See Value.

Details

The empirical conditional probability \hat{P}(\sigma | c) of observing a symbol \sigma \in A after the subsequence c=c_{1}, \ldots, c_{k} of length k=L is computed as

\hat{P}(\sigma | c) = \frac{N(c\sigma)}{\sum_{\alpha \in A} N(c\alpha)}

where

N(c)=\sum_{i=1}^{\ell} 1 \left[x_{i}, \ldots, x_{i+|c|-1}=c \right], \; x=x_{1}, \ldots, x_{\ell}, \; c=c_{1}, \ldots, c_{k}

is the number of occurrences of the subsequence c in the sequence x and c\sigma is the concatenation of the subsequence c and the symbol \sigma.
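The counting rule and the ratio above can be made concrete with a short base-R sketch for a single sequence. The helpers `count_sub()` and `cond_prob()` are hypothetical names introduced only for illustration; they are not part of PST, and they assume the sequence is a plain character vector with no missing states. The loop stops at the last position where a full match of c still fits, which is equivalent to summing up to l when the indicator is zero past the end of the sequence.

```r
## Hypothetical helpers for illustration only; not part of the PST package.
## Assumes x is a character vector of states with no missing values.

## N(c): number of positions i at which x[i], ..., x[i + |c| - 1] equals c
count_sub <- function(x, ctx) {
  k <- length(ctx)
  if (k > length(x)) return(0)
  starts <- seq_len(length(x) - k + 1)
  matches <- vapply(starts,
                    function(i) isTRUE(all(x[i + seq_len(k) - 1] == ctx)),
                    logical(1))
  sum(matches)
}

## P-hat(sigma | c) = N(c sigma) / sum over alpha in A of N(c alpha)
cond_prob <- function(x, ctx, alphabet) {
  n <- vapply(alphabet, function(s) count_sub(x, c(ctx, s)), numeric(1))
  n / sum(n)
}

x <- c("a", "b", "a", "a", "b", "a")
cond_prob(x, ctx = "a", alphabet = c("a", "b"))
```

In the toy sequence a-b-a-a-b-a, the context `a` is followed once by `a` and twice by `b`, so the sketch returns 1/3 for `a` and 2/3 for `b`.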

For a possibly weighted sample of m sequences with weights w^{j}, \; j=1, \ldots, m, the function N(c) is replaced by

N(c)=\sum_{j=1}^{m} w^{j} \sum_{i=1}^{\ell} 1 \left[x_{i}^{j}, \ldots, x_{i+|c|-1}^{j}=c \right], \; c=c_{1}, \ldots, c_{k}

where x^{j}=x_{1}^{j}, \ldots, x_{\ell}^{j} is the j-th sequence in the sample. For more details, see Gabadinho & Ritschard (2016).
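The weighted count simply multiplies each sequence's occurrence count by its weight before summing over the sample. A minimal base-R sketch, assuming the sample is given as a list of character vectors with no missing states and a numeric weight vector `w` of the same length; `count_sub()`, `count_sub_w()` and `wcond_prob()` are hypothetical helpers, not PST functions.

```r
## Hypothetical helpers for illustration only; not part of the PST package.
## Assumes each sequence is a character vector with no missing states.

## Unweighted N(c) for one sequence
count_sub <- function(x, ctx) {
  k <- length(ctx)
  if (k > length(x)) return(0)
  starts <- seq_len(length(x) - k + 1)
  sum(vapply(starts,
             function(i) isTRUE(all(x[i + seq_len(k) - 1] == ctx)),
             logical(1)))
}

## Weighted N(c): sum over sequences j of w[j] times the count in sequence j
count_sub_w <- function(seqs, w, ctx) {
  sum(w * vapply(seqs, count_sub, numeric(1), ctx = ctx))
}

## Weighted P-hat(sigma | c), normalised over the alphabet
wcond_prob <- function(seqs, w, ctx, alphabet) {
  n <- vapply(alphabet, function(s) count_sub_w(seqs, w, c(ctx, s)),
              numeric(1))
  n / sum(n)
}

seqs <- list(c("a", "b", "a"), c("a", "a", "a"))
wcond_prob(seqs, w = c(1, 1), ctx = "a", alphabet = c("a", "b"))  # equal weights
wcond_prob(seqs, w = c(3, 1), ctx = "a", alphabet = c("a", "b"))  # first sequence weighted 3
```

With equal weights, `a` is followed by `a` twice and by `b` once (2/3 and 1/3). Giving the first sequence weight 3 raises N(ab) to 3 while N(aa) stays 2, which yields 0.4 and 0.6.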

Value

If `stationary=TRUE`, a matrix with one row for each subsequence of length `L` that appears in `object` with frequency at least `nmin` (a list instead, if `to.list=TRUE`). If `stationary=FALSE`, a list in which each element corresponds to one subsequence and contains a matrix with the probability distribution at each position p where a state is preceded by that subsequence.
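For `stationary=FALSE`, the conditional distribution is estimated separately at each position p, using only the sequences whose `L` states immediately before p equal the context. The base-R sketch below illustrates that idea for a single context and equal-length, unweighted sequences stored row-wise in a character matrix. `pos_cond_prob()` is a hypothetical helper, and the exact layout of the matrices PST itself returns may differ.

```r
## Hypothetical helper for illustration only; not part of the PST package.
## m: character matrix, one row per sequence, one column per position,
##    assumed to have equal lengths and no missing states (unweighted).
## Returns one row per position p = L+1, ..., ncol(m) and one column per state.
pos_cond_prob <- function(m, ctx, alphabet) {
  L <- length(ctx)
  positions <- (L + 1):ncol(m)
  out <- matrix(NA_real_, nrow = length(positions), ncol = length(alphabet),
                dimnames = list(colnames(m)[positions], alphabet))
  for (j in seq_along(positions)) {
    p <- positions[j]
    ## keep sequences whose states at p-L, ..., p-1 match the context
    hit <- rep(TRUE, nrow(m))
    for (k in seq_len(L)) hit <- hit & m[, p - L + k - 1] == ctx[k]
    n <- table(factor(m[hit, p], levels = alphabet))
    ## positions where the context never occurs are left as NA
    if (sum(n) > 0) out[j, ] <- n / sum(n)
  }
  out
}

m <- rbind(c("a", "b", "a"),
           c("a", "a", "b"),
           c("b", "a", "b"))
colnames(m) <- c("t1", "t2", "t3")
pos_cond_prob(m, ctx = "a", alphabet = c("a", "b"))
```

Here the context `a` occurs at t1 in two sequences, followed once by `a` and once by `b`, and at t2 in two sequences, both followed by `b`, so the rows for t2 and t3 differ even though the context is the same.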

Author(s)

Alexis Gabadinho

References

Gabadinho, A. & Ritschard, G. (2016). Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package. Journal of Statistical Software, 72(3), pp. 1-39.

Examples

## Example with the single sequence s1
library(TraMineR)  # provides seqdef()
library(PST)       # provides cprob() and the s1 and SRH data sets
data(s1)
s1 <- seqdef(s1)
cprob(s1, L=0, prob=FALSE)
cprob(s1, L=1, prob=TRUE)

## Preparing a sequence object with the SRH data set
data(SRH)
state.list <- levels(SRH$p99c01)
## 5-colour palette running from green (G1) to red (B1); brewer.pal() is from RColorBrewer
library(RColorBrewer)
mycol5 <- rev(brewer.pal(5, "RdYlGn"))
SRH.seq <- seqdef(SRH, 5:15, alphabet=state.list, states=c("G1", "G2", "M", "B2", "B1"), 
	labels=state.list, weights=SRH$wp09lp1s, right=NA, cpal=mycol5)
names(SRH.seq) <- 1999:2009

## Example 1: 0th order: weighted and unweighted counts
cprob(SRH.seq, L=0, prob=FALSE, weighted=FALSE)
cprob(SRH.seq, L=0, prob=FALSE, weighted=TRUE)

## Example 2: 2nd order: weighted and unweighted probability distributions
cprob(SRH.seq, L=2, prob=TRUE, weighted=FALSE)
cprob(SRH.seq, L=2, prob=TRUE, weighted=TRUE)

[Package PST version 0.94.1 Index]
