synth {ClickClust} | R Documentation
Illustrative dataset: sequences of five states
Description
This synthetic dataset is the illustrative example used in the
Journal of Statistical Software paper that describes the package.
There are 5 states, denoted as A, B, C, D, and E. Categorical sequences have lengths varying from 10 to 50.
Usage
data(synth)
Format
$data contains a vector of 250 strings representing categorical sequences; $id is the original classification vector.
Source
Melnykov, V. (2015)
References
Melnykov, V. (2016) Model-Based Biclustering of Clickstream Data, Computational Statistics and Data Analysis, 93, 31-45.
Melnykov, V. (2016) ClickClust: An R Package for Model-Based Clustering of Categorical Sequences, Journal of Statistical Software, 74, 1-34.
See Also
click.read
Examples
data(synth)
head(synth$data)
# FUNCTION THAT REPLACES CHARACTER STATES WITH NUMERIC VALUES
repl.levs <- function(x, ch.lev){
  # REPLACE THE j-TH STATE LABEL WITH THE NUMBER j
  for (j in seq_along(ch.lev)) x <- gsub(ch.lev[j], j, x)
  return(x)
}
# DETECT ALL STATES IN THE DATASET
d <- paste(synth$data, collapse = " ")
d <- strsplit(d, " ")[[1]]
ch.levs <- levels(as.factor(d))
# CONVERT DATA TO THE FORM USED BY click.read()
S <- strsplit(synth$data, " ")
# lapply() ALWAYS RETURNS A LIST, EVEN IF ALL SEQUENCES HAVE THE SAME LENGTH
S <- lapply(S, repl.levs, ch.levs)
S <- lapply(S, as.numeric)
head(S)
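# A MORE COMPACT ALTERNATIVE (SKETCH ONLY): match() MAPS STATE LABELS TO INTEGER CODES
# The toy vector below is hypothetical; it only mimics the format of synth$data
# (states separated by single spaces) and is not part of the package.
toy <- c("A B C", "E D A A")
toy.levs <- sort(unique(unlist(strsplit(toy, " "))))
# Each sequence becomes a vector of positions in toy.levs, which gives a list
# of numeric sequences with the same structure as S
S.toy <- lapply(strsplit(toy, " "), match, table = toy.levs)
S.toy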
[Package ClickClust version 1.1.6 Index]