R: Simulate data

sim_seq {clickb}

R Documentation

Simulate data

Description

This function simulates a sequential dataset from a mixture of first-order Markov models generating categorical sequences. The output is a data frame with three columns: "id", which identifies the subject/sequence; "y", the categorical observation belonging to that sequence; and "clus", the cluster label.
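
To make the generating process concrete, the following base-R sketch draws one sequence from a single first-order Markov chain: the first state is sampled from the initial probability vector, and each later state is sampled from the row of the transition matrix indexed by the previous state. The helper name `sim_chain` and its arguments are hypothetical and not part of clickb; the sketch illustrates the model and is not the package's internal implementation.

```r
# Hypothetical helper (not part of clickb): draw one categorical sequence
# of length `len` from a first-order Markov chain with K states.
sim_chain <- function(ini, A, len) {
  K <- length(ini)
  y <- integer(len)
  y[1] <- sample.int(K, 1, prob = ini)               # initial state
  for (t in seq_len(len - 1)) {
    y[t + 1] <- sample.int(K, 1, prob = A[y[t], ])   # next state given current one
  }
  y
}

set.seed(1)
ini <- c(0.5, 0.5, 0)                   # state 3 is never the starting state
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), byrow = TRUE, nrow = 3)  # deterministic cycle 1 -> 2 -> 3 -> 1
y <- sim_chain(ini, A, 6)
```

A mixture is obtained by repeating this for every sequence, using the initial vector and transition matrix of the component the sequence belongs to.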

Usage

sim_seq(M, K, ini.prob, trans.prob, clust.size, T.range)

Arguments

`M`	is the number of components
`K`	is the number of Markov model states
`ini.prob`	is a list of initial probability vectors for each component
`trans.prob`	is a list of transition matrices for each component
`clust.size`	is a list of component sizes, i.e. the number of sequences in each component
`T.range`	is a vector of two elements: minimum and maximum sequence length
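
Each element of `ini.prob` should be a probability vector over the `K` states, and, judging from the example on this page, each element of `trans.prob` is a `K` x `K` matrix whose rows sum to 1. A quick check before simulating can catch malformed inputs. The helper `check_markov_inputs` below is a hypothetical sketch, not a clickb function, and whether `sim_seq` performs equivalent checks itself is not stated in this documentation.

```r
# Hypothetical helper (not part of clickb): verify the shapes and
# normalisation of the mixture parameters.
check_markov_inputs <- function(M, K, ini.prob, trans.prob, tol = 1e-8) {
  stopifnot(length(ini.prob) == M, length(trans.prob) == M)
  for (m in seq_len(M)) {
    p <- ini.prob[[m]]
    A <- trans.prob[[m]]
    stopifnot(
      length(p) == K, all(p >= 0), abs(sum(p) - 1) < tol,   # valid initial vector
      is.matrix(A), all(dim(A) == c(K, K)), all(A >= 0),    # K x K, non-negative
      all(abs(rowSums(A) - 1) < tol)                        # each row sums to 1
    )
  }
  invisible(TRUE)
}

ok <- check_markov_inputs(
  M = 1, K = 2,
  ini.prob = list(c(0.4, 0.6)),
  trans.prob = list(matrix(c(0.9, 0.1,
                             0.2, 0.8), byrow = TRUE, nrow = 2))
)
```

The function stops with an error at the first violated condition and returns `TRUE` invisibly otherwise.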

Value

An object of class data.frame with columns "id", "y" and "clus".

Author(s)

Furio Urso furio.urso@unipa.it

Examples

# Simulate dataset from a mixture of Markov models 
M <- 3    # number of components
K <- 5    # number of states
# define initial and transition probabilities for each component
ini1 <- c(0.35, 0, 0.3, 0.2, 0.15)
A1 <- matrix(c(0.15, 0.1, 0.5, 0, 0.25,
               0.2, 0, 0.1, 0.2, 0.5,
               0.6, 0.1, 0.1, 0.2, 0,
               0, 0.45, 0.35, 0.1, 0.1,
               0.15, 0.25, 0, 0.1, 0.5), byrow = TRUE, nrow = 5)

ini2 <- c(0.25, 0, 0.2, 0.25, 0.3)
A2 <- matrix(c(0, 0.8, 0, 0, 0.2,
               0.2, 0, 0.8, 0, 0,
               0, 0.2, 0, 0.8, 0,
               0, 0, 0.2, 0, 0.8,
               0.8, 0, 0, 0.2, 0), byrow = TRUE, nrow = 5)

ini3 <- c(0.3, 0, 0.25, 0.3, 0.15)
A3 <- matrix(c(0, 0.1, 0.2, 0, 0.7,
               0.7, 0, 0.2, 0.1, 0,
               0.1, 0.8, 0, 0.1, 0,
               0, 0.1, 0.7, 0, 0.2,
               0.2, 0, 0, 0.8, 0), byrow = TRUE, nrow = 5)

trans.prob <- list(A1, A2, A3)
ini.prob <- list(ini1, ini2, ini3)

# sizes, i.e. number of sequences in each component
N.sim1 <- 20
N.sim2 <- 30
N.sim3 <- 50

clust.size <- list(N.sim1, N.sim2, N.sim3)

T.range <- c(5, 30)  # minimum and maximum sequence length

data <- sim_seq(M, K, ini.prob, trans.prob, clust.size, T.range)
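
Once a dataset has been simulated, it is often useful to confirm the sequence lengths and the number of sequences per cluster. The sketch below shows two base-R summaries on a small hand-made stand-in data frame (`toy`, a hypothetical name) that has the three documented columns; the column types of the real `sim_seq` output are an assumption here, and the same calls would be applied to the simulated data frame instead.

```r
# Toy stand-in with the documented columns "id", "y", "clus";
# these values are illustrative only and were not produced by sim_seq().
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  y    = c(1, 3, 5, 2, 2, 4, 1, 1, 5),
  clus = c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# number of observations in each sequence
seq_len_by_id <- table(toy$id)

# number of distinct sequences in each cluster
n_seq_by_clus <- tapply(toy$id, toy$clus, function(v) length(unique(v)))
```

With simulated data, the first summary should fall within `T.range` and the second should match `clust.size`.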

[Package clickb version 0.1 Index]