clustStruct {MMOC} | R Documentation
Generate multi-view data sets with simple cluster structures
Description
Generates multiple data sets from a multivariate normal distribution using the mvrnorm function from the MASS package.
Usage
clustStruct(n, p, k, noiseDat = "random", randNoise = 2)
Arguments
n | An integer, the sample size for all generated data sets.
p | An integer, the number of columns (features) in each generated data set.
k | An integer or vector, the number of distinct clusters in each generated data set.
noiseDat | Either the character string
randNoise | The value along the diagonal when
Details
The function accepts k as a vector, with one entry per generated data set. For each entry kk of k, it splits the data into kk groups with means c(0, 2^(1:(kk-1))); e.g., when k=3 the data will be split into 3 groups with means 0, 2, and 4, respectively. The covariance matrix is either a diagonal matrix with randNoise (an integer) along the diagonal, or a given matrix.
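The mean and covariance construction described above can be illustrated with a minimal sketch. This is not the MMOC source code: the helper group_means, the equal group sizes, and the choice to shift every feature by the group mean are assumptions made only for illustration. Only mvrnorm from MASS and base R functions are used.

```r
library(MASS)  # provides mvrnorm

# Hypothetical helper (not part of MMOC): means for kk groups,
# following the formula c(0, 2^(1:(kk-1))) from the Details section.
# seq_len() is used so that kk = 1 yields a single mean of 0.
group_means <- function(kk) c(0, 2^seq_len(kk - 1))

group_means(3)

set.seed(1)
n <- 120          # sample size
p <- 5            # number of features
kk <- 3           # number of groups in this view
rand_noise <- 2   # value along the diagonal of the covariance matrix

mu <- group_means(kk)
Sigma <- diag(rand_noise, p)  # p x p diagonal covariance matrix

# Assumption: groups are of equal size and every feature in a group
# is centred on that group's mean.
view <- do.call(rbind, lapply(seq_len(kk), function(g) {
  mvrnorm(n / kk, mu = rep(mu[g], p), Sigma = Sigma)
}))
view <- as.data.frame(view)
dim(view)
```

Under these assumptions, each call to mvrnorm draws one group, and stacking the groups gives a single n × p data frame comparable in shape to one element of the list returned by clustStruct.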
Value
A list of n × p data frames with the specified number of groups.
Examples
## A single view with 30 variables and 3 groups
s1 <- clustStruct(n=120, p=30, k=3, noiseDat='random')[[1]]
## Multiple views with 30 variables
## View 1 has 2 groups and View 2 has 3 groups
s2 <- clustStruct(n=120, p=30, k=c(2,3), noiseDat='random')
## Multiple views with 30 variables
## View 1 has 2 groups, View 2 has 3, and View 3 has 3 groups
s3 <- clustStruct(n=120, p=30, k=c(2,3,3), noiseDat='random')
## Three-view study
# View 1: 2 groups, 30 variables, random noise = 5
# View 2: 3 groups, 60 variables, random noise = 2
# View 3: 4 groups, 45 variables, random noise = 4
s4 <- clustStruct(n=120, k=c(2,3,4), p=c(30,60,45), randNoise=c(5,2,4))