simulate_HD_data {RJcluster}    R Documentation

simulate_HD_data

Description

This function simulates data for checking the performance of RJcluster. Data can be simulated for any n, p, and cluster sizes. The simulated features are of two types: noise features and signal (informative) features. The fraction of features that are informative is controlled by the sparsity parameter; the remaining features are noise. The noise features come in two halves: half are N(0, 1) and half are N(0, noise_variance). The signal features are likewise split in two: half are N(\mu[,1], signal_variance) and half are N(\mu[,2], signal_variance).
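The generative scheme described above can be sketched in base R. This is a hypothetical re-implementation for illustration, not the RJcluster source: the function name sketch_hd_data, the column ordering (signal features first, then noise), and the rounding of the informative feature count to an even number are all assumptions.

```r
# Hypothetical sketch of the described scheme; NOT the RJcluster implementation.
sketch_hd_data <- function(size_vector = c(20, 20, 20, 20),
                           p = 220,
                           mu = matrix(c(1.5, 2.5, 0, 1.5, 0, -1.5, -2.5, -1.5),
                                       ncol = 2, byrow = TRUE),
                           signal_variance = 1,
                           noise_variance = 1,
                           sparsity = 0.09,
                           seed = 1234) {
  set.seed(seed)
  n <- sum(size_vector)

  # Assumption: the informative feature count is rounded to an even number
  half_signal <- round(sparsity * p / 2)
  p_noise <- p - 2 * half_signal
  half_noise <- p_noise %/% 2

  # Class labels 1..K, repeated according to the cluster sizes
  Y <- rep(seq_along(size_vector), times = size_vector)

  # Signal halves: the mean of each row is taken from the row of mu for its cluster.
  # rnorm() takes a standard deviation, hence sqrt() of the variances.
  signal_1 <- matrix(rnorm(n * half_signal, mean = mu[Y, 1],
                           sd = sqrt(signal_variance)), nrow = n)
  signal_2 <- matrix(rnorm(n * half_signal, mean = mu[Y, 2],
                           sd = sqrt(signal_variance)), nrow = n)

  # Noise halves: N(0, 1) and N(0, noise_variance)
  noise_1 <- matrix(rnorm(n * half_noise, mean = 0, sd = 1), nrow = n)
  noise_2 <- matrix(rnorm(n * (p_noise - half_noise), mean = 0,
                          sd = sqrt(noise_variance)), nrow = n)

  list(X = cbind(signal_1, signal_2, noise_1, noise_2), Y = Y)
}

sim <- sketch_hd_data()
dim(sim$X)    # 80 220
table(sim$Y)  # 20 observations in each of the 4 classes
```

With the defaults, sparsity * p = 19.8, so the sketch uses 20 informative features (10 per half) and 200 noise features. How the package itself rounds this count is not stated here.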

Usage

simulate_HD_data(
  size_vector = c(20, 20, 20, 20),
  p = 220,
  mu = matrix(c(1.5, 2.5, 0, 1.5, 0, -1.5, -2.5, -1.5), ncol = 2, byrow = TRUE),
  signal_variance = 1,
  noise_variance = 1,
  sparsity = 0.09,
  seed = 1234
)

Arguments

size_vector

A vector of the cluster sizes (default = a balanced case of 4 clusters of size 20, c(20, 20, 20, 20))

p

The number of columns in the simulated matrix (default = 220)

mu

The matrix of means, of dimension length(size_vector) x 2. The first column holds the means for the first half of the informative features, and the second column holds the means for the second half (default is described in the RJcluster paper)

signal_variance

Variance of the signal part of the generated data. A value of 1 indicates a high SNR; a value of 2 indicates a low SNR (default = 1)

noise_variance

Variance of the noisy part of the generated data (default = 1)

sparsity

The fraction of the features that should be informative: a value between 0 and 1, where a higher value means more of the data is informative (default = 0.09)

seed

Random seed. Change it when generating multiple simulation datasets (default = 1234)
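As a minimal sketch of the seed advice, several datasets can be generated by varying only the seed argument. The variable names seeds and datasets are illustrative, and the call is guarded so the snippet does nothing when RJcluster is not installed.

```r
# One seed per replicate dataset (illustrative choice of seeds)
seeds <- 1:5

if (requireNamespace("RJcluster", quietly = TRUE)) {
  # Each call uses the documented defaults except for the seed
  datasets <- lapply(seeds, function(s) RJcluster::simulate_HD_data(seed = s))
  print(length(datasets))
}
```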

Details

The data in the paper is generated with 4 clusters, in a balanced case of c(20, 20, 20, 20) and an unbalanced case of c(20, 20, 200, 200), with p = 220 in both cases. The default is the balanced, high-signal case with \mu set to the matrix in the RJcluster paper.
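The two settings mentioned above could be requested as follows. This is a sketch that uses only the arguments listed under Usage; the variable names are illustrative, and the call is guarded so the snippet does nothing when RJcluster is not installed.

```r
# Cluster sizes quoted in the Details text
balanced_sizes   <- c(20, 20, 20, 20)
unbalanced_sizes <- c(20, 20, 200, 200)

if (requireNamespace("RJcluster", quietly = TRUE)) {
  balanced   <- RJcluster::simulate_HD_data(size_vector = balanced_sizes, p = 220)
  unbalanced <- RJcluster::simulate_HD_data(size_vector = unbalanced_sizes, p = 220)

  # Per the Value section, X should have sum(size_vector) rows and p columns
  print(dim(balanced$X))
  print(dim(unbalanced$X))
}
```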

Value

Returns the simulated data as a list with elements X and Y

X    Matrix of dimension sum(size_vector) x p
Y    Vector of class labels of length sum(size_vector), with unique values of 1:length(size_vector)
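The documented shapes can be verified with a small helper. The function check_sim_shape is hypothetical (not part of RJcluster), and it is tried here on a mock object with the same structure so that it runs without the package.

```r
# Hypothetical helper: checks an object with elements X and Y against the
# documented shapes; stops with an error if any condition fails.
check_sim_shape <- function(sim, size_vector, p) {
  n <- sum(size_vector)
  stopifnot(
    is.matrix(sim$X),
    nrow(sim$X) == n,
    ncol(sim$X) == p,
    length(sim$Y) == n,
    setequal(sim$Y, seq_along(size_vector))
  )
  invisible(TRUE)
}

# Mock object with the documented structure: 2 clusters of sizes 2 and 4, p = 3
mock <- list(X = matrix(0, nrow = 6, ncol = 3),
             Y = rep(1:2, times = c(2, 4)))
check_sim_shape(mock, size_vector = c(2, 4), p = 3)
```

The same helper could be applied to the output of simulate_HD_data() by passing the size_vector and p used in the call.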

Examples

data = simulate_HD_data()
X = data$X
Y = data$Y
print(head(X))

[Package RJcluster version 3.2.4 Index]