simulate_HD_data {RJcluster} | R Documentation |
simulate_HD_data
Description
This function simulates data for checking the performance of RJcluster. Data can be simulated for any number of observations n, number of features p, and cluster sizes. The simulated data has two types of
features: noise features and signal (informative) features. The split between the two is controlled by the sparsity parameter, which gives the proportion of the data that is informative. The noise features have two parts:
half are drawn from N(0, 1) and half from N(0, noise_variance). The signal features are divided in two as well: half are drawn from
N(mu[, 1], signal_variance) and half from N(mu[, 2], signal_variance).
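The generating scheme described above can be sketched in base R. This is a hypothetical re-implementation for illustration only, not the RJcluster source: the name sketch_simulate, the rounding of the number of informative columns to round(p * sparsity), and the placement of the signal columns before the noise columns are all assumptions. It also assumes at least two informative columns.

```r
# Hypothetical sketch of the scheme in the Description; NOT the package's code.
sketch_simulate <- function(size_vector = c(20, 20, 20, 20), p = 220,
                            mu = matrix(c(1.5, 2.5, 0, 1.5, 0, -1.5, -2.5, -1.5),
                                        ncol = 2, byrow = TRUE),
                            signal_variance = 1, noise_variance = 1,
                            sparsity = 0.09, seed = 1234) {
  set.seed(seed)
  n <- sum(size_vector)
  K <- length(size_vector)

  # Assumption: the number of informative columns is round(p * sparsity).
  p_signal <- round(p * sparsity)
  p_noise <- p - p_signal
  half_signal <- p_signal %/% 2
  half_noise <- p_noise %/% 2

  # Class labels 1..K, repeated according to the cluster sizes.
  Y <- rep(seq_len(K), times = size_vector)

  # Signal block: first half centred at mu[, 1], second half at mu[, 2].
  means <- cbind(
    matrix(mu[Y, 1], nrow = n, ncol = half_signal),
    matrix(mu[Y, 2], nrow = n, ncol = p_signal - half_signal)
  )
  signal <- means +
    matrix(rnorm(n * p_signal, sd = sqrt(signal_variance)), nrow = n)

  # Noise block: half N(0, 1), half N(0, noise_variance).
  noise <- cbind(
    matrix(rnorm(n * half_noise), nrow = n),
    matrix(rnorm(n * (p_noise - half_noise), sd = sqrt(noise_variance)), nrow = n)
  )

  list(X = cbind(signal, noise), Y = Y)
}

sim <- sketch_simulate()
print(dim(sim$X))
print(table(sim$Y))
```

Note that rnorm takes a standard deviation, so the variances are converted with sqrt before sampling.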
Usage
simulate_HD_data(
size_vector = c(20, 20, 20, 20),
p = 220,
mu = matrix(c(1.5, 2.5, 0, 1.5, 0, -1.5, -2.5, -1.5), ncol = 2, byrow = TRUE),
signal_variance = 1,
noise_variance = 1,
sparsity = 0.09,
seed = 1234
)
Arguments
size_vector |
A vector of the sizes of the different clusters (default = a balanced case of 4 clusters of size 20, c(20, 20, 20, 20)) |
p |
The number of columns in the simulated matrix (default = 220) |
mu |
The matrix of means, of dimension length(size_vector) x 2. The first column of means is for the first half of the informative features; the second column of means is for the second half of the informative features (default is described in the RJcluster paper) |
signal_variance |
Variance of the signal part of the generated data. A value of 1 indicates a high SNR, a value of 2 indicates a low SNR (default = 1) |
noise_variance |
Variance of the noisy part of the generated data (default = 1) |
sparsity |
The proportion of the data that should be informative. A value between 0 and 1; a higher value means more of the data is informative (default = 0.09) |
seed |
Random seed. Change if generating multiple simulation datasets (default = 1234) |
Details
The data in the paper is generated with 4 clusters, in a balanced case of c(20, 20, 20, 20) and an unbalanced case of c(20, 20, 200, 200),
with p = 220 in both cases. The default is a balanced, high-signal case with mu as the matrix in the RJcluster paper.
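As an illustration, the unbalanced case mentioned above could be generated as follows. This is a minimal sketch that assumes the RJcluster package is installed; the call is skipped otherwise. The variable name unbalanced is only for illustration.

```r
# Assumes the RJcluster package is installed; does nothing otherwise.
if (requireNamespace("RJcluster", quietly = TRUE)) {
  # Unbalanced case from the Details: two small and two large clusters.
  unbalanced <- RJcluster::simulate_HD_data(
    size_vector = c(20, 20, 200, 200),
    p = 220
  )
  print(dim(unbalanced$X))    # rows = sum(size_vector), columns = p
  print(table(unbalanced$Y))  # number of observations per class label
}
```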
Value
Returns the simulated data as X and Y values
X | Matrix of dimension sum(size_vector) x p |
Y | Vector of class labels of length sum(size_vector), with unique values of 1:length(size_vector) |
Examples
data = simulate_HD_data()
X = data$X
Y = data$Y
print(head(X))