| sim_data {iClusterVB} | R Documentation |
Simulated Dataset
Description
The dataset consists of N = 240 individuals and R =
4 data views with different data types. Two of the data views are
continuous, one is count, and one is binary. The true number of
clusters was set to K = 4, with balanced cluster proportions
\pi_1 = \pi_2 = \pi_3 = \pi_4 = 0.25.
Each of the data views had p_r = 500
features, r = 1, \dots, 4, but only 50 (10%) were relevant
features that contributed to the clustering; the remaining 450 were
noise features that did not. In total, there were
p = \sum_{r=1}^4 p_r = 2000 features.
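The design quantities above can be checked with a few lines of R. This is only a sketch of the arithmetic; the variable names are illustrative and are not objects provided by the package.

```r
# Design constants taken from the description (names are illustrative)
N <- 240                 # individuals
K <- 4                   # true number of clusters
R <- 4                   # data views
pi_k <- rep(0.25, K)     # balanced cluster proportions
p_r <- rep(500, R)       # features per data view
n_relevant <- 50         # relevant features per data view

expected_cluster_sizes <- N * pi_k          # expected individuals per cluster
n_noise <- p_r[1] - n_relevant              # noise features per data view
relevant_fraction <- n_relevant / p_r[1]    # share of relevant features
p <- sum(p_r)                               # total number of features
```

Under these settings each cluster is expected to hold 60 individuals, each view has 450 noise features, and the total feature count is 2000.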
For data view 1 (continuous), relevant features were generated from the
following normal distributions: \text{N}(10, 1) for Cluster 1,
\text{N}(5, 1) for Cluster 2, \text{N}(-5, 1) for Cluster 3,
and \text{N}(-10, 1) for Cluster 4, while noise features were
generated from \text{N}(0, 1). For data view 2 (continuous), relevant
features were generated from the following normal distributions:
\text{N}(-10, 1) for Cluster 1, \text{N}(-5, 1) for Cluster
2, \text{N}(5, 1) for Cluster 3, and \text{N}(10, 1) for
Cluster 4, while noise features were generated from \text{N}(0, 1).
For data view 3 (binary), relevant features were generated from the
following Bernoulli distributions: \text{Bernoulli}(0.05) for Cluster
1, \text{Bernoulli}(0.2) for Cluster 2,
\text{Bernoulli}(0.4) for Cluster 3, and \text{Bernoulli}(0.6)
for Cluster 4, while noise features were generated from
\text{Bernoulli}(0.1). For data view 4 (count), relevant features
were generated from the following Poisson distributions:
\text{Poisson}(50) for Cluster 1, \text{Poisson}(35) for
Cluster 2, \text{Poisson}(20) for Cluster 3, and
\text{Poisson}(10) for Cluster 4, while noise features were generated
from \text{Poisson}(2).
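As an illustration of the generating scheme described above, the following sketch draws four views with the stated distributions. It is not the code used to create sim_data. Several details are assumptions because the description does not state them: exactly 60 individuals per cluster, the first 50 columns of each view holding the relevant features, and an arbitrary seed. The helper simulate_view is hypothetical.

```r
set.seed(123)  # arbitrary seed, not taken from the original simulation

N <- 240; K <- 4
p_r <- 500; n_rel <- 50

# Assumption: exactly N / K = 60 individuals per cluster, ordered by cluster
cluster <- rep(seq_len(K), each = N / K)

# Hypothetical helper: rel_draw(n, k) draws n relevant values for cluster k,
# noise_draw(n) draws n noise values
simulate_view <- function(rel_draw, noise_draw) {
  # Start with noise everywhere, then overwrite the relevant columns
  x <- matrix(noise_draw(N * p_r), nrow = N, ncol = p_r)
  for (k in seq_len(K)) {
    idx <- which(cluster == k)
    # Assumption: the first n_rel columns are the relevant features
    x[idx, seq_len(n_rel)] <- rel_draw(length(idx) * n_rel, k)
  }
  x
}

mu1 <- c(10, 5, -5, -10)         # view 1 cluster means
mu2 <- c(-10, -5, 5, 10)         # view 2 cluster means
prob3 <- c(0.05, 0.2, 0.4, 0.6)  # view 3 success probabilities
lambda4 <- c(50, 35, 20, 10)     # view 4 Poisson rates

view1 <- simulate_view(function(n, k) rnorm(n, mu1[k], 1),
                       function(n) rnorm(n, 0, 1))
view2 <- simulate_view(function(n, k) rnorm(n, mu2[k], 1),
                       function(n) rnorm(n, 0, 1))
view3 <- simulate_view(function(n, k) rbinom(n, 1, prob3[k]),
                       function(n) rbinom(n, 1, 0.1))
view4 <- simulate_view(function(n, k) rpois(n, lambda4[k]),
                       function(n) rpois(n, 2))
```

Each resulting matrix has 240 rows and 500 columns. Because the second parameter of every normal distribution here is 1, it does not matter whether it is read as a variance or a standard deviation.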
Usage
data(sim_data)
Format
A list containing four datasets and other elements of interest.
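A quick way to inspect the object after loading it is shown below. This is a minimal sketch that assumes the iClusterVB package is installed; the element names are not listed here, so the code prints them rather than assuming them.

```r
# Only runs if the iClusterVB package is available
if (requireNamespace("iClusterVB", quietly = TRUE)) {
  data(sim_data, package = "iClusterVB")

  # Top-level element names and a one-level summary of the list
  print(names(sim_data))
  str(sim_data, max.level = 1)
}
```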