sim_data {iClusterVB} | R Documentation
Simulated Dataset
Description
The dataset consists of individuals measured on 4 data views with
different data types. Two of the data views are continuous, one is
count, and one is binary. The true number of clusters was set to 4, and
the cluster proportions were set at 0.25 each, such that we have
balanced cluster proportions. Each of the data views had 500 features,
but only 50, or 10%, were relevant features that contributed to the
clustering; the rest were noise features that did not. In total, there
were 2000 features.
For data view 1 (continuous), relevant features were generated from
cluster-specific normal distributions, one for each of Clusters 1 to 4,
while noise features were generated from a single distribution shared by
all clusters. For data view 2 (continuous), relevant features were
likewise generated from cluster-specific normal distributions, one for
each of Clusters 1 to 4, while noise features were generated from a
single shared distribution.
For data view 3 (binary), relevant features were generated from
cluster-specific Bernoulli distributions, one for each of Clusters 1 to
4, while noise features were generated from a single distribution shared
by all clusters. For data view 4 (count), relevant features were
generated from cluster-specific Poisson distributions, one for each of
Clusters 1 to 4, while noise features were generated from a single
shared distribution.
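The design described above can be sketched in base R. This is a minimal illustration, not the code used to build sim_data: the sample size per cluster, every distribution parameter, the unit standard deviation, and the helper names (simulate_view, r_normal, r_bernoulli, r_poisson) are assumptions made for the example. Only the number of clusters, the balanced proportions, and the feature counts follow from the description.

```r
# Sketch of the simulation design; values marked "assumed" are placeholders,
# not the parameters used to generate sim_data.
set.seed(1)

K <- 4                 # clusters, balanced (proportion 0.25 each)
p <- 500               # features per data view
p_rel <- 50            # relevant features per view (10%)
n_per_cluster <- 30    # assumed sample size per cluster

cluster <- rep(seq_len(K), each = n_per_cluster)
n <- length(cluster)

# Wrappers so every generator has the form f(number_of_draws, parameter).
r_normal    <- function(m, mu)     rnorm(m, mean = mu, sd = 1)   # sd assumed
r_bernoulli <- function(m, prob)   rbinom(m, size = 1, prob = prob)
r_poisson   <- function(m, lambda) rpois(m, lambda = lambda)

# Relevant columns use a parameter that depends on the row's cluster;
# noise columns use one parameter shared by all clusters.
simulate_view <- function(rfun, rel_par, noise_par) {
  row_par  <- rep(rel_par[cluster], times = p_rel)
  relevant <- matrix(rfun(n * p_rel, row_par), nrow = n)
  noise    <- matrix(rfun(n * (p - p_rel), noise_par), nrow = n)
  cbind(relevant, noise)
}

views <- list(
  view1 = simulate_view(r_normal,    c(2, 1, -1, -2),      0),    # continuous, assumed means
  view2 = simulate_view(r_normal,    c(-2, -1, 1, 2),      0),    # continuous, assumed means
  view3 = simulate_view(r_bernoulli, c(0.1, 0.3, 0.5, 0.7), 0.2), # binary, assumed probabilities
  view4 = simulate_view(r_poisson,   c(20, 15, 10, 5),     2)     # count, assumed rates
)

sapply(views, dim)          # each view has n rows and 500 columns
sum(sapply(views, ncol))    # 2000 features in total
```

In this sketch the first 50 columns of each view are the relevant features, which makes it easy to check whether a feature-selection method recovers them.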
Usage
data(sim_data)
Format
A list containing the four datasets (one per data view) and other elements of interest.
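Because the element names are not listed on this page, a quick way to see what the list holds is to load it and inspect its top level. The snippet below is a minimal sketch that assumes the iClusterVB package is installed; it uses only base R inspection functions and makes no assumption about the element names.

```r
# Load the dataset only if the package is available.
if (requireNamespace("iClusterVB", quietly = TRUE)) {
  data(sim_data, package = "iClusterVB")

  class(sim_data)                # documented above as a list
  names(sim_data)                # names of the datasets and other elements
  str(sim_data, max.level = 1)   # one-line summary of each element
}
```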