sparsemdc_gap {SparseMDC}    R Documentation

Gap Statistic Calculator

Description

This function calculates the gap statistic for SparseMDC. It is intended for use when the number of clusters in the data is unknown. Where possible, we recommend using alternative methods to infer the number of clusters in the data.
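For readers unfamiliar with the gap statistic, the general idea can be sketched with base R k-means. This is only an illustration of the generic technique under stated assumptions: it uses kmeans() rather than SparseMDC, a uniform reference distribution over each column's observed range, and toy data. It is not the code that sparsemdc_gap runs, and the names log_wss and ref_data are hypothetical helpers.

```r
# Generic gap statistic sketch using base R k-means (not SparseMDC itself).
set.seed(1)

# Toy data: 60 observations in 2 dimensions forming three separated groups.
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Log of the total within-cluster sum of squares for k clusters.
log_wss <- function(dat, k) {
  log(kmeans(dat, centers = k, nstart = 10)$tot.withinss)
}

# Reference data: each column drawn uniformly over its observed range.
ref_data <- function(dat) {
  apply(dat, 2, function(col) runif(length(col), min(col), max(col)))
}

ks <- 2:5
n_boot <- 20  # illustrative; a real analysis would typically use more

# Observed log dispersion for each k.
obs <- sapply(ks, function(k) log_wss(x, k))

# Reference log dispersions: an n_boot x length(ks) matrix.
ref <- sapply(ks, function(k) {
  replicate(n_boot, log_wss(ref_data(x), k))
})

# Gap = mean reference log dispersion minus observed log dispersion.
gap <- colMeans(ref) - obs

# Sample standard deviation with the sqrt(1 + 1/B) simulation correction.
gap_se <- apply(ref, 2, sd) * sqrt(1 + 1 / n_boot)

data.frame(k = ks, gap = gap, se = gap_se)
```

A larger gap indicates that the observed clustering is tighter than would be expected for data with no cluster structure.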

Usage

sparsemdc_gap(pdat, dim, min_clus, max_clus, nboots = 200, nitter = 20,
  nstarts = 10, l1_boot = 50, l2_boot = 50)

Arguments

pdat

List with D entries. Entry d contains the data for condition d as a p * n matrix. The data should be centered and log-transformed.

dim

Total number of conditions, D.

min_clus

The minimum number of clusters to try. The smallest allowed value is 2.

max_clus

The maximum number of clusters to try.

nboots

The number of bootstrap repetitions to use, default = 200.

nitter

The maximum number of iterations for each of the start values. The default value is 20.

nstarts

The number of start values to use for SparseMDC. The default value is 10.

l1_boot

The number of bootstrap repetitions used for estimating lambda 1, default = 50.

l2_boot

The number of bootstrap repetitions used for estimating lambda 2, default = 50.
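The shape expected for pdat can be sketched in base R as follows. This is a hypothetical illustration only: the pseudocount of 1 and the choice to center each row on its mean pooled across all conditions are assumptions, and the names raw, logged and pdat_sketch are made up for the example. In practice the package's pre_proc_data function, used in the Examples, performs the pre-processing.

```r
# Hypothetical raw data: 3 conditions, 5 features (rows), and a different
# number of samples (columns) in each condition.
set.seed(2)
raw <- list(
  matrix(rpois(5 * 4, lambda = 10), nrow = 5),
  matrix(rpois(5 * 6, lambda = 10), nrow = 5),
  matrix(rpois(5 * 3, lambda = 10), nrow = 5)
)

# Log-transform with a pseudocount of 1 (an assumption).
logged <- lapply(raw, function(m) log(m + 1))

# Center each row on its mean across all conditions pooled (an assumption).
row_means <- rowMeans(do.call(cbind, logged))
pdat_sketch <- lapply(logged, function(m) m - row_means)

length(pdat_sketch)       # number of conditions, D
sapply(pdat_sketch, dim)  # each column gives p and n for one condition
```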

Value

A list containing the optimal number of clusters, as well as gap statistics and the calculated standard error for each number of clusters.
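To show how gap statistics and standard errors can be combined into a choice of cluster number, here is a minimal sketch of the one-standard-error rule commonly used with the gap statistic. This page does not state which rule sparsemdc_gap applies or what the elements of the returned list are called, so the sketch works on standalone vectors with made-up values rather than on the function's output.

```r
# Hypothetical gap statistics and standard errors for k = 2, ..., 5.
ks     <- 2:5
gap    <- c(0.40, 0.85, 0.80, 0.78)
gap_se <- c(0.05, 0.06, 0.06, 0.07)

# One-standard-error rule: pick the smallest k whose gap is at least the
# next gap minus the next standard error.
n <- length(ks)
ok <- gap[-n] >= gap[-1] - gap_se[-1]

# Fall back to the largest k if no candidate satisfies the rule.
best_k <- if (any(ok)) ks[which(ok)[1]] else ks[n]
best_k  # 3
```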

Examples

set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into conditions A, B and C
data_A <- data_test[ , which(condition_biase == "A")]
data_B <- data_test[ , which(condition_biase == "B")]
data_C <- data_test[ , which(condition_biase == "C")]
# Store data as list
dat_l <- list(data_A, data_B, data_C)
# Pre-process the data
pdat <- pre_proc_data(dat_l, dim = 3, norm = FALSE, log = TRUE,
  center = TRUE)
# Run with two bootstrap samples to keep the example fast
gap_stat <- sparsemdc_gap(pdat, dim = 3, min_clus = 2, max_clus = 3, nboots = 2,
  nitter = 2, nstarts = 1, l1_boot = 5, l2_boot = 5)


[Package SparseMDC version 0.99.5 Index]