R: Repeat a biclustering to achieve a minimum SSE solution

rep_biclustermd {biclustermd}

R Documentation

Repeat a biclustering to achieve a minimum SSE solution

Description

Repeat a biclustering `nrep` times and return the solution with the minimum sum of squared errors (SSE).

Usage

rep_biclustermd(
  data,
  nrep = 10,
  parallel = FALSE,
  ncores = 2,
  col_clusters = floor(sqrt(ncol(data))),
  row_clusters = floor(sqrt(nrow(data))),
  miss_val = mean(data, na.rm = TRUE),
  miss_val_sd = 1,
  similarity = "Rand",
  row_min_num = 5,
  col_min_num = 5,
  row_num_to_move = 1,
  col_num_to_move = 1,
  row_shuffles = 1,
  col_shuffles = 1,
  max.iter = 100
)
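The default expressions in the signature are evaluated against `data`. As a minimal base R sketch, the values they produce for a small matrix with missing cells can be computed by hand. The matrix `m` and the `default_*` names below are made up for illustration and are not part of the package.

```r
# Hypothetical 12 x 9 matrix with row/column names and some missing cells
# (illustration only; not a dataset shipped with the package).
set.seed(1)
m <- matrix(rnorm(12 * 9), nrow = 12, ncol = 9,
            dimnames = list(paste0("r", 1:12), paste0("c", 1:9)))
m[sample(length(m), 20)] <- NA

# Defaults from the Usage signature, evaluated by hand.
default_col_clusters <- floor(sqrt(ncol(m)))   # floor(sqrt(9))  = 3
default_row_clusters <- floor(sqrt(nrow(m)))   # floor(sqrt(12)) = 3
default_miss_val     <- mean(m, na.rm = TRUE)  # mean of the observed cells
```

Passing explicit values for `col_clusters`, `row_clusters`, and `miss_val` overrides these defaults.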

Arguments

`data`	Dataset to bicluster. Must be a data matrix containing only numbers and missing values. It should have row names and column names.
`nrep`	The number of times to repeat the biclustering. Default 10.
`parallel`	Logical indicating if the user would like to utilize the `foreach` parallel backend. Default is FALSE.
`ncores`	The number of cores to use if parallel computing. Default 2.
`col_clusters`	The number of clusters to partition the columns into.
`row_clusters`	The number of clusters to partition the rows into.
`miss_val`	Value or function to put in empty cells of the prototype matrix. If a value, a random normal variable with mean `miss_val` and sd = `miss_val_sd` is used each iteration.
`miss_val_sd`	Standard deviation of the normal distribution `miss_val` follows if `miss_val` is a number. By default this equals 1.
`similarity`	The metric used to compare two successive clusterings. Can be "Rand" (default), "HA" for the Hubert and Arabie adjusted Rand index, or "Jaccard". See RRand for details.
`row_min_num`	Minimum row prototype size in order to be eligible to be chosen when filling an empty row prototype. Default is 5.
`col_min_num`	Minimum column prototype size in order to be eligible to be chosen when filling an empty column prototype. Default is 5.
`row_num_to_move`	Number of rows to remove from the sampled prototype to put in the empty row prototype. Default is 1.
`col_num_to_move`	Number of columns to remove from the sampled prototype to put in the empty column prototype. Default is 1.
`row_shuffles`	Number of times to shuffle rows in each iteration. Default is 1.
`col_shuffles`	Number of times to shuffle columns in each iteration. Default is 1.
`max.iter`	Maximum number of iterations to let the algorithm run for. Default is 100.
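To make the `similarity` argument more concrete, here is a minimal base R sketch of the plain (unadjusted) Rand index between two label vectors: the fraction of item pairs on which both clusterings agree, that is, pairs placed together in both or apart in both. The function `rand_index` and the label vectors are hypothetical and written only for illustration; this is not the package's implementation.

```r
# Plain Rand index between two clusterings of the same items.
# `a` and `b` are vectors of cluster labels of equal length.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")   # TRUE where two items share a cluster in a
  same_b <- outer(b, b, "==")   # TRUE where two items share a cluster in b
  pairs  <- upper.tri(same_a)   # each unordered pair counted once
  mean(same_a[pairs] == same_b[pairs])
}

previous <- c(1, 1, 2, 2, 3, 3)  # labels after one iteration (made up)
current  <- c(1, 1, 2, 3, 3, 3)  # labels after the next iteration (made up)

rand_index(previous, previous)   # identical clusterings -> 1
rand_index(previous, current)    # 12 of 15 pairs agree -> 0.8
```

A value of 1 means two successive clusterings are identical, which is the natural signal that the assignments have stopped changing.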

Value

A list containing the minimum SSE biclustering (`best_bc`), a vector of the final SSE of each repeat (`rep_sse`), and the time it took the function to run (`runtime`).
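The repeat-and-keep-the-best pattern behind this return value can be sketched in base R. The sketch below uses `kmeans` with a single random start as a stand-in for one randomized clustering run, only because it ships with base R; `rep_biclustermd` repeats a biclustering instead, and the names `fits`, `result`, and `best` are illustrative assumptions rather than the package's internals.

```r
# Stand-in for one randomized clustering run: k-means with one random start.
# (Illustration only; rep_biclustermd repeats a biclustering, not k-means.)
set.seed(42)
x    <- as.matrix(iris[, 1:4])
nrep <- 10

start <- Sys.time()

# Run the randomized procedure nrep times, keeping every fit.
fits <- lapply(seq_len(nrep), function(i) {
  kmeans(x, centers = 3, nstart = 1, iter.max = 50)
})

# Final SSE of each repeat.
rep_sse <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# Keep the run with the smallest SSE, alongside all SSEs and the run time.
result <- list(
  best    = fits[[which.min(rep_sse)]],
  rep_sse = rep_sse,
  runtime = Sys.time() - start
)
```

Because each run starts from a random configuration, the repeats can end at different local optima; keeping the full SSE vector shows how much the solutions vary.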

References

Li, J., Reisner, J., Pham, H., Olafsson, S., and Vardeman, S. (2019) Biclustering for Missing Data. Information Sciences, Submitted

Examples

library(biclustermd)
data("synthetic")

# 20 repeats without parallelization
repeat_bc <- rep_biclustermd(synthetic, nrep = 20,
                             col_clusters = 3, row_clusters = 2,
                             miss_val = mean(synthetic, na.rm = TRUE),
                             miss_val_sd = sd(synthetic, na.rm = TRUE),
                             col_min_num = 2, row_min_num = 2,
                             col_num_to_move = 1, row_num_to_move = 1,
                             max.iter = 10)
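repeat_bc                                      # print the result
autoplot(repeat_bc$best_bc)                    # plot the minimum SSE biclustering
plot(repeat_bc$rep_sse, type = 'b', pch = 20)  # final SSE of each repeat
repeat_bc$runtime                              # time taken by the 20 repeats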

# 20 repeats with parallelization over 2 cores
repeat_bc <- rep_biclustermd(synthetic, nrep = 20, parallel = TRUE, ncores = 2,
                             col_clusters = 3, row_clusters = 2,
                             miss_val = mean(synthetic, na.rm = TRUE),
                             miss_val_sd = sd(synthetic, na.rm = TRUE),
                             col_min_num = 2, row_min_num = 2,
                             col_num_to_move = 1, row_num_to_move = 1,
                             max.iter = 10)
repeat_bc$runtime
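Before requesting parallel repeats, it can help to check how many cores the machine reports. The following is a minimal sketch using the base `parallel` package; the variable names and the policy of leaving one core free are assumptions made for illustration, not package requirements.

```r
# Number of cores reported by the system; detectCores() can return NA.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Use at most 2 cores (the documented default for ncores),
# leaving one core free, but never fewer than 1.
ncores <- max(1L, min(2L, available - 1L))

# Only ask for the parallel backend when more than one core is usable.
use_parallel <- ncores > 1L
```

These values could then be passed as `parallel = use_parallel, ncores = ncores` in the call above.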

[Package biclustermd version 0.2.3 Index]
