bipartition {RcppML}    R Documentation

Bipartition a sample set

Description

Spectral bipartitioning by rank-2 matrix factorization

Usage

bipartition(
  A,
  tol = 1e-05,
  maxit = 100,
  nonneg = TRUE,
  samples = 1:ncol(A),
  seed = NULL,
  verbose = FALSE,
  calc_dist = FALSE,
  diag = TRUE
)

Arguments

A

matrix of features-by-samples in dense or sparse format (preferred classes are "matrix" or "Matrix::dgCMatrix", respectively). Prefer sparse storage when more than half of all values are zero.

tol

stopping criterion: the correlation distance between w across consecutive iterations, 1 - cor(w_i, w_{i-1}). Iteration stops once this distance falls below tol.
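The convergence measure described for tol can be illustrated in base R. This is only a sketch: the two vectors below are hypothetical stand-ins for w at consecutive iterations, and how the package flattens or compares w internally is not stated here, so that part is an assumption.

```r
# Hypothetical values of w from two consecutive iterations (not package output).
w_prev <- c(0.20, 0.80, 0.50, 0.10, 0.40, 0.60)
w_curr <- c(0.21, 0.79, 0.52, 0.09, 0.41, 0.58)

tol <- 1e-05

# Correlation distance between iterations: 1 - cor(w_i, w_{i-1}).
cor_dist <- 1 - cor(w_curr, w_prev)

# Assumed stopping rule: stop once the distance drops below tol.
converged <- cor_dist < tol
```

Because correlation ignores overall scale and shift, this measure tracks changes in the shape of w rather than its magnitude.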

maxit

stopping criterion: the maximum number of alternating updates of w and h

nonneg

enforce non-negativity

samples

samples to include in bipartition, numbered from 1 to ncol(A). Default is 1:ncol(A), i.e. all samples.

seed

random seed for model initialization

verbose

print model tolerances between iterations

calc_dist

calculate the relative cosine distance of samples within a cluster to either cluster centroid. If TRUE, centers for clusters will also be calculated.

diag

scale factors in w and h to sum to 1 by introducing a diagonal, d. This should rarely, if ever, be set to FALSE. Diagonalization enables symmetry of models in factorization of symmetric matrices, convex L1 regularization, and consistent factor scalings.
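The role of the diagonal can be sketched in base R. The matrices w and h below are hypothetical, and the sketch assumes that each column of w and each row of h is divided by its sum, with the removed scale collected in d; the package's internal scaling may differ in detail.

```r
# Hypothetical rank-2 factors: w is features x 2, h is 2 x samples.
w <- matrix(c(2, 4, 6, 1, 1, 2), ncol = 2)
h <- matrix(c(1, 3, 2, 2, 4, 0), nrow = 2)

# Scale of each factor in w (column sums) and in h (row sums).
w_scale <- colSums(w)
h_scale <- rowSums(h)

# Rescale so every factor sums to 1 in both w and h.
w_unit <- sweep(w, 2, w_scale, "/")
h_unit <- sweep(h, 1, h_scale, "/")

# The diagonal d absorbs the scale removed from both sides.
d <- w_scale * h_scale

# The reconstruction is unchanged: w %*% h equals w_unit %*% diag(d) %*% h_unit.
same_model <- isTRUE(all.equal(w %*% h, w_unit %*% diag(d) %*% h_unit))
```

With the scale held in d, the factors in w and h are directly comparable to one another, which is what the consistent factor scalings mentioned above refers to.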

Details

Spectral bipartitioning is a popular subroutine in divisive clustering. The sign of the difference between sample loadings in the two factors of a rank-2 matrix factorization gives a bipartition that is nearly identical to the one obtained from an SVD.
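The sign rule can be sketched in base R as follows. The loading matrix h is hypothetical (it is not output from bipartition), and the names v, samples1 and samples2 are chosen for illustration only.

```r
# Hypothetical 2 x 6 matrix of sample loadings: rows are the two factors,
# columns are samples. matrix() fills column by column.
h <- matrix(c(0.9, 0.1,
              0.8, 0.3,
              0.2, 0.7,
              0.1, 0.9,
              0.6, 0.4,
              0.3, 0.5), nrow = 2)

# Difference between each sample's loadings on the two factors.
v <- h[1, ] - h[2, ]

# The sign of the difference assigns each sample to one side of the partition.
samples1 <- which(v > 0)
samples2 <- which(v <= 0)
```

Samples that load more heavily on the first factor fall on one side and the rest on the other; the magnitude of v indicates how decisively a sample is assigned.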

Rank-2 matrix factorization by alternating least squares is faster than rank-2-truncated SVD (e.g. irlba).

This function is a specialization of rank-2 nmf with support for factorization of only a subset of samples, and with additional calculations on the factorization model relevant to bipartitioning. See nmf for details regarding rank-2 factorization.

Value

A list giving the bipartition and useful statistics.

Author(s)

Zach DeBruine

References

Kuang, D, Park, H. (2013). "Fast rank-2 nonnegative matrix factorization for hierarchical document clustering." Proc. 19th ACM SIGKDD intl. conf. on Knowledge discovery and data mining.

See Also

nmf, dclust

Examples

## Not run: 
library(Matrix)
data(iris)
# bipartition() expects features in rows and samples in columns, so transpose iris
A <- as(t(as.matrix(iris[, 1:4])), "dgCMatrix")
bipartition(A, calc_dist = TRUE)

## End(Not run)

[Package RcppML version 0.3.7 Index]