bipartition {RcppML}    R Documentation
Bipartition a sample set
Description
Spectral bipartitioning by rank-2 matrix factorization
Usage
bipartition(
A,
tol = 1e-05,
maxit = 100,
nonneg = TRUE,
samples = 1:ncol(A),
seed = NULL,
verbose = FALSE,
calc_dist = FALSE,
diag = TRUE
)
Arguments
A : features-by-samples matrix in dense or sparse format (preferred classes are "matrix" or "Matrix::dgCMatrix", respectively). Prefer sparse storage when more than half of all values are zero.
tol : stopping criterion, the correlation distance between factor estimates in consecutive iterations
maxit : stopping criterion, the maximum number of alternating updates of the two factors
nonneg : enforce non-negativity
samples : indices of the samples to include in the bipartition, numbered from 1 to ncol(A)
seed : random seed for model initialization
verbose : print model tolerances between iterations
calc_dist : calculate the relative cosine distance of samples within a cluster to either cluster centroid
diag : scale factors in …
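The storage advice for A can be applied mechanically before calling bipartition(). The following is a minimal sketch, not part of RcppML: the helper name prepare_input() is hypothetical, it expects a base R numeric matrix, and it uses the coercion chain that the Matrix package recommends for obtaining a general "dgCMatrix".

```r
library(Matrix)

# Hypothetical helper (not an RcppML function): follow the storage advice for
# argument A by switching to a compressed sparse column matrix ("dgCMatrix")
# when more than half of the values are zero. Expects a base R numeric matrix.
prepare_input <- function(A) {
  if (mean(A == 0) > 0.5) {
    # numeric -> general -> column-compressed sparse, i.e. a "dgCMatrix"
    as(as(as(A, "dMatrix"), "generalMatrix"), "CsparseMatrix")
  } else {
    A  # mostly non-zero data stays as a dense base "matrix"
  }
}
```

For example, a count matrix that is mostly zeros would be passed as prepare_input(counts), while dense measurements such as iris are returned unchanged.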
Details
Spectral bipartitioning is a popular subroutine in divisive clustering. The sign of the difference between sample loadings in the two factors of a rank-2 matrix factorization gives a bipartition that is nearly identical to the one obtained from an SVD.
Rank-2 matrix factorization by alternating least squares is faster than a rank-2 truncated SVD (e.g. irlba).
This function is a specialization of rank-2 nmf with support for factorizing only a subset of samples, and with additional calculations on the factorization model relevant to bipartitioning. See nmf for details regarding rank-2 factorization.
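The bipartition rule described above can be illustrated with a small base-R sketch; it is not RcppML's implementation. Each alternating least squares half-step solves a two-variable non-negative least squares problem exactly, in the spirit of Kuang and Park (2013): keep the unconstrained solution if it is non-negative, otherwise take the better single-factor fit. Several choices are assumptions: the names nnls2() and bipartition_sketch() are hypothetical, factors are rescaled to sum to 1 after each update, tol is applied to the correlation distance between successive feature-factor estimates, and a deterministic start built from the second left singular vector replaces the random initialization that seed controls.

```r
# Sketch only (not RcppML's implementation): rank-2 NMF by alternating
# least squares, then a bipartition from the sign of the loading difference.

# Exact non-negative least squares with a two-column design matrix W,
# solved for every column b of B: minimize ||W h - b|| subject to h >= 0.
nnls2 <- function(W, B) {
  G <- crossprod(W)            # 2 x 2 Gram matrix W'W
  WtB <- crossprod(W, B)       # 2 x n matrix W'B
  H <- solve(G, WtB)           # unconstrained least-squares solutions
  bad <- which(H[1, ] < 0 | H[2, ] < 0)
  if (length(bad) > 0) {
    # The optimum then lies on a boundary (h1 = 0 or h2 = 0): keep the better
    # single-column fit, whose residual drops by max(0, w'b)^2 / w'w.
    b1 <- pmax(WtB[1, bad], 0)
    b2 <- pmax(WtB[2, bad], 0)
    use1 <- b1^2 / G[1, 1] >= b2^2 / G[2, 2]
    H[1, bad] <- ifelse(use1, b1 / G[1, 1], 0)
    H[2, bad] <- ifelse(use1, 0, b2 / G[2, 2])
  }
  H
}

# Hypothetical helper; A is a non-negative features-by-samples matrix.
bipartition_sketch <- function(A, tol = 1e-5, maxit = 100) {
  # Assumption: deterministic start from the positive and negative parts
  # of the second left singular vector (RcppML uses a random start).
  u2 <- svd(A, nu = 2, nv = 0)$u[, 2]
  W <- cbind(pmax(u2, 0), pmax(-u2, 0))
  W <- sweep(W, 2, colSums(W), "/")
  for (iter in seq_len(maxit)) {
    W_old <- W
    H <- nnls2(W, A)                   # sample loadings, 2 x n
    W <- t(nnls2(t(H), t(A)))          # feature loadings, m x 2
    W <- sweep(W, 2, colSums(W), "/")  # assumption: each factor sums to 1
    # Assumption: tol is the correlation distance between successive W
    if (isTRUE(1 - cor(as.vector(W), as.vector(W_old)) < tol)) break
  }
  H <- nnls2(W, A)
  v <- H[1, ] - H[2, ]
  # Positive v -> first cluster; zero or negative -> second (assumption)
  list(v = v, samples1 = which(v > 0), samples2 = which(v <= 0))
}
```

Rescaling both factors to a common sum keeps the two rows of H on the same scale, which is what makes the sign of their difference meaningful as a cluster assignment.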
Value
A list giving the bipartition and useful statistics:
v : vector giving, for each sample, the difference between its loadings in the two factors of the rank-2 factorization
dist : relative cosine distance of samples within a cluster to the centroids of the assigned vs. the non-assigned cluster
size1 : number of samples in the first cluster (positive values in 'v')
size2 : number of samples in the second cluster (negative values in 'v')
samples1 : indices of samples in the first cluster
samples2 : indices of samples in the second cluster
center1 : mean feature loadings across samples in the first cluster
center2 : mean feature loadings across samples in the second cluster
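How the size, sample and center fields follow from v can be shown with a short sketch. The helper summarize_bipartition() is hypothetical, not part of RcppML; it reads "mean feature loadings" as the per-feature mean of the columns of A in each cluster (an assumption), assigns ties at zero to the second cluster (also an assumption), and leaves out dist because its exact normalization is not specified here.

```r
# Hypothetical helper (not an RcppML function): derive cluster statistics
# from a difference vector v and the features-by-samples matrix A.
summarize_bipartition <- function(A, v) {
  samples1 <- which(v > 0)   # positive entries of v
  samples2 <- which(v <= 0)  # assumption: ties at zero go to cluster 2
  list(
    size1 = length(samples1),
    size2 = length(samples2),
    samples1 = samples1,
    samples2 = samples2,
    # assumption: a center is the per-feature mean of A over a cluster
    center1 = rowMeans(A[, samples1, drop = FALSE]),
    center2 = rowMeans(A[, samples2, drop = FALSE])
  )
}
```

With a 2 x 3 matrix whose columns are (1, 2), (3, 4) and (5, 6) and v = c(0.5, 2, -1), the first cluster holds samples 1 and 2 with center (2, 3), and the second holds sample 3 with center (5, 6).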
Author(s)
Zach DeBruine
References
Kuang, D. and Park, H. (2013). "Fast rank-2 nonnegative matrix factorization for hierarchical document clustering." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Examples
## Not run:
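library(RcppML)
data(iris)
# bipartition() expects features in rows and samples in columns,
# so transpose the 150 x 4 measurements to a 4 x 150 matrix
A <- t(as.matrix(iris[, 1:4]))
# iris contains no zeros, so dense "matrix" storage is preferred over dgCMatrix
model <- bipartition(A, calc_dist = TRUE)
c(size1 = model$size1, size2 = model$size2)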
## End(Not run)