HCD {HCD} | R Documentation |
hierarchical community detection with recursive spectral methods
Description
Hierarchical community by recursive spectral partitioning. It includes the splitting methods of spectral clustering and sign splitting, as well stopping rules for fixed stopping, non-backtracking matrix checking and edge cross-validation.
Usage
HCD(A, method = "SS", stopping = "NB", reg = FALSE, n.min = 25, D = NULL,notree=TRUE)
Arguments
A |
adjacency matrix. Can be standard R matrix or dsCMatrix (or other type in package Matrix) |
method |
splitting method. "SS" (default) for sign splitting, "SC" for spectral clustering |
stopping |
stopping rule. "NB" (default) for non-backtracking matrix spectrum, "ECV" for edge cross-validation, "Fix"for fixed D layers of partitioning (needs D value) |
reg |
logic value on whether regularization is needed. By default it is FALSE.Set it to be TRUE will add reguarlization, which help the performance on sparse networks, but it will make the computation slower. |
n.min |
integer number. The algorithm will stop splitting if the current size is <= 2*n.min. |
D |
the number of layers to partition, if stopping=="Fix". |
notree |
logical value on whether the tree and the corresponding similarity will be computed. If TRUE (default), will not produce the data.tree object or the community similarity matrix. Only the cluster label and the tree path strings will be returned. This typically makes the runing faster. |
Details
For stopping rules, ECV is nonparametric rank evaluation by cross-validation, a more generally applicable approach without assuming SBM or its variants. ECV is also applicable for weighted networks.So it is believed to be more robust than NB but less effective if the true model is close to BTSBM. However, the ECV is computationally much more intensive.
Notice that the algorithm does not reply on the assumption of the BTSBM. But the estimated probabiilty matrix from the output is based on the BTSBM.
Value
A list of the following objects:
labels |
detected community labels of nodes |
ncl |
number of clusters from the algorithm |
cluster.tree |
a data.tree object for the binary tree between communities |
P |
estimated connection probability matrix between n nodes, according to BTSBM |
node.bin.sim.mat |
binary string similarity between nodes |
comm.bin.sim.mat |
binary string similarity between communities |
tree.path |
a list of strings to describe the path from root to each community along the tree |
Author(s)
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <tianxili@umn.edu>
References
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
Examples
dt <- BTSBM(n=1600,d=4,a.seq=0.2^seq(0,4),lambda=50)
A <- dt$A.list[[1]]
# you can try various versions of the algorithm as below: the Fix is fastest and ECV is slowest.
system.time(HCD.result <- HCD(A,method="SC",stopping="Fix",D=4))