CAclustering {NAC}R Documentation

Covariate Assisted Spectral Clustering.

Description

CAclustering clusters graph nodes by applying spectral clustering with the assistance from node covariates.

Usage

CAclustering(Adj, Covariate, K, alphan = 5, itermax = 100, startn = 10)

Arguments

Adj

An n \times n symmetric adjacency matrix with diagonals being 0 and positive entries being 1.

Covariate

An n \times p covariate matrix. The rows correspond to nodes and the columns correspond to covariates.

K

A positive integer which is no larger than n. This is the predefined number of communities.

alphan

The number of candidate \alpha's to try within the range (\alpha_{min}, \alpha_{max}) given in Binkiewicz et al. (2017). An optimal \alpha is expected to achieve a balance between L_{\tau} and X.

itermax

k-means parameter, indicating the maximum number of iterations allowed. The default value is 100.

startn

k-means parameter. The number of times the algorithm should be run with different initial centroids. The default value is 10.

Details

CAclustering is an algorithm designed for community detection in networks with node covariates, as introduced in the paper Covariate-assisted spectral clustering of Binkiewicz et al. (2017). CAclustering applies k-means on the first K leading eigenvectors of L_{\tau}+\alpha XX^{\prime}, where L_{\tau} is the regularized graph Laplacian, X is the covariates matrix, and \alpha is a tuning parameter.

Value

estall

A factor indicating nodes' labels. Items sharing the same label are in the same community.

References

Binkiewicz, N., Vogelstein, J. T., & Rohe, K. (2017). Covariate-assisted spectral clustering. Biometrika, 104(2), 361-377.
doi:10.1093/biomet/asx008

Examples


# Simulate the Network
n = 10; K = 2; p =5; prob1 = 0.9;
theta = 0.4 + (0.45-0.05)*(seq(1:n)/n)^2; Theta = diag(theta);
P  = matrix(c(0.8, 0.2, 0.2, 0.8), byrow = TRUE, nrow = K)
set.seed(2022)
l = sample(1:K, n, replace=TRUE); # node labels
Pi = matrix(0, n, K) # label matrix
for (k in 1:K){
  Pi[l == k, k] = 1
}
Omega = Theta %*% Pi %*% P %*% t(Pi) %*% Theta;
Adj = matrix(runif(n*n, 0, 1), nrow = n);
Adj = Omega - Adj;
Adj = 1*(Adj >= 0)
diag(Adj) = 0
Adj[lower.tri(Adj)] = t(Adj)[lower.tri(Adj)]
Q = 0.1*matrix(sign(runif(p*K) - 0.5), nrow = p);
for(i in 1:K){
  Q[(i-1)*(p/K)+(1:(p/K)), i] = 0.3; #remark. has a change here
}
W = matrix(0, nrow = n, ncol = K);
for(jj in 1:n) {
  pp = rep(1/(K-1), K); pp[l[jj]] = 0;
  if(runif(1) <= prob1) {W[jj, 1:K] = Pi[jj, ];}
  else
  W[jj, sample(K, 1, prob = pp)] = 1;
  }
W = t(W)
D0 = Q %*% W
D = matrix(0, n, p)
for (i in 1:n){
  D[i,] = rnorm(p, mean = D0[,i], sd = 1);
}
CAclustering(Adj, D, 2)

[Package NAC version 0.1.0 Index]