NAC {NAC}R Documentation

Spectral Clustering on Network-Adjusted Covariates.

Description

Using network-adjusted covariates to detect underlying communities.

Usage

NAC(Adj, Covariate, K, alpha = NULL, beta = 0, itermax = 100, startn = 10)

Arguments

Adj

An n \times n symmetric adjacency matrix with diagonals being 0 and positive entries being 1.

Covariate

An n\times p covariate matrix. The rows correspond to nodes and the columns correspond to covariates.

K

A positive integer which is no larger than n. This is the predefined number of communities.

alpha

An optional numeric vector to tune the weight of covariate matrix. The default value is \dfrac{\bar{d}/2}{d_i/\text{log}n+1}, where d_i is the degree of node i and \bar{d} is the average degree.

beta

An optional parameter used when the covariate matrix X is uninformative. By default, \beta is set as 0 assuming X carries meaningful information. Otherwise, users can manually specify a positive value to weigh network information.

itermax

k-means parameter, indicating the maximum number of iterations allowed. The default value is 100.

startn

k-means parameter. The number of times the algorithm should be run with different initial centroids. The default value is 10.

Details

Spectral Clustering Network-Adjusted Covariates (NAC) is fully established in Network-Adjusted Covariates for Community Detection of Hu & Wang (2023). This method is particularly effective in the analysis of multiscale networks with covariates, addressing the challenge of misspecification between networks and covariates. NAC relies on the construction of network-adjusted covariate vectors y_i = \alpha_ix_i + \sum_{j: A_{ij}=1}x_j, i \in 1, \cdots, n, where the first part has the nodal covariate information and the second part conveys network information. By constructing Y = (y_1, \cdots, y_n)^{\prime} = AX + D_{\alpha}X where A is the adjacency matrix, X is the covariate matrix, and D_{\alpha} is the diagonal matrix with diagonals as \alpha_1, \cdots, \alpha_n, NAC applies K-means on the first K normalized left singular vectors, treating each row as a data point. A notable feature of NAC is its tuning-free nature, where node-specific coefficient \alpha_i is computed given the i-th node's degree. NAC allows for user-specified \alpha_i as well. A generalization with uninformative covariates is considered by adjusting parameter \beta. As long as the covariates do provide information, the specification of \beta can be ignored.

Value

estall

A factor indicating nodes' labels. Items sharing the same label are in the same community.

References

Hu, Y., & Wang, W. (2023). Network-Adjusted Covariates for Community Detection. arXiv preprint arXiv:2306.15616.
https://arxiv.org/abs/2306.15616

Examples


# Simulate the Network
n = 10; K = 2; p =5; prob1 = 0.9;
theta = 0.4 + (0.45-0.05)*(seq(1:n)/n)^2; Theta = diag(theta);
P  = matrix(c(0.8, 0.2, 0.2, 0.8), byrow = TRUE, nrow = K)
set.seed(2022)
l = sample(1:K, n, replace=TRUE); # node labels
Pi = matrix(0, n, K) # label matrix
for (k in 1:K){
  Pi[l == k, k] = 1
}
Omega = Theta %*% Pi %*% P %*% t(Pi) %*% Theta;
Adj = matrix(runif(n*n, 0, 1), nrow = n);
Adj = Omega - Adj;
Adj = 1*(Adj >= 0)
diag(Adj) = 0
Adj[lower.tri(Adj)] = t(Adj)[lower.tri(Adj)]
Q = 0.1*matrix(sign(runif(p*K) - 0.5), nrow = p);
for(i in 1:K){
  Q[(i-1)*(p/K)+(1:(p/K)), i] = 0.3; #remark. has a change here
}
W = matrix(0, nrow = n, ncol = K);
for(jj in 1:n) {
  pp = rep(1/(K-1), K); pp[l[jj]] = 0;
  if(runif(1) <= prob1) {W[jj, 1:K] = Pi[jj, ];}
  else
  W[jj, sample(K, 1, prob = pp)] = 1;
  }
W = t(W)
D0 = Q %*% W
D = matrix(0, n, p)
for (i in 1:n){
  D[i,] = rnorm(p, mean = D0[,i], sd = 1);
}
NAC(Adj, D, 2)

[Package NAC version 0.1.0 Index]