cpThreshold {CliquePercolation} | R Documentation |

Optimizing `k` And `I` For Clique Percolation Community Detection

Function for determining threshold value(s) (ratio of largest to second largest community sizes, chi, entropy) for ranges of `k` and `I` values, to help decide on optimal `k` and `I` values.

cpThreshold( W, method = c("unweighted", "weighted", "weighted.CFinder"), k.range, I.range, threshold = c("largest.components.ratio", "chi", "entropy") )

`W` |
A qgraph object or a symmetric matrix; see also qgraph |

`method` |
A string indicating the method to use
(`"unweighted"`, `"weighted"`, or `"weighted.CFinder"`) |

`k.range` |
integer or vector of `k` value(s) for which the threshold(s) are determined |

`I.range` |
integer or vector of `I` value(s) for which the threshold(s) are determined |

`threshold` |
A string or vector indicating which threshold(s) to determine
(`"largest.components.ratio"`, `"chi"`, and/or `"entropy"`) |

Optimizing `k` (clique size) and `I` (intensity threshold) in clique percolation community detection is a difficult task. Farkas et al. (2007) recommend looking at the ratio of the largest to second largest community sizes (`threshold = "largest.components.ratio"`) for very large networks, or at the variance of the community sizes after removing the largest community (`threshold = "chi"`) for somewhat smaller networks. These thresholds were derived from percolation theory. If `I` for a certain `k` is too high, no community will be identified. If `I` is too low, a giant community containing all nodes emerges. Just above this `I`, the distribution of community sizes often follows a power law, which constitutes a broad community size distribution. Farkas et al. (2007) point out that for such `I`, the ratio of the largest to second largest community sizes is approximately 2, which constitutes one way to optimize `I` for each possible `k`.

For somewhat smaller networks, the ratio can be rather unstable. Instead, Farkas et al. (2007, p. 8) propose looking at the variance of the community sizes after removing the largest community. The idea is that when `I` is rather low, one giant community and multiple equally small ones occur. Then, the variance of the sizes of the small communities (with the giant community removed) is low. When `I` is high, only a few equally small communities occur. Then, the variance of the community sizes (after removing the largest community) is also low. In between, the variance will at some point be maximal, namely when the community size distribution is maximally broad (power-law distributed). Thus, the maximal variance can be used to optimize `I` for various `k`.
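The reasoning above can be illustrated with a minimal sketch in base R. The community sizes below are made up for illustration, and the helper names `largest_ratio` and `rest_variance` are hypothetical, not part of the package. The sketch uses the plain sample variance, following the verbal description; the chi statistic computed by `cpThreshold` may be normalized differently.

```r
# Hypothetical community sizes at three intensity thresholds (illustrative only)
sizes_by_I <- list(
  low_I  = c(40, 3, 3, 3),     # one giant community plus equally small ones
  mid_I  = c(12, 6, 4, 3, 3),  # broad community size distribution
  high_I = c(4, 3, 3, 3)       # only a few, nearly equally small communities
)

# Ratio of the largest to the second largest community size
largest_ratio <- function(sizes) {
  s <- sort(sizes, decreasing = TRUE)
  s[1] / s[2]
}

# Variance of the community sizes after removing the largest community
# (a stand-in for chi, assuming the plain sample variance)
rest_variance <- function(sizes) {
  s <- sort(sizes, decreasing = TRUE)
  var(s[-1])
}

ratios    <- sapply(sizes_by_I, largest_ratio)
variances <- sapply(sizes_by_I, rest_variance)

print(ratios)
print(variances)
```

In this toy setting, the middle threshold is the one with a ratio of 2 and the largest variance, so both criteria would point to the same `I`.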

For very small networks, optimizing `k` and `I` based on the distribution of the community sizes will be impossible, as too few communities will occur. Another possible threshold for such networks is based on the entropy of the community sizes (`threshold = "entropy"`). Entropy can be interpreted as an indicator of how surprising the respective solution is. The formula used here is based on Shannon information, namely

$$-\sum_{i=1}^{N} p_i \log_2 p_i$$

with $p_i$ being the probability that a node is part of community $i$. For instance, if there are two communities, one of size 5 and one of size 3, the result would be

$$-\left(\tfrac{5}{8} \log_2 \tfrac{5}{8} + \tfrac{3}{8} \log_2 \tfrac{3}{8}\right) \approx 0.95$$
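The worked example can be checked directly in base R. This is only a sketch of the formula as stated, with each $p_i$ taken as the community size divided by the total number of nodes. The variable names are illustrative.

```r
# Two communities of size 5 and 3, as in the example above
sizes <- c(5, 3)

# Probability that a node belongs to each community
p <- sizes / sum(sizes)

# Shannon entropy in bits
entropy <- -sum(p * log2(p))
print(round(entropy, 2))
```

Evaluating the expression this way gives roughly 0.95 bits. With only two groups, the entropy cannot exceed 1 bit, which is reached when both communities have the same size.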

When calculating entropy, the isolated nodes identified by clique percolation are treated as a separate community. If there is only one community or only isolated nodes, entropy is zero, indicating that the solution is not surprising. Compared to the ratio and chi thresholds, entropy favors communities that are equal in size. Thus, it should not be used for larger networks, for which a broader community size distribution is preferred. Note that the entropy threshold has not yet been validated for clique percolation. Initial simulation studies indicate that it consistently detects surprising community partitions in smaller networks, especially if there are cliques of larger `k`.
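The properties described above can be reproduced with a small base R sketch. The function name `entropy_sketch` is hypothetical. It assumes that all isolated nodes are pooled into one extra group and that communities do not overlap; the package's internal computation may differ.

```r
# Sketch only: pools all isolated nodes into one extra group (an assumption)
# and ignores nodes that belong to more than one community.
entropy_sketch <- function(community_sizes, n_isolated = 0) {
  sizes <- c(community_sizes, if (n_isolated > 0) n_isolated)
  p <- sizes / sum(sizes)
  -sum(p * log2(p))
}

one_community <- entropy_sketch(8)                # a single community
only_isolated <- entropy_sketch(numeric(0), 6)    # no community, 6 isolated nodes
equal_sizes   <- entropy_sketch(c(4, 4))          # two equally sized communities
unequal_sizes <- entropy_sketch(c(5, 3))          # two unequal communities
with_isolated <- entropy_sketch(c(5, 3), 2)       # plus 2 isolated nodes

print(c(one_community, only_isolated, equal_sizes, unequal_sizes, with_isolated))
```

A single community or a solution with only isolated nodes yields zero entropy, and the equally sized partition scores higher than the unequal one, which is why entropy favors communities of equal size.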

The ratio threshold can be determined only if there are at least two communities. The chi threshold can be determined only if there are at least three communities. If there are not enough communities for the respective threshold, its value is NA in the data frame. Entropy can always be determined.
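The minimum community counts can be made concrete with a short base R sketch. The helper `thresholds_or_na` is hypothetical, and the sample variance of the remaining sizes again stands in for chi as an assumption. The ratio needs a second largest community, and a variance over the remaining communities needs at least two of them once the largest is removed.

```r
# Sketch only: returns NA where too few communities exist for a threshold
thresholds_or_na <- function(sizes) {
  s <- sort(sizes, decreasing = TRUE)
  # ratio needs a largest and a second largest community
  ratio <- if (length(s) >= 2) s[1] / s[2] else NA_real_
  # variance of the rest needs at least two communities besides the largest
  chi <- if (length(s) >= 3) var(s[-1]) else NA_real_
  c(ratio = ratio, chi = chi)
}

one_comm   <- thresholds_or_na(c(7))
two_comm   <- thresholds_or_na(c(7, 3))
three_comm <- thresholds_or_na(c(7, 3, 1))

print(rbind(one_comm, two_comm, three_comm))
```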

A data frame with columns for `k`, `I` (if `method = "weighted"` or `method = "weighted.CFinder"`), number of communities, number of isolated nodes, and results of the specified threshold(s).

Jens Lange, lange.jens@outlook.com

Farkas, I., Abel, D., Palla, G., & Vicsek, T. (2007). Weighted network modules.
*New Journal of Physics, 9*, 180-180. http://doi.org/10.1088/1367-2630/9/6/180

    ## Example for unweighted networks

    # create qgraph object
    W <- matrix(c(0,1,1,1,0,0,0,0,
                  0,0,1,1,0,0,0,0,
                  0,0,0,0,0,0,0,0,
                  0,0,0,0,1,1,1,0,
                  0,0,0,0,0,1,1,0,
                  0,0,0,0,0,0,1,0,
                  0,0,0,0,0,0,0,1,
                  0,0,0,0,0,0,0,0), nrow = 8, ncol = 8, byrow = TRUE)
    W <- Matrix::forceSymmetric(W)
    W <- qgraph::qgraph(W)

    # determine entropy threshold for k = 3 and k = 4
    results <- cpThreshold(W = W, method = "unweighted", k.range = c(3,4), threshold = "entropy")

    ## Example for weighted networks; three large communities with I = 0.3, 0.2, and 0.1, respectively

    # create qgraph object
    W <- matrix(c(0,0.10,0,0,0,0,0.10,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0.10,0,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0.10,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0.10,0,0,0.10,0.20,0,0,0,0,0.20,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0.10,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0.10,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0.20,0,0,0,0,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0.20,0,0,0,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0.20,0,0,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0.20,0,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0.20,0.20,0.30,0,0,0,0,0.30,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.20,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0,0,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,
                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), nrow = 22, ncol = 22, byrow = TRUE)
    W <- Matrix::forceSymmetric(W)
    W <- qgraph::qgraph(W, layout = "spring", edge.labels = TRUE)

    # determine ratio, chi, and entropy thresholds for k = 3 and I from 0.3 to 0.09
    results <- cpThreshold(W = W, method = "weighted", k.range = 3,
                           I.range = c(seq(0.3, 0.09, by = -0.01)),
                           threshold = c("largest.components.ratio","chi","entropy"))

    ## Example with Obama data set (see ?Obama)

    # get data
    data(Obama)

    # estimate network
    net <- qgraph::EBICglasso(qgraph::cor_auto(Obama), n = nrow(Obama))

    # determine entropy threshold for k from 3 to 4 and I from 0.1 to 0.5
    threshold <- cpThreshold(net, method = "weighted", k.range = 3:4,
                             I.range = seq(0.1, 0.5, 0.01),
                             threshold = "entropy")

[Package *CliquePercolation* version 0.3.0 Index]