overlapping_sbm {fastRG} | R Documentation |
Create an undirected overlapping degree corrected stochastic blockmodel object
Description
To specify a overlapping stochastic blockmodel, you must specify
the number of nodes (via n
),
the mixing matrix (via k
or B
), and the block
probabilities (optional, via pi
). We provide defaults for most of these
options to enable rapid exploration, or you can invest the effort
for more control over the model parameters. We strongly recommend
setting the expected_degree
or expected_density
argument
to avoid large memory allocations associated with
sampling large, dense graphs.
Usage
overlapping_sbm(
n,
k = NULL,
B = NULL,
...,
pi = rep(1/k, k),
sort_nodes = TRUE,
force_pure = TRUE,
poisson_edges = TRUE,
allow_self_loops = TRUE
)
Arguments
n |
The number of nodes in the overlapping SBM. |
k |
(mixing matrix) The number of blocks in the blockmodel.
Use when you don't want to specify the mixing-matrix by hand.
When |
B |
(mixing matrix) A |
... |
Arguments passed on to
|
pi |
(block probabilities) Probability of membership in each
block. Membership in each block is independent under the
overlapping SBM. Defaults to |
sort_nodes |
Logical indicating whether or not to sort the nodes
so that they are grouped by block. Useful for plotting.
Defaults to |
force_pure |
Logical indicating whether or not to force presence of
"pure nodes" (nodes that belong only to a single community) for the sake
of identifiability. To include pure nodes, block membership sampling
first proceeds as per usual. Then, after it is complete, |
poisson_edges |
Logical indicating whether or not
multiple edges are allowed to form between a pair of
nodes. Defaults to |
allow_self_loops |
Logical indicating whether or not
nodes should be allowed to form edges with themselves.
Defaults to |
Value
An undirected_overlapping_sbm
S3 object, a subclass of the
undirected_factor_model()
with the following additional
fields:
-
pi
: Sampling probabilities for each block. -
sorted
: Logical indicating where nodes are arranged by block (and additionally by degree heterogeneity parameter) within each block.
Generative Model
There are two levels of randomness in a degree-corrected
overlapping stochastic blockmodel. First, for each node, we
independently determine if that node is a member of each block. This is
handled by overlapping_sbm()
. Then, given these block memberships,
we randomly sample edges between nodes. This second
operation is handled by sample_edgelist()
,
sample_sparse()
, sample_igraph()
and
sample_tidygraph()
, depending depending on your desired
graph representation.
Identifiability
In order to be identifiable, an overlapping SBM must satisfy two conditions:
-
B
must be invertible, and the must be at least one "pure node" in each block that belongs to no other blocks.
Block memberships
Note that some nodes may not belong to any blocks.
TODO
Edge formulation
Once we know the block memberships, we need one more
ingredient, which is the baseline intensity of connections
between nodes in block i
and block j
. Then each edge
A_{i,j}
is Poisson distributed with parameter
TODO
References
Kaufmann, Emilie, Thomas Bonald, and Marc Lelarge. "A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks," Vol. 9925. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-46379-7.
Latouche, Pierre, Etienne Birmelé, and Christophe Ambroise. "Overlapping Stochastic Block Models with Application to the French Political Blogosphere." The Annals of Applied Statistics 5, no. 1 (March 2011): 309–36. https://doi.org/10.1214/10-AOAS382.
Zhang, Yuan, Elizaveta Levina, and Ji Zhu. "Detecting Overlapping Communities in Networks Using Spectral Methods." ArXiv:1412.3432, December 10, 2014. http://arxiv.org/abs/1412.3432.
See Also
Other stochastic block models:
dcsbm()
,
directed_dcsbm()
,
mmsbm()
,
planted_partition()
,
sbm()
Other undirected graphs:
chung_lu()
,
dcsbm()
,
erdos_renyi()
,
mmsbm()
,
planted_partition()
,
sbm()
Examples
set.seed(27)
lazy_overlapping_sbm <- overlapping_sbm(n = 1000, k = 5, expected_density = 0.01)
lazy_overlapping_sbm
# sometimes you gotta let the world burn and
# sample a wildly dense graph
dense_lazy_overlapping_sbm <- overlapping_sbm(n = 500, k = 3, expected_density = 0.8)
dense_lazy_overlapping_sbm
k <- 5
n <- 1000
B <- matrix(stats::runif(k * k), nrow = k, ncol = k)
pi <- c(1, 2, 4, 1, 1) / 5
custom_overlapping_sbm <- overlapping_sbm(
n = 200,
B = B,
pi = pi,
expected_degree = 5
)
custom_overlapping_sbm
edgelist <- sample_edgelist(custom_overlapping_sbm)
edgelist
# efficient eigendecompostion that leverages low-rank structure in
# E(A) so that you don't have to form E(A) to find eigenvectors,
# as E(A) is typically dense. computation is
# handled via RSpectra
population_eigs <- eigs_sym(custom_overlapping_sbm)