mmsbm {fastRG} | R Documentation |
Create an undirected degree-corrected mixed membership stochastic blockmodel object
Description
To specify a degree-corrected mixed membership stochastic blockmodel, you must specify
the degree-heterogeneity parameters (via n
or theta
),
the mixing matrix (via k
or B
), and the relative block
propensities (optional, via alpha
). We provide defaults for most of these
options to enable rapid exploration, or you can invest the effort
for more control over the model parameters. We strongly recommend
setting the expected_degree
or expected_density
argument
to avoid large memory allocations associated with
sampling large, dense graphs.
Usage
mmsbm(
n = NULL,
theta = NULL,
k = NULL,
B = NULL,
...,
alpha = rep(1, k),
sort_nodes = TRUE,
force_pure = TRUE,
poisson_edges = TRUE,
allow_self_loops = TRUE
)
Arguments
n |
(degree heterogeneity) The number of nodes in the blockmodel.
Use when you don't want to specify the degree-heterogeneity
parameters |
theta |
(degree heterogeneity) A numeric vector
explicitly specifying the degree heterogeneity
parameters. This implicitly determines the number of nodes
in the resulting graph, i.e. it will have |
k |
(mixing matrix) The number of blocks in the blockmodel.
Use when you don't want to specify the mixing-matrix by hand.
When |
B |
(mixing matrix) A |
... |
Arguments passed on to
|
alpha |
(relative block propensities) Relative block
propensities, which are parameters of a Dirichlet distribution.
All elments of |
sort_nodes |
Logical indicating whether or not to sort the nodes
so that they are grouped by block and by |
force_pure |
Logical indicating whether or not to force presence of
"pure nodes" (nodes that belong only to a single community) for the sake
of identifiability. To include pure nodes, block membership sampling
first proceeds as per usual. Then, after it is complete, |
poisson_edges |
Logical indicating whether or not
multiple edges are allowed to form between a pair of
nodes. Defaults to |
allow_self_loops |
Logical indicating whether or not
nodes should be allowed to form edges with themselves.
Defaults to |
Value
An undirected_mmsbm
S3 object, a subclass of the
undirected_factor_model()
with the following additional
fields:
-
theta
: A numeric vector of degree-heterogeneity parameters. -
Z
: The community memberships of each node, amatrix()
withk
columns, whose row sums all equal one. -
alpha
: Community membership proportion propensities. -
sorted
: Logical indicating where nodes are arranged by block (and additionally by degree heterogeneity parameter) within each block.
Generative Model
There are two levels of randomness in a degree-corrected
stochastic blockmodel. First, we randomly choose how much
each node belongs to each block in the blockmodel. Each node
is one unit of block membership to distribute. This is
handled by mmsbm()
. Then, given these block memberships,
we randomly sample edges between nodes. This second
operation is handled by sample_edgelist()
,
sample_sparse()
, sample_igraph()
and
sample_tidygraph()
, depending depending on your desired
graph representation.
Block memberships
Let Z_i
by a vector on the k
dimensional simplex
representing the block memberships of node i
.
To generate z_i
we sample from a Dirichlet
distribution with parameter vector \alpha
.
Block memberships for each node are independent.
Degree heterogeneity
In addition to block membership, the MMSBM also allows
nodes to have different propensities for edge formation.
We represent this propensity for node i
by a positive
number \theta_i
.
Edge formulation
Once we know the block membership vector z_i, z_j
and the degree
heterogeneity parameters \theta
, we need one more
ingredient, which is the baseline intensity of connections
between nodes in block i
and block j
. This is given by a
k \times k
matrix B
. Then each edge
A_{i,j}
is Poisson distributed with parameter
\lambda_{i, j} = \theta_i \cdot z_i^T B z_j \cdot \theta_j.
See Also
Other stochastic block models:
dcsbm()
,
directed_dcsbm()
,
overlapping_sbm()
,
planted_partition()
,
sbm()
Other undirected graphs:
chung_lu()
,
dcsbm()
,
erdos_renyi()
,
overlapping_sbm()
,
planted_partition()
,
sbm()
Examples
set.seed(27)
lazy_mmsbm <- mmsbm(n = 1000, k = 5, expected_density = 0.01)
lazy_mmsbm
# sometimes you gotta let the world burn and
# sample a wildly dense graph
dense_lazy_mmsbm <- mmsbm(n = 500, k = 3, expected_density = 0.8)
dense_lazy_mmsbm
# explicitly setting the degree heterogeneity parameter,
# mixing matrix, and relative community sizes rather
# than using randomly generated defaults
k <- 5
n <- 1000
B <- matrix(stats::runif(k * k), nrow = k, ncol = k)
theta <- round(stats::rlnorm(n, 2))
alpha <- c(1, 2, 4, 1, 1)
custom_mmsbm <- mmsbm(
theta = theta,
B = B,
alpha = alpha,
expected_degree = 50
)
custom_mmsbm
edgelist <- sample_edgelist(custom_mmsbm)
edgelist
# efficient eigendecompostion that leverages low-rank structure in
# E(A) so that you don't have to form E(A) to find eigenvectors,
# as E(A) is typically dense. computation is
# handled via RSpectra
population_eigs <- eigs_sym(custom_mmsbm)
svds(custom_mmsbm)$d