bigergm {bigergm} | R Documentation |
bigergm: Exponential-family random graph models for large networks with local dependence
Description
The function bigergm
estimates and simulates three classes of exponential-family
random graph models for large networks under local dependence:
The p_1 model of Holland and Leinhardt (1981) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013), Schweinberger, Petrescu-Prahova, and Vu (2014), Dahbura et al. (2021), and Fritz et al. (2024) to both directed and undirected random graphs with additional model terms, with and without covariates.
The stochastic block model of Snijders and Nowicki (1997) and Nowicki and Snijders (2001) in exponential-family form.
The exponential-family random graph models with local dependence of Schweinberger and Handcock (2015), with and without covariates. The exponential-family random graph models with local dependence replace the long-range dependence of conventional exponential-family random graph models by short-range dependence. Therefore, exponential-family random graph models with local dependence replace the strong dependence of conventional exponential-family random graph models by weak dependence, reducing the problem of model degeneracy (Handcock, 2003; Schweinberger, 2011) and improving goodness-of-fit (Schweinberger and Handcock, 2015). In addition, exponential-family random graph models with local dependence satisfy a weak form of self-consistency in the sense that these models are self-consistent under neighborhood sampling (Schweinberger and Handcock, 2015), which enables consistent estimation of neighborhood-dependent parameters (Schweinberger and Stewart, 2017; Schweinberger, 2017).
Usage
bigergm(
object,
add_intercepts = FALSE,
n_blocks = NULL,
n_cores = 1,
blocks = NULL,
estimate_parameters = TRUE,
verbose = 0,
n_MM_step_max = 100,
tol_MM_step = 1e-04,
initialization = "infomap",
use_infomap_python = FALSE,
virtualenv_python = "r-bigergm",
seed_infomap = NULL,
weight_for_initialization = 1000,
seed = NULL,
method_within = "MPLE",
control_within = ergm::control.ergm(),
clustering_with_features = TRUE,
compute_pi = FALSE,
check_alpha_update = FALSE,
check_blocks = FALSE,
cache = NULL,
return_checkpoint = TRUE,
only_use_preprocessed = FALSE,
...
)
Arguments
object |
An R |
add_intercepts |
Boolean value to indicate whether adequate intercepts should be added to the provided formula so that the model in the first stage of the estimation is a nested model of the estimated model in the second stage of the estimation. |
n_blocks |
The number of blocks. This must be specified by the user.
When you pass a |
n_cores |
The number of CPU cores to use. |
blocks |
The pre-specified block memberships for each node.
If |
estimate_parameters |
If |
verbose |
A logical or an integer: if this is TRUE/1, the program will print out additional information about the progress of estimation and simulation. A higher value yields lower level information. |
n_MM_step_max |
The maximum number of MM iterations.
Currently, no early stopping criteria is introduced. Thus |
tol_MM_step |
Tolerance regarding the relative change of the lower bound of the likelihood used to decide on the convergence of the clustering step |
initialization |
How the blocks should be initialized.
If |
use_infomap_python |
If |
virtualenv_python |
Which virtual environment should be used for the infomap algorithm? |
seed_infomap |
seed value (integer) for the infomap algorithm, which can be used to initialize the estimation of the blocks. |
weight_for_initialization |
weight value used for cluster initialization. The higher this value, the more weight is put on the initialized block allocation. |
seed |
seed value (integer) for the random number generator. |
method_within |
If "MPLE" (the default), then the maximum pseudolikelihood estimator is implemented when estimating the within-block network model. If "MLE", then an approximate maximum likelihood estimator is conducted. If "CD" (EXPERIMENTAL), the Monte-Carlo contrastive divergence estimate is returned. |
control_within |
A list of control parameters for the |
clustering_with_features |
If |
compute_pi |
If |
check_alpha_update |
If |
check_blocks |
If TRUE, this function keeps track of estimated block memberships at each MM iteration. |
cache |
a |
return_checkpoint |
If |
only_use_preprocessed |
If |
... |
Additional arguments, to be passed to lower-level functions (mainly to the |
Value
An object of class 'bigergm' including the results of the fitted model. These include:
- call:
call of the mode
- block:
vector of the found block of the nodes into cluster
- initial_block:
vector of the initial block of the nodes into cluster
- sbm_pi:
Connection probabilities represented as a
n_blocks x n_blocks
matrix from the first stage of the estimation between all clusters- MM_list_z:
list of cluster allocation for each node and each iteration
- MM_list_alpha:
list of posterior distributions of cluster allocations for all nodes for each iteration
- MM_change_in_alpha:
change in 'alpha' for each iteration
- MM_lower_bound:
vector of the evidence lower bounds from the MM algorithm
- alpha:
matrix representing the converged posterior distributions of cluster allocations for all nodes
- counter_e_step:
integer number indicating the number of iterations carried out
- adjacency_matrix:
sparse matrix representing the adjacency matrix used for the estimation
- estimation_status:
character stating the status of the estimation
- est_within:
ergm
object of the model for within cluster connections- est_between:
ergm
object of the model for between cluster connections- checkpoint:
list of information to continue the estimation (only returned if
return_checkpoint = TRUE
)- membership_before_kmeans:
vector of the found blocks of the nodes into cluster before the final check for bad clusters
- estimate_parameters:
binary value if the parameters in the second step of the algorithm should be estimated or not
References
Babkin, S., Stewart, J., Long, X., and M. Schweinberger (2020). Large-scale estimation of random graph models with local dependence. Computational Statistics and Data Analysis, 152, 1–19.
Dahbura, J. N. M., Komatsu, S., Nishida, T. and Mele, A. (2021), ‘A structural model of business cards exchange networks’. https://arxiv.org/abs/2105.12704
Fritz C., Georg C., Mele A., and Schweinberger M. (2024). A strategic model of software dependency networks. https://arxiv.org/abs/2402.13375
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Technical report, Center for Statistics and the Social Sciences, University of Washington, Seattle.
https://csss.uw.edu/Papers/wp39.pdf
Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, Theory & Methods, 76, 33–65.
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24.
Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, Theory & Methods, 96, 1077–1087.
Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, Theory & Methods, 106, 1361–1370.
Schweinberger, M. (2020). Consistent structure estimation of exponential-family random graph models with block structure. Bernoulli, 26, 1205–1233.
Schweinberger, M. and M. S. Handcock (2015). Local dependence in random graph models: characterization, properties, and statistical inference. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 7, 647-676.
Schweinberger, M., Krivitsky, P. N., Butts, C.T. and J. Stewart (2020). Exponential-family models of random graphs: Inference in finite, super, and infinite population scenarios. Statistical Science, 35, 627-662.
Schweinberger, M. and P. Luna (2018). HERGM: Hierarchical exponential-family random graph models. Journal of Statistical Software, 85, 1–39.
Schweinberger, M., Petrescu-Prahova, M. and D. Q. Vu (2014). Disaster response on September 11, 2001 through the lens of statistical network analysis. Social Networks, 37, 42–55.
Schweinberger, M. and J. Stewart (2020). Concentration and consistency results for canonical and curved exponential-family random graphs. The Annals of Statistics, 48, 374–396.
Snijders, T. A. B. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.
Stewart, J., Schweinberger, M., Bojanowski, M., and M. Morris (2019). Multilevel network data facilitate statistical inference for curved ERGMs with geometrically weighted terms. Social Networks, 59, 98–119.
Vu, D. Q., Hunter, D. R. and M. Schweinberger (2013). Model-based clustering of large networks. Annals of Applied Statistics, 7, 1010–1039.
Examples
# Load an embedded network object.
data(toyNet)
# Specify the model that you would like to estimate.
model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle
# Estimate the model
bigergm_res <- bigergm(
object = model_formula,
# The model you would like to estimate
n_blocks = 4,
# The number of blocks
n_MM_step_max = 10,
# The maximum number of MM algorithm steps
estimate_parameters = TRUE,
# Perform parameter estimation after the block recovery step
clustering_with_features = TRUE,
# Indicate that clustering must take into account nodematch on characteristics
check_blocks = FALSE)
# Example with N() operator
## Not run:
set.seed(1)
# Prepare ingredients for simulating a network
N <- 500
K <- 10
list_within_params <- c(1, 2, 2,-0.5)
list_between_params <- c(-8, 0.5, -0.5)
formula <- g ~ edges + nodematch("x") + nodematch("y") + N(~edges,~log(n)-1)
memb <- sample(1:K,prob = c(0.1,0.2,0.05,0.05,0.10,0.1,0.1,0.1,0.1,0.1),
size = N, replace = TRUE)
vertex_id <- as.character(11:(11 + N - 1))
x <- sample(1:2, size = N, replace = TRUE)
y <- sample(1:2, size = N, replace = TRUE)
df <- tibble::tibble(
id = vertex_id,
memb = memb,
x = x,
y = y
)
g <- network::network.initialize(n = N, directed = FALSE)
g %v% "vertex.names" <- df$id
g %v% "block" <- df$memb
g %v% "x" <- df$x
g %v% "y" <- df$y
# Simulate a network
g_sim <-
simulate_bigergm(
formula = formula,
coef_within = list_within_params,
coef_between = list_between_params,
nsim = 1,
control_within = control.simulate.formula(MCMC.burnin = 200000))
estimation <- bigergm(update(formula,new = g_sim~.), n_blocks = 10,
verbose = T)
summary(estimation)
## End(Not run)