btergm {btergm}

R Documentation

Estimate a TERGM by MPLE with temporal bootstrapping

Description

Estimate a TERGM by MPLE with temporal bootstrapping.

Usage

btergm(
  formula,
  R = 500,
  offset = FALSE,
  returndata = FALSE,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  control.ergm = NULL,
  usefastglm = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`formula`	Formula for the TERGM. Model construction works as in the ergm package, with the same model terms (for a list of terms, see `help("ergm-terms")`). The networks to be modeled on the left-hand side of the equation must be given either as a list of network objects or as a list of matrices, in chronological order (most recent last). `dyadcov` and `edgecov` terms accept time-independent covariates (as `network` or `matrix` objects) or time-varying covariates (as a list of networks or matrices with the same length as the list of networks to be modeled).
`R`	Number of bootstrap replications. The more replications, the more accurate, but also the slower, the estimation.
`offset`	If `offset = TRUE` is set, a list of offset matrices (one for each time step) with structural zeros is handed over to the pseudolikelihood preparation routine. The offset matrices contain structural zeros where either the dependent networks or any of the covariates have missing nodes (if `auto.adjust = TRUE` is used). All matrices and network objects are inflated to the dimensions of the largest object, and the offset matrices inform the estimation preparation routine which dyads are constrained to be absent. After MPLE data preparation, the dyads with these structural zeros are removed before the GLM is estimated. If `offset = FALSE` is set (the default behavior), all nodes that are not present across all covariates and networks within a time step are removed completely from the respective object(s) before estimation begins.
`returndata`	Return the processed input data instead of estimating and returning the model? In the `btergm` case, this returns a data frame with the dyads of the dependent variable/network and the change statistics for all covariates. In the `mtergm` case, this returns a list object with the block-diagonal network object for the dependent variable, block-diagonal matrices for all dyadic covariates, and the offset matrix for the structural zeros.
`parallel`	Use multiple cores in a computer or nodes in a cluster to speed up bootstrapping computations. The default value `"no"` means parallel computing is switched off. If `"multicore"` is used, the `mclapply` function from the parallel package (formerly in the multicore package) is used for parallelization. Because it is based on forking, this should run on any kind of system except MS Windows. It is usually the fastest type of parallelization. If `"snow"` is used, the `parLapply` function from the parallel package (formerly in the snow package) is used for parallelization. This should run on any kind of system, including cluster systems and MS Windows. It is slightly slower than `"multicore"` if the same number of cores is used. However, `"snow"` supports MPI clusters with a large number of cores, which `"multicore"` does not (see also the `cl` argument). The backend for the bootstrapping procedure is the boot package.
`ncpus`	The number of CPU cores used for parallel computing (only if `parallel` is activated). If the number of cores should be detected automatically on the machine where the code is executed, one can set `ncpus = detectCores()` after loading the parallel package. On some HPC clusters, the number of available cores is saved as an environment variable; for example, if MOAB is used, the number of available cores can sometimes be accessed using `Sys.getenv("MOAB_PROCCOUNT")`, depending on the implementation.
`cl`	An optional parallel or snow cluster for use if `parallel = "snow"`. If not supplied, a PSOCK cluster is created temporarily on the local machine.
`control.ergm`	ergm controls for `ergmMPLE` calls. See `control.ergm` for details.
`usefastglm`	Controls whether to use the `fastglm` estimation routine from the fastglm package with `method = 3`. Defaults to `FALSE`, in which case `speedglm.wfit` is used instead if available.
`verbose`	Print details about data preprocessing and estimation settings.
`...`	Further arguments to be handed over to the `boot` function.
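The `formula` argument requires time-varying covariates to be lists of the same length as the list of dependent networks. A common case is a lagged network used as a covariate, which forces the first time step to be dropped from the dependent list. The following base R sketch builds such inputs from simulated matrices; the object names `similarity`, `dependent`, and `lagged` are illustrative assumptions, not package objects.

```r
set.seed(1)
# Ten 10 x 10 binary adjacency matrices in chronological order (oldest first)
mats <- lapply(1:10, function(t) {
  m <- matrix(rbinom(100, 1, 0.25), nrow = 10, ncol = 10)
  diag(m) <- 0  # no self-loops
  m
})

# Time-independent covariate: a single matrix, reused at every time step
similarity <- matrix(runif(100), nrow = 10, ncol = 10)

# Time-varying covariate: the previous period's network (a one-step lag).
# Dropping the first network from the dependent list keeps both lists
# the same length.
dependent <- mats[-1]             # networks 2..10 are modeled
lagged    <- mats[-length(mats)]  # networks 1..9 serve as covariates
stopifnot(length(dependent) == length(lagged))
```

With btergm loaded, these inputs could then be passed as, for example, `btergm(dependent ~ edges + edgecov(lagged) + edgecov(similarity), R = 100)`.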
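The `offset` argument describes inflating every matrix to the full node set and marking dyads that involve absent nodes as structural zeros. A minimal sketch of that idea in base R follows; `inflate_with_offset` is a hypothetical helper written for illustration, not the routine btergm uses internally, and it assumes labeled matrices (row and column names are node names).

```r
# Hypothetical helper (not part of btergm): inflate a labeled adjacency
# matrix to the full node set and mark dyads involving absent nodes
inflate_with_offset <- function(mat, all_nodes) {
  n <- length(all_nodes)
  full <- matrix(0, n, n, dimnames = list(all_nodes, all_nodes))
  present <- rownames(mat)
  full[present, present] <- mat
  absent <- !(all_nodes %in% present)
  # offset[i, j] is 1 (structural zero) if node i or node j is absent
  offset <- outer(absent, absent, "|") * 1
  dimnames(offset) <- dimnames(full)
  list(matrix = full, offset = offset)
}

# Node "c" is missing at this time step
m <- matrix(c(0, 1, 1, 0), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))
res <- inflate_with_offset(m, c("a", "b", "c"))
res$offset
# Dyads that remain for the GLM once structural zeros are removed
keep <- res$offset == 0
```

The logical `keep` matrix plays the role of the filter applied after MPLE data preparation; with `offset = FALSE`, the absent node would instead be dropped from the matrices before estimation.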
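With `returndata = TRUE`, `btergm` returns a dyad-level data frame of ties and change statistics. The sketch below builds a comparable table in base R for two dyad-independent terms only: `edges`, whose change statistic is always 1, and `edgecov`, whose change statistic is the covariate value of the dyad. Dyad-dependent terms such as `istar(2)` depend on the rest of the network and are not covered here. The helper `dyad_frame` and its column names are assumptions; the package's actual output may be laid out differently.

```r
set.seed(2)
# Three time steps of a 10-node network plus a matching covariate list
nets <- lapply(1:3, function(t) {
  m <- matrix(rbinom(100, 1, 0.25), nrow = 10)
  diag(m) <- 0
  m
})
covs <- lapply(1:3, function(t) matrix(rnorm(100), nrow = 10))

# Hypothetical helper: one row per directed dyad (i != j), holding the
# observed tie and the change statistics of two dyad-independent terms
dyad_frame <- function(y, cov, t) {
  off <- row(y) != col(y)          # skip the diagonal (no loops)
  data.frame(time = t, i = row(y)[off], j = col(y)[off],
             tie = y[off],
             edges = 1,            # toggling any tie changes 'edges' by 1
             edgecov = cov[off])   # 'edgecov' changes by the covariate value
}

dyads <- do.call(rbind, lapply(seq_along(nets), function(t)
  dyad_frame(nets[[t]], covs[[t]], t)))
head(dyads)
```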
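The `parallel` and `ncpus` arguments depend on the operating system and on where the core count comes from. This sketch picks forking (`mclapply`) outside Windows and a PSOCK cluster (`parLapply`) on Windows, and reads the core count from a scheduler environment variable when one is set. `choose_ncpus` is a hypothetical helper, and `MOAB_PROCCOUNT` is only the example variable named above; other schedulers use other names.

```r
library(parallel)

# Hypothetical helper: take the core count from a scheduler environment
# variable if it is set and valid; otherwise fall back to detectCores()
choose_ncpus <- function(var = "MOAB_PROCCOUNT") {
  from_env <- suppressWarnings(as.integer(Sys.getenv(var, unset = NA)))
  if (!is.na(from_env) && from_env > 0) return(from_env)
  detected <- detectCores()
  if (is.na(detected)) 1L else detected
}

# Forking ("multicore") is unavailable on Windows; use a socket cluster there
par_mode <- if (.Platform$OS.type == "windows") "snow" else "multicore"
ncpus <- min(2L, choose_ncpus())  # capped at 2 for this small demo

square <- function(x) x^2
if (par_mode == "multicore") {
  res <- mclapply(1:4, square, mc.cores = ncpus)
} else {
  cl <- makeCluster(ncpus, type = "PSOCK")
  res <- parLapply(cl, 1:4, square)
  stopCluster(cl)
}
unlist(res)
```

The same two values could then be handed to btergm as `parallel = par_mode, ncpus = choose_ncpus()`.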
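The pseudolikelihood is maximized by a logistic regression of observed ties on change statistics, so the estimation step is a GLM fit on a design matrix. `speedglm.wfit` and `fastglm` (see `usefastglm`) are faster fitters that work on such a matrix directly. As a stand-in, this sketch uses base R's `glm.fit` on simulated data and checks that it matches the formula interface; the column names `edges` and `cov` are illustrative.

```r
set.seed(3)
n <- 500
# Design matrix of change statistics: an 'edges' column (always 1, acting
# as the intercept) and one dyadic covariate
X <- cbind(edges = 1, cov = rnorm(n))
y <- rbinom(n, 1, plogis(-1 + 0.5 * X[, "cov"]))

# Matrix-level fit, the kind of call a fast GLM routine replaces
fit_matrix <- glm.fit(X, y, family = binomial())
# Formula-level fit of the same model, for comparison
fit_formula <- glm(y ~ X[, "cov"], family = binomial())

all.equal(unname(fit_matrix$coefficients), unname(coef(fit_formula)))
```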

Details

The `btergm` function estimates temporal exponential random graph models (TERGMs) by bootstrapped maximum pseudolikelihood, as described in Desmarais and Cranmer (2012). It is faster than MCMC-MLE, but it is only asymptotically unbiased: its accuracy improves as the time series of networks grows longer, because it relies on temporal bootstrapping to correct the standard errors.
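Temporal bootstrapping treats whole time steps as the resampling units: time steps are drawn with replacement, the pooled pseudolikelihood model is refit, and percentile intervals are read off the replicates. The sketch below shows that idea with the boot package (the backend named under `parallel`) on simulated dyad-level data with a single dyad-independent covariate. It is an illustration under these assumptions, not btergm's own implementation; `steps` and `tboot_stat` are hypothetical names.

```r
library(boot)
set.seed(4)
n_steps <- 10

# Simulated dyad-level data, one data frame per time step
# (90 directed dyads, as in a 10-node network without loops)
steps <- lapply(seq_len(n_steps), function(t) {
  cov <- rnorm(90)
  data.frame(tie = rbinom(90, 1, plogis(-1 + 0.8 * cov)), cov = cov)
})

# Statistic for boot(): 'time_ids' holds the time steps, 'i' the resampled
# indices; pool the drawn time steps and refit the logistic MPLE
tboot_stat <- function(time_ids, i) {
  pooled <- do.call(rbind, steps[time_ids[i]])
  coef(glm(tie ~ cov, family = binomial(), data = pooled))
}

b <- boot(data = seq_len(n_steps), statistic = tboot_stat, R = 200)
b$t0                                        # estimates on all time steps
ci <- boot.ci(b, type = "perc", index = 2)  # percentile CI, cov effect
ci
```

Resampling time steps rather than individual dyads keeps the within-network dependence intact inside each replicate, which is why the intervals improve as the number of time steps grows.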

Author(s)

Philip Leifeld, Skyler J. Cranmer, Bruce A. Desmarais

References

Cranmer, Skyler J., Tobias Heinrich and Bruce A. Desmarais (2014): Reciprocity and the Structural Determinants of the International Sanctions Network. Social Networks 36(1): 5-22. doi:10.1016/j.socnet.2013.01.001.

Desmarais, Bruce A. and Skyler J. Cranmer (2012): Statistical Mechanics of Networks: Estimation and Uncertainty. Physica A 391: 1865-1876. doi:10.1016/j.physa.2011.10.018.

Desmarais, Bruce A. and Skyler J. Cranmer (2010): Consistent Confidence Intervals for Maximum Pseudolikelihood Estimators. Neural Information Processing Systems 2010 Workshop on Computational Social Science and the Wisdom of Crowds.

Leifeld, Philip, Skyler J. Cranmer and Bruce A. Desmarais (2017): Temporal Exponential Random Graph Models with btergm: Estimation and Bootstrap Confidence Intervals. Journal of Statistical Software 83(6): 1-36. doi:10.18637/jss.v083.i06.

Examples

library("btergm")
set.seed(5)

networks <- list()
for (i in 1:10) {              # create 10 random networks with 10 actors
  mat <- matrix(rbinom(100, 1, .25), nrow = 10, ncol = 10)
  diag(mat) <- 0               # loops are excluded
  nw <- network::network(mat)  # create network object
  networks[[i]] <- nw          # add network to the list
}

covariates <- list()
for (i in 1:10) {              # create 10 matrices as covariate
  mat <- matrix(rnorm(100), nrow = 10, ncol = 10)
  covariates[[i]] <- mat       # add matrix to the list
}

fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates), R = 100)
summary(fit)                   # show estimation results

# For examples with real data, see help("knecht") or help("alliances").


# Examples for parallel processing:

# Some preliminaries:
# - "Forking" means running the code on multiple cores in the same
#   computer. It's fast but consumes a lot of memory because all
#   objects are copied for each core. It's also restricted to
#   cores within a physical computer, i.e., no distribution over a
#   network or cluster. Forking does not work on Windows systems.
# - "MPI" is a protocol for distributing computations over many
#   cores, often across multiple physical computers/nodes. MPI
#   is fast and can distribute the work across hundreds of nodes
#   (but remember that R can handle a maximum of 128 connections,
#   which includes file access and parallel connections). However,
#   it requires that the Rmpi package is installed and that an MPI
#   server is running (e.g., OpenMPI).
# - "PSOCK" is a TCP-based protocol. It can also distribute the
#   work to many cores across nodes (like MPI). The advantage of
#   PSOCK is that it can also make use of multiple cores within
#   the same node or desktop computer (as with forking) but without
#   consuming too much additional memory. The drawback is that it
#   is not as fast as MPI or forking.
# The following code provides examples for these three scenarios.

# btergm works with clusters via the parallel package. That is, the
# user can create a cluster object (of type "PSOCK", "MPI", or
# "FORK") and supply it to the 'cl' argument of the 'btergm'
# function. If no cluster object is provided, btergm will try to
# create a temporary PSOCK cluster (if parallel = "snow") or it
# will use forking (if parallel = "multicore").

## Not run: 
# To use a PSOCK cluster without providing an explicit cluster
# object:
library("parallel")
fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "snow", ncpus = 25)

# Equivalently, a PSOCK cluster can be provided as follows:
library("parallel")
cores <- 25
cl <- makeCluster(cores, type = "PSOCK")
fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "snow", ncpus = cores, cl = cl)
stopCluster(cl)

# Forking (without supplying a cluster object) can be used as
# follows. No cluster object is created, so none needs to be stopped.
library("parallel")
cores <- 25
fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "multicore", ncpus = cores)

# Forking (by providing a cluster object) works as follows:
library("parallel")
cores <- 25
cl <- makeCluster(cores, type = "FORK")
fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "snow", ncpus = cores, cl = cl)
stopCluster(cl)

# To use MPI, a cluster object MUST be created beforehand. In
# this example, a MOAB HPC server is used. It stores the number of
# available cores in an environment variable:
library("parallel")
cores <- as.numeric(Sys.getenv("MOAB_PROCCOUNT"))
cl <- makeCluster(cores, type = "MPI")
fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "snow", ncpus = cores, cl = cl)
stopCluster(cl)

# In the following example, the Rmpi package is used to create a
# cluster. This may not work on all systems; consult your local
# support staff or the help files on your HPC server to find out how
# to create a cluster object on your system.

# snow/Rmpi start-up
if (!is.loaded("mpi_initialize")) {
  library("Rmpi")
}
library("snow")

mpirank <- mpi.comm.rank(0)
if (mpirank == 0) {
  invisible(makeMPIcluster())
} else {
  sink(file = "/dev/null")
  invisible(slaveLoop(makeMPImaster()))
  mpi.finalize()
  q()
}
# End snow/Rmpi start-up

cl <- getMPIcluster()

fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates),
              R = 100, parallel = "snow", ncpus = length(cl), cl = cl)

## End(Not run)
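Several of the examples above create a cluster and must remember to stop it. A small wrapper can guarantee that the cluster is stopped even if estimation fails. `with_cluster` is a hypothetical convenience function, not part of btergm or parallel; the demo call uses `parLapply` so the sketch runs without btergm installed.

```r
library(parallel)

# Hypothetical helper: create a cluster, pass it to 'fun', and stop it on
# exit, even if 'fun' throws an error
with_cluster <- function(n, type = "PSOCK", fun) {
  cl <- makeCluster(n, type = type)
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

res <- with_cluster(2, "PSOCK", function(cl) parLapply(cl, 1:3, sqrt))
unlist(res)

# With btergm loaded, the same pattern could wrap an estimation call:
# fit <- with_cluster(25, "PSOCK", function(cl)
#   btergm(networks ~ edges + istar(2) + edgecov(covariates),
#          R = 100, parallel = "snow", ncpus = length(cl), cl = cl))
```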

[Package btergm version 1.10.12 Index]