ergm-parallel {ergm}R Documentation

Parallel Processing in the ergm Package

Description

Using clusters multiple CPUs or CPU cores to speed up ERGM estimation and simulation.

The ergm.getCluster function is usually called internally by the ergm process (in ergm_MCMC_sample) and will attempt to start the appropriate type of cluster indicated by the control.ergm settings. It will also check that the same version of ergm is installed on each node.

The ergm.stopCluster shuts down a cluster, but only if ergm.getCluster was responsible for starting it.

The ergm.restartCluster restarts and returns a cluster, but only if ergm.getCluster was responsible for starting it.

nthreads is a simple generic to obtain the number of parallel processes represented by its argument, keeping in mind that having no cluster (e.g., NULL) represents one thread.

Usage

ergm.getCluster(control = NULL, verbose = FALSE, stop_on_exit = parent.frame())

ergm.stopCluster(..., verbose = FALSE)

ergm.restartCluster(control = NULL, verbose = FALSE)

set.MT_terms(n)

get.MT_terms()

nthreads(clinfo = NULL, ...)

## S3 method for class 'cluster'
nthreads(clinfo = NULL, ...)

## S3 method for class ''NULL''
nthreads(clinfo = NULL, ...)

## S3 method for class 'control.list'
nthreads(clinfo = NULL, ...)

Arguments

control

a control.ergm (or similar) list of parameter values from which the parallel settings should be read; can also be NULL, in which case an existing cluster is used if started, or no cluster otherwise.

verbose

A logical or an integer to control the amount of progress and diagnostic information to be printed. FALSE/0 produces minimal output, with higher values producing more detail. Note that very high values (5+) may significantly slow down processing.

stop_on_exit

An environment or NULL. If an environment, defaulting to that of the calling function, the cluster will be stopped when the calling the frame in question exits.

...

not currently used

n

an integer specifying the number of threads to use; 0 (the starting value) disables multithreading, and -1 or NA sets it to the number of CPUs detected.

clinfo

a cluster or another object.

Details

For estimation that require MCMC, ergm can take advantage of multiple CPUs or CPU cores on the system on which it runs, as well as computing clusters through one of two mechanisms:

Running MCMC chains in parallel

Packages parallel and snow are used to to facilitate this, all cluster types that they support are supported.

The number of nodes used and the parallel API are controlled using the parallel and parallel.type arguments passed to the control functions, such as control.ergm().

The ergm.getCluster() function is usually called internally by the ergm process (in ergm_MCMC_sample()) and will attempt to start the appropriate type of cluster indicated by the control.ergm() settings. The ergm.stopCluster() is helpful if the user has directly created a cluster.

Further details on the various cluster types are included below.

Multithreaded evaluation of model terms

Rather than running multiple MCMC chains, it is possible to attempt to accelerate sampling by evaluating qualified terms' change statistics in multiple threads run in parallel. This is done using the OpenMP API.

However, this introduces a nontrivial amont of computational overhead. See below for a list of the major factors affecting whether it is worthwhile.

Generally, the two approaches should not be used at the same time without caution. In particular, by default, cluster slave nodes will not “inherit” the multithreading setting; but ⁠parallel.inherit.MT=⁠ control parameter can override that. Their relative advantages and disadvantages are as follows:

Value

set.MT_terms() returns the previous setting, invisibly.

get.MT_terms() returns the current setting.

Different types of clusters

PSOCK clusters

The parallel package is used with PSOCK clusters by default, to utilize multiple cores on a system. The number of cores on a system can be determined with the detectCores() function.

This method works with the base installation of R on all platforms, and does not require additional software.

For more advanced applications, such as clusters that span multiple machines on a network, the clusters can be initialized manually, and passed into ergm() and others using the parallel control argument. See the second example below.

MPI clusters

To use MPI to accelerate ERGM sampling, pass the control parameter parallel.type="MPI". ergm requires the snow and Rmpi packages to communicate with an MPI cluster.

Using MPI clusters requires the system to have an existing MPI installation. See the MPI documentation for your particular platform for instructions.

To use ergm() across multiple machines in a high performance computing environment, see the section "User initiated clusters" below.

User initiated clusters

A cluster can be passed into ergm() with the parallel control parameter. ergm() will detect the number of nodes in the cluster, and use all of them for MCMC sampling. This method is flexible: it will accept any cluster type that is compatible with snow or parallel packages.

When is multithreading terms worthwhile?

Note

The this is a setting global to the ergm package and all of its C functions, including when called from other packages via the Linking-To mechanism.

Examples



# Uses 2 SOCK clusters for MCMLE estimation
data(faux.mesa.high)
nw <- faux.mesa.high
fauxmodel.01 <- ergm(nw ~ edges + isolates + gwesp(0.2, fixed=TRUE), 
                     control=control.ergm(parallel=2, parallel.type="PSOCK"))
summary(fauxmodel.01)




[Package ergm version 4.6.0 Index]