R: Simulate Trajectories from an ERGM Generating Process

simEGP {ergmgp}

R Documentation

Simulate Trajectories from an ERGM Generating Process

Description

Given an ergm formula, simulate trajectories from a continuous time graph process having the specified ERGM as a limiting distribution. A number of different processes are supported, and termination may be specified either by phenomenological time or event counts.

Usage

simEGP(form, coef, events = 1, time = NULL, rate.factor = 1, 
    time.offset = 0, event.offset = 0, process = c("LERGM", "CRSAOM", 
    "CI", "DS", "CDCSTERGM", "CFCSTERGM", "CSTERGM", "CTERGM"), 
    use.logtime = FALSE, return.changetime = FALSE, 
    changetime.offset = NULL, return.history = FALSE, 
    return.networkDynamic = FALSE, verbose = TRUE,  trace.interval = 100,
    ...)

simEGPTraj(form, coef, events = 1, time = NULL, checkpoints = 1, 
    rate.factor = 1, trajectories = 1, mc.cores = 1, 
    log.sampling = FALSE, process = c("LERGM", "CRSAOM", "CI", "DS",
    "CDCSTERGM", "CFCSTERGM", "CSTERGM", "CTERGM"), use.logtime = FALSE,
    return.changetime = FALSE, return.history = FALSE, verbose = TRUE,
    trace.interval = 100, statsonly = FALSE, monitor = NULL)

Arguments

`form`	an `ergm` formula defining terms for the EGP; the left-hand side must be a network object, whose properties are used to determine the state space. For the `CSTERGM` process, a list containing two such formulas must be used, with named elements `formation` (for the formation model) and `dissolution` (for the dissolution model).
`coef`	vector of coefficients for the EGP; for the CSTERGMs, this should be a containing named elements `formation` and `dissolution`, each of which should be the coefficient vector for its respective model.
`events`	optionally, the number of simulated events to draw (if `time==NULL`); if `time` is specified, this is ignored.
`time`	optionally, the temporal length of the simulation; if not supplied, `events` is used instead to determine when to stop.
`rate.factor`	a multiplicative factor scaling the time evolution of the system; higher values correspond to faster dynamics.
`time.offset`	optionally, an initial “clock” offset for the start of a trajectory; this allows time 0 (the start of the simulation interval) to be set to an arbitrary time point. This is only used for book-keeping (e.g., when a trajectory is run as multiple segments), and does not affect e.g. the meaning of the `time` argument (which is always interpreted as units after the start time).
`event.offset`	optionally, an initial offset to the step or event count for the start of a trajectory (e.g., for trajectories being run in segments). As with `time.offset`, this only affects book-keeping, and has no other effect.
`process`	the ERGM generating process to use (described below).
`use.logtime`	logical; internally, use logarithmic timescale? This can potentially protect against overflow or underflow when rates are extreme, but can reduce precision and adds some overhead.
`return.changetime`	logical; should we return a matrix with the last update times for each edge variable as a network attribute?
`changetime.offset`	optionally, an `n x n` matrix of last change times (for trajectories being resumed in process).
`return.history`	logical; return the entire event history as a network attribute?
`return.networkDynamic`	logical; retain the entire event history and return as a `networkDynamic` object?
`verbose`	logical; provide trace messages regarding simulation progress?
`trace.interval`	for verbose output, the interval at which messages should be printed (in events).
`checkpoints`	number of checkpoints at which the trajectory should be sampled (in addition to the initial state).
`trajectories`	number of independent trajectories to simulate (all start from the seed network, but evolve independently).
`mc.cores`	number of cores to use when simulating trajectories.
`log.sampling`	logical; should time points to sample be logarithmically spaced?
`statsonly`	logical; should only network statistics be retained (and not the graphs themselves)?
`monitor`	optionally, an `ergm` formula with additional statistics to track.
`...`	additional arguments (currently unused).

Details

An ERGM generating process (EGP) is a continuous time graph process with an equilibrium distribution having a known ERGM form. See ergmgp for an overview of EGPs, including the specifications supported here.

simEGP generates a single trajectory from an EGP, with the EGP being specified via its graph potential (as a ergm formula or pair thereof and associated coefficients) and its initial state being given by the left-hand side of the input formula. The trajectory length can be specified either in terms of the number of transitions to be simulated (events) or the length of the trajectory in phenomenological time (time); only the latter leads to the specified ERGM equilibrium (since event times are not “random” times, stopping after a fixed number of events biases the final state). If desired for bookkeeping purposes, an offset can be added to the simulation clock (which otherwise starts at 0), event count (likewise), and most recent change times (also likewise). By default, the return value is a network object containing the final graph state, with network attributes giving the final time ("Time"), event count ("Events"), ERGM potential ("Potential"). A square matrix containing the time of the most recent transition experienced by each edge variable can be returned as a network attribute ("LastChangeTime") if return.changetime is selected. By default, the entire event history is not stored (as it can become extremely large). However, if return.history==TRUE, a matrix containing the event history is saved and returned as a network attribute ("EventHistory"). Alternately, setting return.networkDynamic=TRUE will lead to the event history being stored and the entire trajectory being returned as a networkDynamic object, with edge activities set based on the observed transitions. This format may be easier to use for visualization, or to query the state of the network at an arbitrary point in the trajectory. The durations function can be used to extract edge durations from such objects, as well.

For models with extreme transition rates, the option use.logtime may be useful for avoiding overflow or underflow; this only affects internal calculation, and not reported event times. Note that logscale calculations add some overhead, and may be less precise in some cases than the default, so this option is not suggested unless specifically needed.

simEGPTraj is a wrapper for simEGP, which adds additional capabilities for simulation of multiple trajectories and/or sampling of longer trajectories. Each returned trajectory contains the initial state, as well as simEGP output from checkpoints points along the trajectory (including the end). The default behavior (checkpoints==1) returns the initial and final states. Checkpoints are evenly spaced (with termination criteria indicated as per simEGP) by default, or logarithmically spaced if log.sampling==TRUE. Multiple independent trajectories can be simulated by setting trajectories>1; these can be run in parallel by setting mc.cores>1. If desired, the model statistics can be returned without the graph state by choosing statsonly=TRUE, and a one-sided monitor formula can likewise be used to calculate additional statistics if desired (with similar functionality to the ergm simulate method). Otherwise, network.list objects are returned containing the states in the respective trajectories.

Simulation itself follows the discrete event approach described in Butts (2023). Transition hazards are computed for all edge variables (making the scaling no better then O(N^2) for each update, and are used to draw both the next event and the event time. Because the cost of computing each transition is unrelated to waiting time, this algorithm can be quite efficient at simulating long time periods when events are sparse (unlike, e.g. a discrete-time algorithm that updates in every period). By turns, however, trajectories can become quite expensive (per unit phenomenological time) when event rates are high. This issue is especially pronounced for the CRSAOM and DS processes, which can both generate very high transition rates in some cases. Unless otherwise specified, event histories are not stored, so storage costs are by default unrelated to trajectory length. Care should be taken when storing event histories, as they can become quite large when transition rates are high.

To obtain equilibrium graph distributions from an EGP, it is generally (much) more efficient to use the simulate functions in the ergm package: they employ MCMC algorithms that are unconstrained by the need to follow realistic trajectories, and that are optimized for rapid mixing. (In particular, note that many systems can become kinetically trapped, spending very long periods in metastable states that are far from equilibrium. This can be a real-world phenomenon, but is not always desirable from a computational point of view. Functions such as simEGP are intended to faithfully reproduce such dynamics, while MCMC algorithms are intended to avoid them.) Comparison of late-phase draws from a simERGMPot trajectory with equilibrium ERGM draws can be used to evaluate convergence to equilibrium behavior (where desired); alternately, simEGP can be seeded with ERGM draws to follow trajectories from equilibrated states. Consult the ergm package documentation for details.

Value

For simEGP, a network object containing the final graph state, with network attributes Time, Events, and Potential listing the time, event count, and ERGM potential at the end of the simulation interval. See above for additional attributes that may be added if history retention is activated. If return.networkDynamic==TRUE, then the return value is instead a networkDynamic object containing the event history as edge activity data; be aware that an edge will exist in this object if any corresponding edge is ever active, so the raw graph state should not be used to access the final system state. Instead, use the network.extract method to query the network state at the desired time point.

For simERGMPotTraj, a list containing the simulated trajectories. These are either network.list objects containing the networks at each checkpoint (with time, step, and potential attribute as described above), or else matrices of trace statistics (if statsonly==TRUE). Note that the statistics are in any event included as an attribute to each network list, so the effect of statsonly==TRUE is simply not to retain the graph states.

Note

Using steps to control trajectory termination will lead to biased samples (sometimes severely so); this is because transitions are not random times. If your goal is to obtain equilibrium draws (or draws en route thereto), use time to set the stopping point. See EGPRateEst for a simple tool for calibrating simulation times.

Author(s)

Carter T. Butts buttsc@uci.edu

References

Butts, Carter T. (2023). “Continuous Time Graph Processes with Known ERGM Equilibria: Contextual Review, Extensions, and Synthesis.” Journal of Mathematical Sociology. doi:10.1080/0022250X.2023.2180001

Examples


#Small example of 2-ribbon generation
n<-100
set.seed(1331)
net<-network.initialize(n,directed=FALSE)
sim<-simEGP(net~edges+kstar(2)+nsp(1:2), 
    coef=c(109-log(n),-25,-1.25,3.25), time=100, process="LERGM",
    verbose = TRUE)
plot(sim) #Return value is a single network

#Generate a trajectory showing the process at multiple stages
set.seed(1331)
sim<-simEGPTraj(net~edges+kstar(2)+nsp(1:2), 
    coef=c(109-log(n),-25,-1.25,3.25), time=100, checkpoints = 5, 
    trajectories = 2, mc.cores = 1, log.sampling = TRUE, 
    process = "LERGM", verbose = TRUE) 
length(sim)==2        #One entry per simulated trajectory
op<-par(mfrow=c(2,3))
for(i in 1:6)         #Show the first trajectory
  plot(sim[[1]][[i]],main=paste("Time",round(sim[[1]][[i]]%n%"Time",2)))
summary(sim[[2]]~edges+kstar(2))  #Show selected stats from the second
attributes(sim[[1]])  #Show precomputed statistics
par(op)

#A simple example with statsonly
set.seed(1331)
sim<-simEGPTraj(net~edges+esp(0), coef = c(log(2)-log(n), -1), time = 200,
    checkpoints = 25, process = "LERGM", statsonly = TRUE, 
    monitor = ~triangle)
sim                   #Note the monitor stat
op<-par(mfrow=c(1,1))
plot(sim[,"Time"], sim[,"edges"], type = "l")  #Time by edge count
lines(sim[,"Time"], sim[,"esp0"], col = 2)     #Add ESP(0)s
par(op)

[Package ergmgp version 0.1-1 Index]