gof {btergm}

R Documentation

Goodness-of-fit diagnostics for ERGMs, TERGMs, SAOMs, and logit models

Description

Assess goodness of fit of btergm and other network models.

Usage

gof(object, ...)

createGOF(
  simulations,
  target,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  parallel = "no",
  ncpus = 1,
  cl = NULL,
  verbose = TRUE,
  ...
)

## S4 method for signature 'btergm'
gof(
  object,
  target = NULL,
  formula = getformula(object),
  nsim = 100,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

## S4 method for signature 'ergm'
gof(
  object,
  target = NULL,
  formula = getformula(object),
  nsim = 100,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

## S4 method for signature 'mtergm'
gof(
  object,
  target = NULL,
  formula = getformula(object),
  nsim = 100,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

## S4 method for signature 'tbergm'
gof(
  object,
  target = NULL,
  formula = getformula(object),
  nsim = 100,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

## S4 method for signature 'sienaFit'
gof(
  object,
  period = NULL,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  structzero = 10,
  statistics = c(esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  groupName = object$f$groupNames[[1]],
  varName = NULL,
  outofsample = FALSE,
  sienaData = NULL,
  sienaEffects = NULL,
  nsim = NULL,
  verbose = TRUE,
  ...
)

## S4 method for signature 'network'
gof(
  object,
  covariates,
  coef,
  target = NULL,
  nsim = 100,
  mcmc = FALSE,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

## S4 method for signature 'matrix'
gof(
  object,
  covariates,
  coef,
  target = NULL,
  nsim = 100,
  mcmc = FALSE,
  MCMC.interval = 1000,
  MCMC.burnin = 10000,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1,
  cl = NULL,
  statistics = c(dsp, esp, deg, ideg, geodesic, rocpr, walktrap.modularity),
  verbose = TRUE,
  ...
)

Arguments

`object`	A `btergm`, `mtergm`, `tbergm`, `ergm`, or `sienaFit` object (for the corresponding methods), or a `network` object or matrix (for the `network` and `matrix` methods, respectively).
`...`	Arbitrary further arguments to be passed on to the statistics. See also the gof-statistics help page.
`simulations`	A list of `network` objects or sparse matrices (generated using the Matrix package) representing simulated networks.
`target`	In the `gof` function: a network or list of networks to which the simulations are compared. If left at the default `NULL`, the original networks stored in the fitted model passed as `object` are used as observed networks. In the `createGOF` function: a list of sparse matrices (generated using the Matrix package) or a list of `network` objects (generated using the network package). The simulations are compared against these target networks.
`statistics`	A list of functions used for comparison of observed and simulated networks. Note that the list should contain the actual functions, not a character representation of them. See gof-statistics for details.
`parallel`	Use multiple cores in a computer or nodes in a cluster to speed up the simulations. The default value `"no"` means parallel computing is switched off. If `"multicore"` is used, the `mclapply` function from the parallel package (formerly in the multicore package) is used for parallelization. This should run on any kind of system except MS Windows because it is based on forking. It is usually the fastest type of parallelization. If `"snow"` is used, the `parLapply` function from the parallel package (formerly in the snow package) is used for parallelization. This should run on any kind of system, including cluster systems and MS Windows. It is slightly slower than `"multicore"` if the same number of cores is used. However, `"snow"` supports MPI clusters with a large number of cores, which `"multicore"` does not offer (see also the `cl` argument). Note that `"multicore"` only works if all cores are on the same node. For example, if there are three nodes with eight cores each, a maximum of eight CPUs can be used. Parallel computing is described in more detail on the help page of btergm.
`ncpus`	The number of CPU cores used for parallel GOF assessment (only if `parallel` is activated). If the number of cores should be detected automatically on the machine where the code is executed, one can try the `detectCores()` function from the parallel package. On some HPC clusters, the number of available cores is saved as an environment variable; for example, if MOAB is used, the number of available cores can sometimes be accessed using `Sys.getenv("MOAB_PROCCOUNT")`, depending on the implementation. Note that the maximum number of connections in a single R session (i.e., to other cores or for opening files etc.) is 128, so fewer than 128 cores should be used at a time.
`cl`	An optional parallel or snow cluster for use if `parallel = "snow"`. If not supplied, a cluster on the local machine is created temporarily.
`verbose`	Print details?
`formula`	A model formula from which networks are simulated for comparison. By default, the formula stored in the fitted model passed as `object` is used. It is possible to hand over a formula with only a single response network and/or dyad or edge covariates, or with lists of response networks and/or covariates. It is also possible to use indices like `networks[[4]]` or `networks[3:5]` inside the formula.
`nsim`	The number of networks to be simulated at each time step. Example: if there are six time steps in the `formula` and `nsim = 100`, a total of 600 new networks are simulated. The comparison between simulated and observed networks is only done within time steps. For example, the first 100 simulations are compared with the first observed network, simulations 101 to 200 with the second observed network, and so on.
`MCMC.interval`	Internally, this package uses the simulation facilities of the ergm package to create new networks against which to compare the original network(s) for goodness-of-fit assessment. This argument sets the MCMC interval to be passed over to the simulation command. The default value is `1000`, which means that every 1000th simulation outcome from the MCMC sequence is used. There is no general rule of thumb on the selection of this parameter, but if the results look suspicious (e.g., when the model fit is perfect), increasing this value may be helpful.
`MCMC.burnin`	Internally, this package uses the simulation facilities of the ergm package to create new networks against which to compare the original network(s) for goodness-of-fit assessment. This argument sets the MCMC burnin to be passed over to the simulation command. The default value is `10000`. There is no general rule of thumb on the selection of this parameter, but if the results look suspicious (e.g., when the model fit is perfect), increasing this value may be helpful.
`period`	Which transition between time periods should be used for GOF assessment? By default, all transitions between all time periods are used. For example, if there are three consecutive networks, this will extract simulations from the transitions between 1 and 2 and between 2 and 3, respectively, and these simulations will be compared to the networks at time steps 2 and 3, respectively. The time period can be provided as a numeric, e.g., `period = 4` for extracting the simulations between time steps 4 and 5 (= the fourth transition) and predicting the fifth network. Values lower than 1 or larger than the number of consecutive networks minus 1 are therefore not permitted. This argument is only used if out-of-sample prediction is switched off.
`structzero`	Which value was used for structural zeros (usually nodes that have dropped out of the network or have not yet joined the network) in the dependent variable/network? These nodes are removed from the observed network and the simulations before comparison. Usually, the value `10` is used for structural zeros in Siena.
`groupName`	The group name used in the Siena model.
`varName`	The variable name that denotes the dependent networks in the Siena model.
`outofsample`	Should out-of-sample prediction be attempted? If so, some additional arguments must be provided: `sienaData`, `sienaEffects`, and `nsim`. The `sienaData` object must contain a base and a target network for out-of-sample prediction. The `sienaEffects` must contain the effects to be used for the simulations. The estimates will be taken from the estimated `object`, and they will be injected into a new SAOM and fixed during the sampling procedure. `nsim` determines how many simulations are used for the out-of-sample comparison.
`sienaData`	An object of the class `siena`, which is usually created using the `sienaDataCreate` function in the RSiena package. This argument is only used for out-of-sample prediction. The object must be based on a `sienaDependent` object that contains two networks: the base network from which to simulate forward, and the target network which you want to predict out-of-sample. The object can contain further objects for storing covariates etc. that are necessary for estimating new networks. The best practice is to create an object that is identical to the `siena` object used for estimating the model, except that it contains the base and the target network instead of the dependent variable/networks.
`sienaEffects`	An object of the class `sienaEffects`, which is usually created using the `getEffects()` and the `includeEffects()` functions in the `RSiena` package. The best practice is to provide a `sienaEffects` object that is identical to the object used to create the original model (that is, it should contain the same effects), except that it should be based on the `siena` object provided through the `sienaData` argument. In other words, the `sienaEffects` object should be based on the base and target network used for out-of-sample prediction, and it should contain the same effects as those used for the original estimation. This argument is used only for out-of-sample prediction.
`covariates`	A list of matrices or network objects that serve as covariates for the dependent network. The covariates in this list are automatically added to the formula as `edgecov` terms.
`coef`	A vector of coefficients.
`mcmc`	Should statnet's MCMC methods be used for simulating new networks? If `mcmc = FALSE`, new networks are simulated based on predicted tie probabilities of the regression equation.
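
The `parallel`, `ncpus`, and `cl` arguments described above can be prepared with the parallel package before `gof` is called. The following is a minimal sketch, assuming a socket cluster on the local machine; the object `fit` in the commented call is hypothetical and stands for any fitted model, and the cap of two workers is an arbitrary choice for illustration.

```r
library("parallel")

# Detect the number of cores; detectCores() can return NA on some systems,
# in which case this falls back to two workers. At least one worker is used.
n_cores <- max(1L, min(2L, detectCores() - 1L, na.rm = TRUE))

# Create a socket ("snow"-type) cluster that could be handed over via 'cl'.
cl <- makeCluster(n_cores)

# Quick check that the workers respond (unrelated to GOF assessment itself).
squares <- unlist(parLapply(cl, 1:4, function(i) i^2))
print(squares)

# With a fitted model 'fit' (hypothetical), the cluster would be used like this:
# g <- gof(fit, parallel = "snow", ncpus = n_cores, cl = cl)

# Always release the workers when done.
stopCluster(cl)
```

If no cluster is supplied through `cl`, a temporary local cluster is created, so building one by hand is mainly useful when the same cluster is reused across several calls.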

Details

The generic `gof` function provides goodness-of-fit measures and degeneracy checks for `btergm`, `mtergm`, `tbergm`, `ergm`, `sienaFit`, and custom dyadic-independence models. The user can provide a list of network statistics for comparing networks simulated from the estimated model with the observed network(s); see gof-statistics. The objects created by these methods can be displayed using various plot and print methods (see gof-plot).
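
A basic in-sample assessment might look like the following sketch. The data are hypothetical random networks and covariates created only for illustration, the model terms are arbitrary, and the small values of `R` and `nsim` are chosen for speed rather than accuracy. The code is guarded so that it is skipped when the btergm or network package is not installed.

```r
# Skip cleanly if the required packages are not available.
if (requireNamespace("btergm", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  library("btergm")
  set.seed(12345)

  # Hypothetical toy data: 10 random directed networks with 10 actors each.
  networks <- lapply(1:10, function(i) {
    mat <- matrix(rbinom(100, 1, 0.25), nrow = 10, ncol = 10)
    diag(mat) <- 0                       # exclude loops
    network::network(mat)
  })

  # One random dyadic covariate matrix per time step.
  covariates <- lapply(1:10, function(i) {
    matrix(rnorm(100), nrow = 10, ncol = 10)
  })

  # Fit a TERGM by bootstrapped pseudolikelihood.
  fit <- btergm(networks ~ edges + istar(2) + edgecov(covariates), R = 100)

  # In-sample GOF: simulate 25 networks per time step and compare them
  # with the observed networks on a reduced set of statistics.
  g <- gof(fit, nsim = 25, statistics = c(esp, deg, ideg, geodesic))
  print(g)
  plot(g)
}
```

Note that `statistics` receives the functions themselves (for example `esp`), not their names as character strings.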

In-sample GOF assessment is the default, which means that the same time steps are used for creating simulations and for comparison with the observed network(s). Out-of-sample prediction is possible by specifying a (list of) target network(s) using the `target` argument. If a `formula` is provided, the simulations are based on the networks and covariates specified in that formula. This is helpful in situations where complex out-of-sample predictions have to be evaluated. A usage scenario could be to simulate from a network at time t (provided through the `formula` argument) and compare to an observed network at time t + 1 (the `target` argument). This can be done, for example, to assess predictive performance between time steps of the original networks, or to check whether the model performs well with regard to a newly measured network given the old data from the previous time step.
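
The scenario of simulating from time t and comparing to time t + 1 could be sketched as follows. All data are hypothetical random networks; the split into training steps 1 to 8, base step 9, and target step 10 is an assumption made for illustration, as are the object names `train_nets`, `base_net`, and `target_net`. The code is guarded so that it is skipped when the required packages are missing.

```r
if (requireNamespace("btergm", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  library("btergm")
  set.seed(54321)

  # Hypothetical toy data: 10 time steps, 10 actors.
  networks <- lapply(1:10, function(i) {
    mat <- matrix(rbinom(100, 1, 0.25), nrow = 10, ncol = 10)
    diag(mat) <- 0
    network::network(mat)
  })
  covariates <- lapply(1:10, function(i) {
    matrix(rnorm(100), nrow = 10, ncol = 10)
  })

  # Estimate the model on the first eight time steps only.
  train_nets <- networks[1:8]
  train_cov  <- covariates[1:8]
  fit <- btergm(train_nets ~ edges + istar(2) + edgecov(train_cov), R = 100)

  # Simulate from time step 9 (via 'formula') and compare the simulations
  # with the network observed at time step 10 (via 'target').
  base_net   <- networks[[9]]
  base_cov   <- covariates[[9]]
  target_net <- networks[[10]]

  g_oos <- gof(
    fit,
    target = target_net,
    formula = base_net ~ edges + istar(2) + edgecov(base_cov),
    nsim = 25,
    statistics = c(esp, ideg, rocpr)
  )
  print(g_oos)
}
```

The formula passed to `gof` contains the same terms as the estimated model, so that the coefficients stored in `fit` line up with the statistics used for simulation.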

Predictive fit can also be assessed for stochastic actor-oriented models (SAOMs) as implemented in the RSiena package. After compiling the usual objects (model, data, effects), one of the time steps can be predicted based on the previous time step and the SAOM using the `sienaFit` method of the `gof` function. By default, however, within-sample fit is used for SAOMs, just like for (T)ERGMs.

The `gof` methods for networks and matrices serve to assess the goodness of fit of a dyadic-independence model. To do this, the method requires a vector of coefficients (one coefficient for the intercept or edges term and one coefficient for each covariate), a list of covariates (in matrix or network shape), and a dependent network or matrix. This is useful for assessing the goodness of fit of QAP-adjusted logistic regression models (as implemented in the `netlogit` function in the sna package) or other dyadic-independence models, such as models fitted using `glm`. Note that this method only works with cross-sectional models and does not accept lists of networks as input data.
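
The idea behind simulating from a dyadic-independence model can be illustrated in base R. This is a sketch of the general principle under stated assumptions (hypothetical random data, a single covariate, network density as the only comparison statistic), not the package's internal implementation. The commented call at the end shows how the same inputs would be handed to the `matrix` method.

```r
set.seed(42)
n <- 20

# Hypothetical dependent network and one dyadic covariate.
dep <- matrix(rbinom(n * n, 1, 0.2), nrow = n, ncol = n)
diag(dep) <- 0
cov1 <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Fit a logistic regression on the off-diagonal dyads.
off_diag <- row(dep) != col(dep)
dyads <- data.frame(tie = dep[off_diag], cov1 = cov1[off_diag])
model <- glm(tie ~ cov1, data = dyads, family = binomial)
beta <- coef(model)  # beta[1]: intercept (edges term), beta[2]: covariate

# Predicted tie probabilities from the regression equation.
prob <- plogis(beta[1] + beta[2] * cov1)
diag(prob) <- 0      # loops cannot occur

# Simulate networks by drawing each dyad independently.
nsim <- 100
sims <- replicate(nsim, {
  sim <- matrix(rbinom(n * n, 1, prob), nrow = n, ncol = n)
  diag(sim) <- 0
  sim
}, simplify = FALSE)

# Compare a simple statistic (density) between observed and simulated networks.
obs_density <- sum(dep) / (n * (n - 1))
sim_density <- vapply(sims, function(m) sum(m) / (n * (n - 1)), numeric(1))
print(c(observed = obs_density, simulated_mean = mean(sim_density)))

# With btergm loaded, the same inputs would be passed on like this:
# g <- gof(dep, covariates = list(cov1), coef = beta, nsim = 100)
```

The coefficient vector has one entry for the intercept and one per covariate, in the same order as the `covariates` list.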

The `createGOF` function is used internally by the `gof` function in order to create a `gof` object from a list of simulated networks and a list of target networks to compare against. It can also be used directly by the end user if the user wants to supply lists of simulated and target networks from other sources.
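
Direct use of `createGOF` with externally generated networks might look like the following sketch. The helper `random_net` is hypothetical and merely stands in for whatever external procedure produces the simulated networks. The code is guarded so that it is skipped when the required packages are missing.

```r
if (requireNamespace("btergm", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  library("btergm")
  set.seed(1)

  # Hypothetical generator standing in for an external simulation procedure.
  random_net <- function(n = 10, p = 0.25) {
    mat <- matrix(rbinom(n * n, 1, p), nrow = n, ncol = n)
    diag(mat) <- 0
    network::network(mat)
  }

  # One observed (target) network and 50 simulated networks.
  target <- list(random_net())
  simulations <- lapply(1:50, function(i) random_net())

  # Build the GOF object directly from the two lists.
  g_custom <- createGOF(simulations, target, statistics = c(deg, ideg, esp))
  print(g_custom)
}
```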

References

Leifeld, Philip, Skyler J. Cranmer and Bruce A. Desmarais (2018): Temporal Exponential Random Graph Models with btergm: Estimation and Bootstrap Confidence Intervals. Journal of Statistical Software 83(6): 1–36. doi:10.18637/jss.v083.i06.

Leifeld, Philip and Skyler J. Cranmer (2019): A Theoretical and Empirical Comparison of the Temporal Exponential Random Graph Model and the Stochastic Actor-Oriented Model. Network Science 7(1): 20–51. doi:10.1017/nws.2018.26.

[Package btergm version 1.10.12 Index]