N-ergmTerm {ergm.multi}

R Documentation

Evaluation on multiple networks

Description

Evaluates the terms in formula on each of the networks joined using the Networks() function, and returns either a weighted sum or an lm-style linear model for the ERGM coefficients (Krivitsky et al. 2023). Its syntax closely follows that of lm(), with sensible defaults.

The default lm formula (~1) sums the specified network statistics across the networks. If lm refers to any network attributes for which some networks have missing values, the term will stop with an error. This can be avoided by pre-filtering with subset, which controls which networks are affected by the term.
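As a rough illustration of the "weighted sum" reading, the following base R sketch combines per-network statistics with an lm-style design matrix. It is a sketch under stated assumptions rather than the package's implementation: the statistics and network sizes are made-up numbers, the names stats and netdata are hypothetical, and it assumes (for non-curved terms) that the combined statistic is the cross-product of the design matrix with the per-network statistics.

```r
# Hypothetical per-network statistics for three networks (rows) and two
# ERGM terms (columns); the numbers are made up for illustration.
stats <- rbind(c(edges = 10, triangle = 2),
               c(edges = 15, triangle = 3),
               c(edges = 30, triangle = 8))

# Hypothetical network-level data: one row per network.
netdata <- data.frame(n = c(10, 20, 40))

# lm = ~1: the design matrix is a single column of ones, so the combined
# statistic is the column sum of the per-network statistics.
X1 <- model.matrix(~ 1, netdata)
sum_stats <- crossprod(X1, stats)

# lm = ~log(n): an intercept column plus log(n), so each statistic also
# gets a version weighted by log(n).
X2 <- model.matrix(~ log(n), netdata)
lm_stats <- crossprod(X2, stats)  # rows: lm predictors; columns: ERGM terms

print(sum_stats)
print(lm_stats)
```

Under this assumption, the intercept row of lm_stats matches sum_stats, which is why ~1 reduces to a plain sum.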

Usage

# binary: N(formula, lm=~1, subset=TRUE, weights=1, contrasts=NULL, offset=0, label=NULL,
#           .NetworkID=".NetworkID", .NetworkName=".NetworkName")

# valued: N(formula, lm=~1, subset=TRUE, weights=1, contrasts=NULL, offset=0, label=NULL,
#           .NetworkID=".NetworkID", .NetworkName=".NetworkName")

Arguments

`.NetworkID`, `.NetworkName`	Optional strings indicating the vertex attributes used to distinguish and name the networks; intended to be used by term developers.
`formula`	a one-sided `ergm()`-style formula with the terms to be evaluated
`lm`	a one-sided `lm()`-style formula whose RHS specifies the network-level predictors for the terms in the `ergm()` formula `formula`.
`subset`, `contrasts`	see `lm()`.
`offset`	A constant, a vector of length equal to the number of networks, or a matrix whose number of rows is the number of networks and whose number of columns is the number of free parameters of the ERGM. It can be specified in `lm` as well.
`weights`	reserved for future use; attempting to change it will cause an error: at this time, there is no way to assign sampling weights to networks.
`label`	An optional string added to the model parameter names to help identify the term in the output (e.g., when several terms have similar predictors but, say, different network subsets), or a function that wraps the names.

Offsets and fixing parameters

By default, an N(formula, lm) term will add p × q free parameters, where p is the number of free parameters (possibly curved) of the ERGM specified by formula, and q is the number of parameters specified by the lm formula. That is, there is one parameter for each combination of an ERGM parameter and a linear model parameter, in ERGM-major order (i.e., for each ERGM parameter, the linear model parameters are enumerated). For example, the term gwesp() has two free parameters: its coefficient and its decay rate. We can specify a model in which they depend on log(n) as N(~gwesp, ~log(n)), resulting in the following 4 parameters, with the intercept for the linear model being implicit:

#> [1] "N(1)~gwesp"            "N(log(n))~gwesp"       "N(1)~gwesp.decay"     
#> [4] "N(log(n))~gwesp.decay"
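The ERGM-major ordering can be mimicked in base R without fitting anything. This is only a sketch of the naming pattern shown above, not the package's own code; the vectors ergm_params and lm_params are hypothetical stand-ins for the parameters of the two formulas.

```r
ergm_params <- c("gwesp", "gwesp.decay")  # p = 2 parameters from the ERGM formula
lm_params   <- c("1", "log(n)")           # q = 2 parameters from the lm formula

# outer() yields a q-by-p matrix; as.vector() reads it column by column,
# so the lm parameters vary fastest within each ERGM parameter.
param_labels <- as.vector(outer(lm_params, ergm_params,
                                function(l, e) paste0("N(", l, ")~", e)))
print(param_labels)
```

The resulting vector has p × q = 4 entries, in the same order as the labels listed above.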

If a different linear model is desired for different ERGM terms (e.g., some are to be affected by network size while others are not), multiple N() terms can be specified. This covers most such cases, but not all. For example, suppose that for the above model, we wish for the gwesp coefficient to depend on log(n) but for the decay parameter not to. In this case, one can use the offset() decorator with partial offsetting, whose second argument gives the position of the parameter to be fixed. Then, specifying offset(N(~gwesp(), ~log(n)), 4), we get:

#> [1] "N(1)~gwesp"                    "N(log(n))~gwesp"              
#> [3] "N(1)~gwesp.decay"              "offset(N(log(n))~gwesp.decay)"

Then, setting the corresponding offset.coef = 0 will fix the coefficient of log(n) for the decay parameter at 0, while allowing a constant decay parameter to be estimated.
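To see what fixing that one parameter at 0 implies, the base R sketch below evaluates the linear model for the decay parameter at a few network sizes. The coefficient values and the names b_decay_intercept and b_decay_logn are made up for illustration and are not estimates from any fitted model.

```r
param_labels <- c("N(1)~gwesp", "N(log(n))~gwesp",
                  "N(1)~gwesp.decay", "N(log(n))~gwesp.decay")

# Position 4 is the one fixed via offset(); the rest remain free.
is_offset <- seq_along(param_labels) %in% 4
free_labels <- param_labels[!is_offset]

# Hypothetical values: a freely estimated decay intercept, and the
# log(n) slope for the decay fixed at 0 (as with offset.coef = 0).
b_decay_intercept <- 0.4
b_decay_logn <- 0

# Implied decay parameter for networks of several sizes.
n <- c(10, 20, 40)
decay_by_n <- b_decay_intercept + b_decay_logn * log(n)
print(decay_by_n)
```

With the slope fixed at 0, the implied decay is the same for every network size, while the three parameters in free_labels would still be estimated.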

Note

Care should be taken to avoid multicollinearity when using this operator. As with the lm() function, lm formulas have an implicit intercept, which can be suppressed by specifying ~ 0 + ... or ~ -1 + ... in the formula. When lm is given a model with an intercept and a categorical predictor (including a logical one), it will use the first level (or FALSE) as the baseline, but if the model has no intercept, it will use all levels of the first categorical predictor. This is typically what is wanted in a linear regression, but for the N operator, it can be problematic if the "intercept" effect is added by a different term. A workaround is to convert the categorical predictor to dummy variables before putting it into the lm formula.
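The coding behaviour described here is that of base R's model.matrix(), so it can be inspected directly. In the sketch below, the data frame netdata and the attribute group are hypothetical network-level data; the last step shows one way to build the dummy variables by hand so that no column for the baseline level is introduced.

```r
# Hypothetical network-level data: one row per network.
netdata <- data.frame(group = factor(c("a", "b", "c", "a")))

# With an intercept: the first level ("a") is the baseline.
X_int <- model.matrix(~ group, netdata)
colnames(X_int)

# Without an intercept: all levels of the first categorical predictor
# get a column, which jointly span the intercept.
X_noint <- model.matrix(~ 0 + group, netdata)
colnames(X_noint)

# Workaround: explicit dummy variables, omitting the baseline level.
netdata$is_b <- as.numeric(netdata$group == "b")
netdata$is_c <- as.numeric(netdata$group == "c")
X_dummy <- model.matrix(~ 0 + is_b + is_c, netdata)
colnames(X_dummy)
```

The columns of X_noint sum to a column of ones, which is what makes them collinear with an intercept supplied by another term; X_dummy avoids this.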

References

Krivitsky PN, Coletti P, Hens N (2023). “A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks.” Journal of the American Statistical Association, 118(544), 2213-2224. doi:10.1080/01621459.2023.2242627.