vcov.ergmito {ergmito}R Documentation

Estimation of ERGMs using Maximum Likelihood Estimation (MLE)

Description

ergmito uses Maximum Likelihood Estimation (MLE) to fit Exponential Random Graph Models for single or multiple small networks, the later using pooled-data MLE. To do so we use exact likelihoods, which implies fully enumerating the support of the model. Overall, the exact likelihood calculation is only possible when dealing with directed (undirected) networks size 5 (7). In general, directed (undirected) graphs with more than 5 (7) vertices should not be fitted using MLE, but instead other methods such as the MC-MLE algorithm or the Robbins-Monro Stochastic Approximation algorithm, both of which are available in the ergm R package.The workhorse function of ergmito is the ergm package function ergm::ergm.allstats().

Usage

## S3 method for class 'ergmito'
vcov(object, solver = NULL, ...)

ergmito(
  model,
  model_update = NULL,
  stats_weights = NULL,
  stats_statmat = NULL,
  optim.args = list(),
  init = NULL,
  use.grad = TRUE,
  target_stats = NULL,
  ntries = 1L,
  keep.stats = TRUE,
  target_offset = NULL,
  stats_offset = NULL,
  ...
)

Arguments

object

An object of class ergmito

solver

Function. Used to compute the inverse of the hessian matrix. When not null, the variance-covariance matrix is recomputed using that function. By default, ergmito uses MASS::ginv.

...

Further arguments passed to the method. In the case of ergmito, ... are passed to ergmito_formulae.

model

Model to estimate. See ergm::ergm. The only difference with ergm is that the LHS can be a list of networks.

model_update

A formula. this can be used to apply transformations, create interaction effects, add offset terms, etc. (see examples below and more details in ergmito_formulae).

stats_weights

Either an integer vector or a list of integer vectors (see exact_loglik).

stats_statmat

Either a matrix or a list of matrices (see exact_loglik).

optim.args

List. Passed to stats::optim.

init

A numeric vector. Sets the starting parameters for the optimization routine. Default is a vector of zeros.

use.grad

Logical. When TRUE passes the gradient function to optim. This is intended for testing only (internal use).

target_stats

A matrix of target statistics (see ergm::ergm).

ntries

Integer scalar. Number of tries to estimate the MLE (see details).

keep.stats

Logical scalar. When TRUE (the default), the matrices and vectors associated with the sufficient statistics will be returned. Otherwise the function discards them. This may be useful for saving memory space when estimating multiple models.

target_offset, stats_offset

See exact_loglik().

Details

The support of the sufficient statistics is calculated using ERGM's ergm::ergm.allstats() function.

Value

An list of class ergmito:

Methods base::print(), base::summary(), stats::coef(), stats::logLik(), stats::nobs(), stats::vcov(), stats::AIC(), stats::BIC(), stats::confint(), and stats::formula() are available.

MLE

Maximum Likelihood Estimates are obtained using the stats::optim function. The default method for maximization is BFGS using both the log-likelihood function and its corresponding gradient.

Another important factor to consider is the existence of the MLE estimates As shown in Handcock (2003), if the observed statistics are near the border if the support function (e.g. too many edges or almost none), then, even if the MLE estimates exists, the optimization function may not be able to reach the optima. Moreover, if the target (observed) statistics live in the boundary, then the MLE estimates do not exists. In general, this should not be an issue in the context of the pooled model, as the variability of observed statistics should be enough to avoid those situations.

The function ergmito will try to identify possible cases of non-existence, of the MLE, and if identified then try to re estimate the model parameters using larger values than the ones obtained, if the log-likelihood is greater, then it is assumed that the model is degenerate and the corresponding values will be replaced with either +Inf or -Inf. By default, this behavior is checked anytime that the absolute value of the estimates is greater than 5, or the sufficient statistics were flagged as potentially outside of the interior of the support (close to zero or to its max).

In the case of ntries, the optimization is repeated that number of times, each time perturbing the init parameter by adding a Normally distributed vector. The result which reaches the highest log-likelihood will be the one reported as parameter estimates. This feature is intended for testing only. Anecdotally, optim reaches the max in the first try.

See Also

The function plot.ergmito() and gof_ergmito() for post-estimation diagnostics.

Examples


# Generating a small graph
set.seed(12)
n <- 4
net <- rbernoulli(n, p = .3)

model <- net ~ edges + mutual

library(ergm)
ans_ergmito <- ergmito(model)
ans_ergm  <- ergm(model)

# The ergmito should have a larger value
ergm.exact(ans_ergmito$coef, model)
ergm.exact(ans_ergm$coef, model)

summary(ans_ergmito)
summary(ans_ergm)

# Example 2: Estimating an ERGMito using data with know DGP parameters -----
data(fivenets) 

model1 <- ergmito(fivenets ~ edges + nodematch("female"))
summary(model1) # This data has know parameters equal to -2.0 and 2.0

# Example 3: Likelihood ratio test using the lmtest R package

if (require(lmtest)) {
  data(fivenets)
  model1 <- ergmito(fivenets ~ edges + nodematch("female"))
  model2 <- ergmito(fivenets ~ edges + nodematch("female") + mutual)
  
  lrtest(model1, model2)
  # Likelihood ratio test
  # 
  # Model 1: fivenets ~ edges + nodematch("female") 
  # Model 2: fivenets ~ edges + nodematch("female") + mutual
  #   #Df  LogLik Df  Chisq Pr(>Chisq)
  # 1   2 -34.671                     
  # 2   3 -34.205 1 0.9312     0.3346
}

# Example 4: Adding an reference term for edge-count ----------------------

# Simulating networks of different sizes
set.seed(12344)
nets <- rbernoulli(c(rep(4, 10), rep(5, 10)), c(rep(.2, 10), rep(.1, 10)))

# Fitting an ergmito under the Bernoulli model
ans0 <- ergmito(nets ~ edges)
summary(ans0)
# 
# ERGMito estimates
# 
# formula:
#   nets ~ edges
# 
#       Estimate Std. Error z value  Pr(>|z|)    
# edges -1.68640    0.15396 -10.954 < 2.2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# AIC: 279.3753    BIC: 283.1436    (Smaller is better.) 


# Fitting the model including a reference term for networks of size 5.
# Notice that the variable -n- and other graph attributes can be used
# with -model_update-.
ans1 <- ergmito(nets ~ edges, model_update = ~ I(edges * (n == 5)))
summary(ans1)
# 
# ERGMito estimates
# 
# formula:
#   nets ~ edges + I(edges * (n == 5))
# 
#                     Estimate Std. Error z value  Pr(>|z|)    
# edges               -1.18958    0.21583 -5.5116 3.556e-08 ***
# I(edges * (n == 5)) -0.90116    0.31250 -2.8837   0.00393 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# AIC: 272.9916    BIC: 280.5282    (Smaller is better.) 

# The resulting parameter for the edge-count is smaller for networks
# of size five
plogis(coef(ans1)[1])   # 0.23
plogis(sum(coef(ans1))) # 0.11

# We can see that in this case the difference in edge-count matters.
if (require(lmtest)) {

  lrtest(ans0, ans1)
  # Likelihood ratio test
  # 
  # Model 1: nets ~ edges
  # Model 2: nets ~ edges + I(edges * (n == 5))
  # #Df  LogLik Df  Chisq Pr(>Chisq)   
  # 1   1 -138.69                        
  # 2   2 -134.50  1 8.3837   0.003786 **
  #   ---
  #   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
}


[Package ergmito version 0.3-1 Index]