vcov.ergmito {ergmito} | R Documentation |
Estimation of ERGMs using Maximum Likelihood Estimation (MLE)
Description
ergmito
uses Maximum Likelihood Estimation (MLE) to fit Exponential Random
Graph Models for single or multiple small networks, the later using
pooled-data MLE. To do so we use exact likelihoods, which implies fully
enumerating the support of the model. Overall, the exact likelihood
calculation is only possible when dealing with directed (undirected) networks
size 5 (7). In general, directed (undirected) graphs with more than 5 (7)
vertices should not be fitted using MLE, but instead other methods such as
the MC-MLE algorithm or the Robbins-Monro Stochastic Approximation algorithm,
both of which are available in the ergm R package.The workhorse function of
ergmito
is the ergm
package function ergm::ergm.allstats()
.
Usage
## S3 method for class 'ergmito'
vcov(object, solver = NULL, ...)
ergmito(
model,
model_update = NULL,
stats_weights = NULL,
stats_statmat = NULL,
optim.args = list(),
init = NULL,
use.grad = TRUE,
target_stats = NULL,
ntries = 1L,
keep.stats = TRUE,
target_offset = NULL,
stats_offset = NULL,
...
)
Arguments
object |
An object of class |
solver |
Function. Used to compute the inverse of the hessian matrix. When
not null, the variance-covariance matrix is recomputed using that function.
By default, |
... |
Further arguments passed to the method. In the case of |
model |
Model to estimate. See ergm::ergm. The only difference with
|
model_update |
A |
stats_weights |
Either an integer vector or a list of integer vectors (see exact_loglik). |
stats_statmat |
Either a matrix or a list of matrices (see exact_loglik). |
optim.args |
List. Passed to stats::optim. |
init |
A numeric vector. Sets the starting parameters for the optimization routine. Default is a vector of zeros. |
use.grad |
Logical. When |
target_stats |
A matrix of target statistics (see ergm::ergm). |
ntries |
Integer scalar. Number of tries to estimate the MLE (see details). |
keep.stats |
Logical scalar. When |
target_offset , stats_offset |
See |
Details
The support of the sufficient statistics is calculated using ERGM's
ergm::ergm.allstats()
function.
Value
An list of class ergmito
:
-
call
The program call. -
coef
Named vector. Parameter estimates. -
iterations
Integer. Number of times the log-likelihood was evaluated (see stats::optim). -
mle.lik
Numeric. Final value of the objective function. -
null.lik
Numeric. Final value of the objective function for the null model. -
covar
Square matrix of sizelength(coef)
. Variance-covariance matrix computed using the exact hessian as implemented in exact_hessian. -
coef.init
Named vector of lengthlength(coef)
. Initial set of parameters used in the optimization. -
formulae
An object of class ergmito_loglik. -
nobs
Integer scalar. Number of networks in the model. -
network
Networks passed viamodel
. -
optim.out
,optim.args
Results from the optim call and arguments passed to it. -
status
,note
Convergence code. See check_convergence -
best_try
Integer scalar. Index of the run with the highest log-likelihood value. -
history
Matrix of sizentries * (k + 1)
. History of the parameter estimates and the reached log-likelihood values. -
timer
Vector of times (for benchmarking). Each unit marks the starting point of the step.
Methods base::print()
, base::summary()
, stats::coef()
, stats::logLik()
,
stats::nobs()
, stats::vcov()
, stats::AIC()
, stats::BIC()
,
stats::confint()
, and stats::formula()
are available.
MLE
Maximum Likelihood Estimates are obtained using the stats::optim function.
The default method for maximization is BFGS
using both the log-likelihood
function and its corresponding gradient.
Another important factor to consider is the existence of the MLE estimates As shown in Handcock (2003), if the observed statistics are near the border if the support function (e.g. too many edges or almost none), then, even if the MLE estimates exists, the optimization function may not be able to reach the optima. Moreover, if the target (observed) statistics live in the boundary, then the MLE estimates do not exists. In general, this should not be an issue in the context of the pooled model, as the variability of observed statistics should be enough to avoid those situations.
The function ergmito
will try to identify possible cases of non-existence,
of the MLE, and if identified then try to re estimate the model parameters using
larger values than the ones obtained, if the log-likelihood is greater, then it is
assumed that the model is degenerate and the corresponding values will be
replaced with either +Inf
or -Inf
. By default, this behavior is checked
anytime that the absolute value of the estimates is greater than 5, or the
sufficient statistics were flagged as potentially outside of the interior of
the support (close to zero or to its max).
In the case of ntries
, the optimization is repeated that number of times,
each time perturbing the init
parameter by adding a Normally distributed
vector. The result which reaches the highest log-likelihood will be the one
reported as parameter estimates. This feature is intended for testing only.
Anecdotally, optim
reaches the max in the first try.
See Also
The function plot.ergmito()
and gof_ergmito()
for post-estimation
diagnostics.
Examples
# Generating a small graph
set.seed(12)
n <- 4
net <- rbernoulli(n, p = .3)
model <- net ~ edges + mutual
library(ergm)
ans_ergmito <- ergmito(model)
ans_ergm <- ergm(model)
# The ergmito should have a larger value
ergm.exact(ans_ergmito$coef, model)
ergm.exact(ans_ergm$coef, model)
summary(ans_ergmito)
summary(ans_ergm)
# Example 2: Estimating an ERGMito using data with know DGP parameters -----
data(fivenets)
model1 <- ergmito(fivenets ~ edges + nodematch("female"))
summary(model1) # This data has know parameters equal to -2.0 and 2.0
# Example 3: Likelihood ratio test using the lmtest R package
if (require(lmtest)) {
data(fivenets)
model1 <- ergmito(fivenets ~ edges + nodematch("female"))
model2 <- ergmito(fivenets ~ edges + nodematch("female") + mutual)
lrtest(model1, model2)
# Likelihood ratio test
#
# Model 1: fivenets ~ edges + nodematch("female")
# Model 2: fivenets ~ edges + nodematch("female") + mutual
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 2 -34.671
# 2 3 -34.205 1 0.9312 0.3346
}
# Example 4: Adding an reference term for edge-count ----------------------
# Simulating networks of different sizes
set.seed(12344)
nets <- rbernoulli(c(rep(4, 10), rep(5, 10)), c(rep(.2, 10), rep(.1, 10)))
# Fitting an ergmito under the Bernoulli model
ans0 <- ergmito(nets ~ edges)
summary(ans0)
#
# ERGMito estimates
#
# formula:
# nets ~ edges
#
# Estimate Std. Error z value Pr(>|z|)
# edges -1.68640 0.15396 -10.954 < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# AIC: 279.3753 BIC: 283.1436 (Smaller is better.)
# Fitting the model including a reference term for networks of size 5.
# Notice that the variable -n- and other graph attributes can be used
# with -model_update-.
ans1 <- ergmito(nets ~ edges, model_update = ~ I(edges * (n == 5)))
summary(ans1)
#
# ERGMito estimates
#
# formula:
# nets ~ edges + I(edges * (n == 5))
#
# Estimate Std. Error z value Pr(>|z|)
# edges -1.18958 0.21583 -5.5116 3.556e-08 ***
# I(edges * (n == 5)) -0.90116 0.31250 -2.8837 0.00393 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# AIC: 272.9916 BIC: 280.5282 (Smaller is better.)
# The resulting parameter for the edge-count is smaller for networks
# of size five
plogis(coef(ans1)[1]) # 0.23
plogis(sum(coef(ans1))) # 0.11
# We can see that in this case the difference in edge-count matters.
if (require(lmtest)) {
lrtest(ans0, ans1)
# Likelihood ratio test
#
# Model 1: nets ~ edges
# Model 2: nets ~ edges + I(edges * (n == 5))
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -138.69
# 2 2 -134.50 1 8.3837 0.003786 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
}