siena07 {RSiena} | R Documentation |
Function to estimate parameters in a Siena model
Description
Estimates parameters in a Siena model using Method of Moments,
based on direct simulation, conditional or otherwise;
or using Generalized Method of Moments;
or using Maximum Likelihood by MCMC simulation.
Estimation is done using a Robbins-Monro algorithm.
Note that the data and particular model to be used
must be passed in using named arguments as the ...
,
and the specification for the algorithm must be passed on
as x
,
which is a sienaAlgorithm
object as produced by
sienaAlgorithmCreate
(see examples).
Usage
siena07(x, batch=FALSE, verbose=FALSE, silent=FALSE,
useCluster=FALSE, nbrNodes=2,
thetaValues = NULL,
returnThetas = FALSE,
targets = NULL,
initC=TRUE,
clusterString=rep("localhost", nbrNodes), tt=NULL,
parallelTesting=FALSE, clusterIter=!x$maxlike,
clusterType=c("PSOCK", "FORK"), cl=NULL, ...)
Arguments
x |
A control object, of class |
batch |
Desired interface: |
verbose |
Produces various output to the console if |
silent |
Produces no output to the console if |
useCluster |
Boolean: whether to use a cluster of processes (useful if multiple processors are available). |
nbrNodes |
Number of processes to use if useCluster is |
thetaValues |
If not |
returnThetas |
Boolean: whether to return theta values and generated estimation statistics of Phase 2 runs. |
targets |
Numeric vector of length equal to the number of estimated parameters, meant to supersede the targets calculated from the data set; see "Details". Not for regular use. |
initC |
Boolean: set to |
clusterString |
Definitions of clusters. Default set up to use the local machine only. |
tt |
A |
parallelTesting |
Boolean. If |
clusterIter |
Boolean. If |
clusterType |
Either "PSOCK" or "FORK". On Windows, must be "PSOCK". On a single non-Windows machine may be "FORK", and subprocesses will be formed by forking. If "PSOCK", subprocesses are formed using R scripts. |
cl |
An object of class c("SOCKcluster", "cluster") (see Details). |
... |
Arguments for the simulation function, see
|
Details
This is the main function and workhorse of RSiena.
For use of siena07
, it is necessary to specify parameters data
(RSiena data set) and effects
(effects object), which are
required parameters in function simstats0c
.
(These parameters are inserted through '...'.) See the examples.
siena07
runs a Robbins-Monro algorithm for parameter estimation
using the three-phase implementation described in Snijders (2001, 2017),
with (if x$findiff=FALSE
)
derivative estimation as in Schweinberger and Snijders (2007).
The default is estimation according to the Method of Moments
as in Snijders, Steglich and Schweinberger (2007).
If x$gmm=TRUE
and myeff
contains one or more
gmm
statistics as included by function
includeGMoMStatistics
, the algorithm employs the
Generalized Method of Moments as defined in Amati, Schoenenberger, and
Snijders (2015, 2019).
For continuous behavior variables defined with type="continuous"
in
sienaDependent
, estimation is done as
described in Niezink and Snijders (2017).
If x$maxlike=TRUE
, estimation is done by Maximum Likelihood
implemented as in Snijders, Koskinen and Schweinberger (2010).
Phase 1 does a few iterations to
estimate the derivative matrix of the targets with respect to the
parameter vector. Phase 2 does the estimation. Phase 3 runs a
simulation to estimate standard errors and check convergence of the model.
The simulation function is called once for each iteration in these phases
and also once to initialise the model fitting and once to complete it.
Unless in batch mode, a tcl/tk screen is displayed to allow interruption
and to show progress.
If targets
is specified (which should be done only in special cases),
and provided that estimation is by the Method of Moments, the data
is not a multi-group data set and has exactly 2 waves,
and if the length of the vector targets
is equal to the number
of estimated parameters (not counting the rate parameters estimated
by conditional estimation), then the vector targets
supersedes the
targets calculated from the data set.
It is necessary to check that convergence has been achieved.
The rule of thumb is that the all t-ratios for convergence should be
in absolute value less than 0.1 and
the overall maximum convergence ratio should be less than 0.25.
If this was not achieved, the result can be used to start
another estimation run from the estimate obtained, using
the parameter prevAns
as illustrated in the example below.
(This parameter is inserted through '...' into the function
initializeFRAN
.)
For good estimation of standard errors, it is necessary that x$n3
is
large enough. More about this is in the manual. The default value x$n3
set in sienaAlgorithmCreate
is adequate for most explorative use,
but for presentation in publications larger values are necessary, depending
on the data set and model; e.g., x$n3=3000
or larger.
Parameters can be tested against zero by dividing the estimate by its
standard error and using an approximate standard normal null distribution.
Further, functions Wald.RSiena
and
Multipar.RSiena
are available for multi-parameter testing.
Parameters specified in includeEffects
or
setEffect
with fix=TRUE, test=TRUE
will not be estimated;
score tests of their hypothesized values are reported in the output file
specified in the control (algorithm) object.
These tests can be obtained also using score.Test
.
If x$simOnly
is TRUE
, which is meant to go together with
x$nsub=0
, the calculation of the standard errors and covariance
matrix at the end of Pase 3 is skipped. No estimation is performed.
If thetaValues
is not NULL
, the parameter values in
the rows of this matrix will be used in the consecutive runs of Phase 3.
If x$n3
is larger than the number of rows times nbrNodes
(see below), the last row of thetaValues
will continue to be used.
The parameter values actually used will be stored in the
output matrix thetaUsed
.
Multiple processors are used for estimation by MoM to distribute each iteration
in each subphase over the cluster of nodes. The number of iterations
accordingly will be divided (approximately) by the number of nodes;
for phase 2, unless n2start
is specified.
This implies that if multiple processors are used, think of dividing
n2start
by nbrNodes
.
For estimation by ML, multiple processing is done per period. Therefore,
for one period (two waves) and one group, this will have no effect.
In the case of using multiple processors, there are two options for telling
siena07
to use them. By specifying the options useCluster
,
nbrNodes
, clusterString
and initC
, siena07
will create a cluster object
that will be used by the
parallel
package. After finishing the estimation procedure,
siena07
will automatically stop the cluster. Alternatively, instead of
having the function to create a cluster, the user may provide its own by
specifying the option cl
, similar to what the boot function does in
the boot package. By using the option cl
the user may be
able to create more complex clusters (see examples below).
If thetaValues
is not NULL
and nbrNodes >= 2
, parameters
in Phase 3 will be constant for each set of nbrNodes
consecutive
simulations. This must be noted in the interpretation, and will be visible
in thetaUsed
(see below).
Value
Returns an object of class sienaFit
, some parts of which are:
OK |
Boolean indicating successful termination |
termination |
Character string, values: "OK", "Error", or "UserInterrupt". "UserInterrupt" indicates that the user asked for early termination before phase 3. |
f |
Various characteristics of the data and model definition. |
requestedEffects |
The included effects in the effects object. |
effects |
The included effects in the effects object to which are added the main effects of the requested interaction effects, if any. |
theta |
Estimated value of theta, if |
thetas |
Matrix, returned if |
sfs |
Matrix, returned if |
covtheta |
Estimated covariance matrix of theta; this is not available
if |
se |
Vector of standard errors of estimated theta,
if |
dfra |
Matrix of estimated derivatives. |
sf |
Matrix of simulated deviations from targets in phase 3. |
sf2 |
Array of periodwise deviations from simulations in phase 3.
Not included if |
tconv |
t-statistics for convergence. |
tmax |
maximum absolute t-statistic for convergence for non-fixed parameters. |
tconv.max |
overall maximum convergence ratio. |
ac3 |
If |
targets |
Observed statistics; for ML, zero vector. |
targets2 |
Observed statistics by wave, starting with second wave; for ML, zero matrix. |
ssc |
Score function contributions for each wave for each
simulation in phase 3. Not included if finite difference method is used
or if |
scores |
Score functions, added over waves, for each
simulation in phase 3. Only included
if |
regrCoef |
If |
regrCor |
If |
estMeans |
Estimated means of estimation statistics. |
estMeans.sem |
If |
sims |
If |
chain |
If |
Phase3nits |
Number of iterations actually performed in phase 3. |
thetaUsed |
If |
Writes text output to the file named "projname.txt", where projname
is defined in the sienaAlgorithm
object x
.
Author(s)
Ruth Ripley, Tom Snijders, Viviana Amati, Felix Schoenenberger, Nynke Niezink
References
Amati, V., Schoenenberger, F., and Snijders, T.A.B. (2015), Estimation of stochastic actor-oriented models for the evolution of networks by generalized method of moments. Journal de la Societe Francaise de Statistique 156, 140–165.
Amati, V., Schoenenberger, F., and Snijders, T.A.B. (2019), Contemporaneous statistics for estimation in stochastic actor-oriented co-evolution models. Psychometrika 84, 1068–1096.
Greenan, C. (2015), Evolving Social Network Analysis: developments in statistical methodology for dynamic stochastic actor-oriented models. DPhil dissertation, University of Oxford.
Niezink, N.M.D., and Snijders, T.A.B. (2017), Co-evolution of Social Networks and Continuous Actor Attributes. The Annals of Applied Statistics 11, 1948–1973.
Schweinberger, M., and Snijders, T.A.B. (2007), Markov models for digraph panel data: Monte Carlo based derivative estimation. Computational Statistics and Data Analysis 51, 4465–4483.
Snijders, T.A.B. (2001), The statistical evaluation of social network dynamics. Sociological Methodology 31, 361–395.
Snijders, T.A.B. (2017), Stochastic Actor-Oriented Models for Network Dynamics. Annual Review of Statistics and Its Application 4, 343–363.
Snijders, T.A.B., Koskinen, J., and Schweinberger, M. (2010). Maximum likelihood estimation for social network dynamics. Annals of Applied Statistics 4, 567–588.
Snijders, T.A.B., Steglich, C.E.G., and Schweinberger, Michael (2007), Modeling the co-evolution of networks and behavior. Pp. 41–71 in Longitudinal models in the behavioral and related sciences, edited by van Montfort, K., Oud, H., and Satorra, A.; Lawrence Erlbaum.
Steglich, C.E.G., Snijders, T.A.B., and Pearson, M.A. (2010), Dynamic networks and behavior: Separating selection from influence. Sociological Methodology 40, 329–393. Information about the implementation of the algorithm is in https://www.stats.ox.ac.uk/~snijders/siena/Siena_algorithms.pdf. Further see https://www.stats.ox.ac.uk/~snijders/siena/ .
See Also
siena
, sienaAlgorithmCreate
,
sienaEffects
,
Wald.RSiena
, Multipar.RSiena
,
score.Test
.
There are print, summary and xtable methods for sienaFit
objects: xtable
, print.sienaFit
.
Examples
myalgorithm <- sienaAlgorithmCreate(nsub=2, n3=100, seed=1293)
# nsub=2, n3=100 is used here for having a brief computation, not for practice.
mynet1 <- sienaDependent(array(c(tmp3, tmp4), dim=c(32, 32, 2)))
mydata <- sienaDataCreate(mynet1)
myeff <- getEffects(mydata)
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)
# or for non-conditional estimation --------------------------------------------
## Not run:
model <- sienaAlgorithmCreate(nsub=2, n3=100, cond=FALSE, seed=1283)
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)
## End(Not run)
# or if a previous "on track" result ans was obtained --------------------------
## Not run:
ans1 <- siena07(myalgorithm, data=mydata, effects=myeff, prevAns=ans)
## End(Not run)
# Running in multiple processors -----------------------------------------------
## Not run:
# Not tested because dependent on presence of processors
# Find out how many processors there are
library(parallel)
(n.clus <- detectCores() - 1)
n.clus <- min(n.clus, 4) # keep time for other processes
ans2 <- siena07(myalgorithm, data=mydata, effects=myeff,
useCluster=TRUE, nbrNodes=n.clus, initC=TRUE)
# Suppose 8 processors are going to be used.
# Loading the parallel package and creating a cluster
# with 8 processors (this should be equivalent)
library(parallel)
cl <- makeCluster(n.clus)
ans3 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)
# Notice that now -siena07- perhaps won't stop the cluster for you.
# stopCluster(cl)
# You can create even more complex clusters using several computers. In this
# example we are creating a cluster with 3*8 = 24 processors on three
# different machines.
#cl <- makePSOCKcluster(
# rep(c('localhost', 'machine2.website.com' , 'machine3.website.com'), 8),
# user='myusername', rshcmd='ssh -p PORTNUMBER')
#ans4 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)
#stopCluster(cl)
## End(Not run)
# for a continuous behavior variable -------------------------------------------
# simulate behavior data according to dZ(t) = [-0.1 Z + 1] dt + 1 dW(t)
set.seed(123)
y1 <- rnorm(50, 0,3)
y2 <- exp(-0.1) * y1 + (1-exp(-0.1)) * 1/ -0.1 + rnorm(50, 0, (exp(-0.2)- 1) / -0.2 * 1^2)
friend <- sienaDependent(array(c(s501, s502), dim = c(50,50,2)))
behavior <- sienaDependent(matrix(c(y1,y2), 50,2), type = "continuous")
(mydata <- sienaDataCreate(friend, behavior))
(myeff <- getEffects(mydata, onePeriodSde = TRUE))
algorithmMoM <- sienaAlgorithmCreate(nsub=1, n3=20, seed=321)
(ans <- siena07(myalgorithm, data = mydata, effects = myeff, batch=TRUE))
# Accessing simulated networks for ML ------------------------------------------
# The following is an example for accessing the simulated networks for ML,
# which makes sense only if there are some missing tie variables;
# observed tie variables are identically simulated
# at the moment of observation,
# missing tie variable are imputed in a model-based way.
mat1 <- matrix(c(0,0,1,1,
1,0,0,0,
0,0,0,1,
0,1,0,0),4,4, byrow=TRUE)
mat2 <- matrix(c(0,1,1,1,
1,0,0,0,
0,0,0,1,
0,0,1,0),4,4, byrow=TRUE)
mat3 <- matrix(c(0,1,0,1,
1,0,0,0,
0,0,0,0,
NA,1,1,0),4,4, byrow=TRUE)
mats <- array(c(mat1,mat2,mat3), dim=c(4,4,3))
net <- sienaDependent(mats, allowOnly=FALSE)
sdat <- sienaDataCreate(net)
alg <- sienaAlgorithmCreate(maxlike=TRUE, nsub=3, n3=100, seed=12534)
effs <- getEffects(sdat)
(ans <- siena07(alg, data=sdat, effects=effs, returnDeps=TRUE, batch=TRUE))
# See manual Section 9.1 for information about the following functions
edges.to.adj <- function(x,n){
# create empty adjacency matrix
adj <- matrix(0, n, n)
# put edge values in desired places
adj[x[, 1:2]] <- x[, 3]
adj
}
the.edge <- function(x,n,h,k){
edges.to.adj(x,n)[h,k]
}
# Now show the results
n <- 4
ego <- rep.int(1:n,n)
alter <- rep(1:n, each=n)
# Get the average simulated adjacency matrices for wave 3 (period 2):
ones <- sapply(1:n^2, function(i)
{mean(sapply(ans$sims,
function(x){the.edge(x[[1]][[2]][[1]],n,ego[i],alter[i])}))})
# Note that for maximum likelihood estimation,
# if there is one group and one period,
# the nesting levels for group and period are dropped from ans$sims.
cbind(ego,alter,ones)
matrix(ones,n,n)