R: Function to estimate parameters in a Siena model

siena07 {RSiena}

R Documentation

Function to estimate parameters in a Siena model

Description

Estimates parameters in a Siena model using the Method of Moments, based on direct simulation, conditional or otherwise; or using the Generalized Method of Moments; or using Maximum Likelihood by MCMC simulation. Estimation is done using a Robbins-Monro algorithm. Note that the data and the particular model to be used must be passed as named arguments through ..., and the specification for the algorithm must be passed as x, which is a sienaAlgorithm object as produced by sienaAlgorithmCreate (see examples).

Usage

siena07(x, batch=FALSE, verbose=FALSE, silent=FALSE,
        useCluster=FALSE, nbrNodes=2,
        thetaValues = NULL,
        returnThetas = FALSE,
        targets = NULL,
        initC=TRUE,
        clusterString=rep("localhost", nbrNodes), tt=NULL,
        parallelTesting=FALSE, clusterIter=!x$maxlike,
        clusterType=c("PSOCK", "FORK"), cl=NULL, ...)

Arguments

`x`	A control object, of class `sienaAlgorithm`.
`batch`	Desired interface: `FALSE` gives a GUI (graphical user interface implemented as a tcl/tk screen), `TRUE` gives a small (if `verbose=FALSE`) amount of printout to the console.
`verbose`	Produces various output to the console if `TRUE`.
`silent`	Produces no output to the console if `TRUE`, even if batch mode.
`useCluster`	Boolean: whether to use a cluster of processes (useful if multiple processors are available).
`nbrNodes`	Number of processes to use if useCluster is `TRUE`.
`thetaValues`	If not `NULL`, this should be a matrix with parameter values to be used in Phase 3. The number of columns must be equal to the number of estimated parameters in the effects object (if conditional estimation is used, without the rate parameters for the conditioning dependent variable). Can only be used if `x$simOnly=TRUE`.
`returnThetas`	Boolean: whether to return theta values and generated estimation statistics of Phase 2 runs.
`targets`	Numeric vector of length equal to the number of estimated parameters, meant to supersede the targets calculated from the data set; see "Details". Not for regular use.
`initC`	Boolean: set to `TRUE` if the simulation will use C routines (currently always needed). Only for use if using multiple processors, to ensure all copies are initialised correctly. Ignored otherwise, so is set to `TRUE` by default.
`clusterString`	Definitions of clusters. Default set up to use the local machine only.
`tt`	A `tcltk` toplevel window. Used if called from the model options screen, if `tcltk` is available.
`parallelTesting`	Boolean. If `TRUE`, sets up random numbers to parallel those in Siena 3.
`clusterIter`	Boolean. If `TRUE`, multiple processes execute complete iterations at each call. If `FALSE`, multiple processes execute a single wave at each call.
`clusterType`	Either "PSOCK" or "FORK". On Windows, must be "PSOCK". On a single non-Windows machine may be "FORK", and subprocesses will be formed by forking. If "PSOCK", subprocesses are formed using R scripts.
`cl`	An object of class c("SOCKcluster", "cluster") (see Details).
`...`	Arguments for the simulation function, see `simstats0c`: in any case, `data` and `effects`, as in the examples below; possibly also `prevAns` if a previous reasonable provisional estimate was obtained for a similar model; possibly also `returnDeps` if the simulated dependent variables (networks, behaviour) should be returned; possibly also `returnChains` if the simulated sequences (chains) of ministeps should be returned; this may produce a very big file.

Details

This is the main function and workhorse of RSiena.

For use of siena07, it is necessary to specify parameters data (RSiena data set) and effects (effects object), which are required parameters in function simstats0c. (These parameters are inserted through '...'.) See the examples.

siena07 runs a Robbins-Monro algorithm for parameter estimation using the three-phase implementation described in Snijders (2001, 2017), with (if x$findiff=FALSE) derivative estimation as in Schweinberger and Snijders (2007). The default is estimation according to the Method of Moments as in Snijders, Steglich and Schweinberger (2007).
If x$gmm=TRUE and myeff contains one or more gmm statistics as included by function includeGMoMStatistics, the algorithm employs the Generalized Method of Moments as defined in Amati, Schoenenberger, and Snijders (2015, 2019).
For continuous behavior variables defined with type="continuous" in sienaDependent, estimation is done as described in Niezink and Snijders (2017).
If x$maxlike=TRUE, estimation is done by Maximum Likelihood implemented as in Snijders, Koskinen and Schweinberger (2010).
Phase 1 does a few iterations to estimate the derivative matrix of the targets with respect to the parameter vector. Phase 2 does the estimation. Phase 3 runs a simulation to estimate standard errors and check convergence of the model. The simulation function is called once for each iteration in these phases and also once to initialise the model fitting and once to complete it. Unless in batch mode, a tcl/tk screen is displayed to allow interruption and to show progress.

The argument targets should be specified only in special cases. The vector targets supersedes the targets calculated from the data set provided that all of the following hold: estimation is by the Method of Moments; the data is not a multi-group data set and has exactly 2 waves; and the length of targets is equal to the number of estimated parameters (not counting the rate parameters estimated by conditional estimation).

It is necessary to check that convergence has been achieved. The rule of thumb is that all t-ratios for convergence should be less than 0.1 in absolute value and the overall maximum convergence ratio should be less than 0.25. If this was not achieved, the result can be used to start another estimation run from the estimate obtained, using the parameter prevAns as illustrated in the example below. (This parameter is inserted through '...' into the function initializeFRAN.)
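The rule of thumb above can be sketched in a few lines of R. This is only an illustration: check_convergence is a hypothetical helper, not an RSiena function, and the two lists are stand-ins with made-up values for the tconv and tconv.max components of a sienaFit object. It assumes all parameters are estimated, so that tconv contains no missing values.

```r
# Hypothetical helper (not part of RSiena): applies the rule-of-thumb
# thresholds to any list with components `tconv` and `tconv.max`.
check_convergence <- function(fit, t_limit = 0.1, overall_limit = 0.25) {
    t_ok <- all(abs(fit$tconv) < t_limit)
    overall_ok <- as.numeric(fit$tconv.max) < overall_limit
    t_ok && overall_ok
}

# Stand-in objects with made-up values, for illustration only
good_fit <- list(tconv = c(0.03, -0.08, 0.05), tconv.max = 0.12)
bad_fit  <- list(tconv = c(0.03, -0.21, 0.05), tconv.max = 0.31)

check_convergence(good_fit)
check_convergence(bad_fit)
```

With a real result, a FALSE outcome would suggest rerunning siena07 with prevAns set to the previous result.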

For good estimation of standard errors, it is necessary that x$n3 is large enough. More about this is in the manual. The default value x$n3 set in sienaAlgorithmCreate is adequate for most explorative use, but for presentation in publications larger values are necessary, depending on the data set and model; e.g., x$n3=3000 or larger.

Parameters can be tested against zero by dividing the estimate by its standard error and using an approximate standard normal null distribution. Further, functions Wald.RSiena and Multipar.RSiena are available for multi-parameter testing.
Parameters specified in includeEffects or setEffect with fix=TRUE, test=TRUE will not be estimated; score tests of their hypothesized values are reported in the output file specified in the control (algorithm) object. These tests can be obtained also using score.Test.
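As a minimal sketch of the single-parameter test described above: the vectors theta and se below are made-up numbers standing in for the theta and se components of a sienaFit object, and the normal approximation is the one mentioned in the text.

```r
# Made-up estimates and standard errors, standing in for ans$theta and ans$se
theta <- c(-2.10, 1.85, 0.42)
se    <- c( 0.15, 0.30, 0.25)

z <- theta / se            # approximately standard normal under the null
p <- 2 * pnorm(-abs(z))    # two-sided p-values

round(cbind(theta, se, z, p), 3)
```

For hypotheses involving several parameters at once, the functions Wald.RSiena and Multipar.RSiena mentioned above are the appropriate tools.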

If x$simOnly is TRUE, which is meant to go together with x$nsub=0, the calculation of the standard errors and covariance matrix at the end of Phase 3 is skipped. No estimation is performed. If thetaValues is not NULL, the parameter values in the rows of this matrix will be used in the consecutive runs of Phase 3. If x$n3 is larger than the number of rows times nbrNodes (see below), the last row of thetaValues will continue to be used. The parameter values actually used will be stored in the output matrix thetaUsed.

Multiple processors are used for estimation by MoM to distribute each iteration in each subphase over the cluster of nodes. The number of iterations will accordingly be divided (approximately) by the number of nodes; in Phase 2 this holds unless n2start is specified. Hence, when multiple processors are used and n2start is specified, consider dividing n2start by nbrNodes.
For estimation by ML, multiple processing is done per period. Therefore, for one period (two waves) and one group, this will have no effect.

When using multiple processors, there are two ways of telling siena07 to use them. By specifying the options useCluster, nbrNodes, clusterString and initC, siena07 will create a cluster object that will be used by the parallel package. After finishing the estimation procedure, siena07 will automatically stop the cluster. Alternatively, instead of having the function create a cluster, users may provide their own by specifying the option cl, similar to what the boot function does in the boot package. By using the option cl the user may be able to create more complex clusters (see examples below).

If thetaValues is not NULL and nbrNodes >= 2, parameters in Phase 3 will be constant for each set of nbrNodes consecutive simulations. This must be noted in the interpretation, and will be visible in thetaUsed (see below).
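The following sketch illustrates how the description above maps Phase 3 runs to rows of thetaValues. It is inferred from the text only: theta_row_for_run is a hypothetical helper, not an RSiena function, and the actual internal indexing may differ in detail; thetaUsed in the returned object remains the authoritative record.

```r
# Hypothetical helper (not part of RSiena): row of thetaValues that would
# apply to Phase 3 run number `run`, assuming each row is held constant for
# nbrNodes consecutive runs and the last row is reused once rows run out.
theta_row_for_run <- function(run, n_rows, nbrNodes = 1) {
    pmin(ceiling(run / nbrNodes), n_rows)
}

# 10 runs, 3 rows of thetaValues, 2 nodes
theta_row_for_run(1:10, n_rows = 3, nbrNodes = 2)
```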

Value

Returns an object of class sienaFit, some parts of which are:

`OK`	Boolean indicating successful termination
`termination`	Character string, values: "OK", "Error", or "UserInterrupt". "UserInterrupt" indicates that the user asked for early termination before phase 3.
`f`	Various characteristics of the data and model definition.
`requestedEffects`	The included effects in the effects object.
`effects`	The included effects in the effects object to which are added the main effects of the requested interaction effects, if any.
`theta`	Estimated value of theta, if `x$simOnly=FALSE`.
`thetas`	Matrix, returned if `returnThetas` and `x$nsub >= 1`. First column is subphase; further columns are values of theta as generated during this subphase of Phase 2.
`sfs`	Matrix, returned if `returnThetas` and `x$nsub >= 1`. First column is subphase; further columns are deviations from targets generated during this subphase of Phase 2.
`covtheta`	Estimated covariance matrix of theta; this is not available if `x$simOnly=TRUE`.
`se`	Vector of standard errors of estimated theta, if `x$simOnly=FALSE`.
`dfra`	Matrix of estimated derivatives.
`sf`	Matrix of simulated deviations from targets in phase 3.
`sf2`	Array of periodwise deviations from simulations in phase 3. Not included if `x$lessMem=TRUE`.
`tconv`	t-statistics for convergence.
`tmax`	maximum absolute t-statistic for convergence for non-fixed parameters.
`tconv.max`	overall maximum convergence ratio.
`ac3`	If `x$maxlike=TRUE`: autocorrelations of statistics in Phase 3.
`targets`	Observed statistics; for ML, zero vector.
`targets2`	Observed statistics by wave, starting with second wave; for ML, zero matrix.
`ssc`	Score function contributions for each wave for each simulation in phase 3. Not included if finite difference method is used or if `x$lessMem=TRUE`.
`scores`	Score functions, added over waves, for each simulation in phase 3. Only included if `x$lessMem=FALSE`.
`regrCoef`	If `x$dolby` and not `x$maxlike`: regression coefficients of estimation statistics on score functions.
`regrCor`	If `x$dolby` and not `x$maxlike`: correlations between estimation statistics and score functions.
`estMeans`	Estimated means of estimation statistics.
`estMeans.sem`	If `x$simOnly`: Standard errors of the estimated means of estimation statistics.
`sims`	If `returnDeps=TRUE`: list of simulated dependent variables (networks, behaviour). Networks are given as a list of edgelists, one for each period. The structure of sims is a nested list: `sims[[run]][[group]][[dependent variable]][[period]]`. If `x$maxlike=TRUE` and there is only one group and one period, the structure is `[[run]][[dependent variable]]`.
`chain`	If `returnChains = TRUE`: list, or data frame, of simulated chains of ministeps. The chain has the structure `chain[[run]][[depvar]][[period]][[ministep]]`.
`Phase3nits`	Number of iterations actually performed in phase 3.
`thetaUsed`	If `thetaValues` is not `NULL`, the matrix of parameter values actually used in the simulations of Phase 3.

Writes text output to the file named "projname.txt", where projname is defined in the sienaAlgorithm object x.

Author(s)

Ruth Ripley, Tom Snijders, Viviana Amati, Felix Schoenenberger, Nynke Niezink

References

Amati, V., Schoenenberger, F., and Snijders, T.A.B. (2015), Estimation of stochastic actor-oriented models for the evolution of networks by generalized method of moments. Journal de la Societe Francaise de Statistique 156, 140–165.

Amati, V., Schoenenberger, F., and Snijders, T.A.B. (2019), Contemporaneous statistics for estimation in stochastic actor-oriented co-evolution models. Psychometrika 84, 1068–1096.

Greenan, C. (2015), Evolving Social Network Analysis: developments in statistical methodology for dynamic stochastic actor-oriented models. DPhil dissertation, University of Oxford.

Niezink, N.M.D., and Snijders, T.A.B. (2017), Co-evolution of Social Networks and Continuous Actor Attributes. The Annals of Applied Statistics 11, 1948–1973.

Schweinberger, M., and Snijders, T.A.B. (2007), Markov models for digraph panel data: Monte Carlo based derivative estimation. Computational Statistics and Data Analysis 51, 4465–4483.

Snijders, T.A.B. (2001), The statistical evaluation of social network dynamics. Sociological Methodology 31, 361–395.

Snijders, T.A.B. (2017), Stochastic Actor-Oriented Models for Network Dynamics. Annual Review of Statistics and Its Application 4, 343–363.

Snijders, T.A.B., Koskinen, J., and Schweinberger, M. (2010), Maximum likelihood estimation for social network dynamics. The Annals of Applied Statistics 4, 567–588.

Snijders, T.A.B., Steglich, C.E.G., and Schweinberger, M. (2007), Modeling the co-evolution of networks and behavior. Pp. 41–71 in Longitudinal models in the behavioral and related sciences, edited by van Montfort, K., Oud, H., and Satorra, A.; Lawrence Erlbaum.

Steglich, C.E.G., Snijders, T.A.B., and Pearson, M.A. (2010), Dynamic networks and behavior: Separating selection from influence. Sociological Methodology 40, 329–393.

Information about the implementation of the algorithm is in https://www.stats.ox.ac.uk/~snijders/siena/Siena_algorithms.pdf. Further see https://www.stats.ox.ac.uk/~snijders/siena/ .

Examples

myalgorithm <- sienaAlgorithmCreate(nsub=2, n3=100, seed=1293)
# nsub=2, n3=100 is used here for having a brief computation, not for practice.
mynet1 <- sienaDependent(array(c(tmp3, tmp4), dim=c(32, 32, 2)))
mydata <- sienaDataCreate(mynet1)
myeff <- getEffects(mydata)
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)

# or for non-conditional estimation --------------------------------------------
## Not run: 
model <- sienaAlgorithmCreate(nsub=2, n3=100, cond=FALSE, seed=1283)
ans <- siena07(model, data=mydata, effects=myeff, batch=TRUE)
        
## End(Not run)

# or if a previous "on track" result ans was obtained --------------------------
## Not run: 
ans1 <- siena07(myalgorithm, data=mydata, effects=myeff, prevAns=ans)
         
## End(Not run)

# Running in multiple processors -----------------------------------------------
## Not run: 
# Not tested because dependent on presence of processors
# Find out how many processors there are
library(parallel)
(n.clus <- detectCores() - 1)
n.clus <- min(n.clus, 4)  # keep time for other processes
ans2 <- siena07(myalgorithm, data=mydata, effects=myeff,
                useCluster=TRUE, nbrNodes=n.clus, initC=TRUE)

# Alternatively, create the cluster yourself and pass it through cl.
# Loading the parallel package and creating a cluster
# with n.clus processes (this should be equivalent to the call above)

library(parallel)
cl <- makeCluster(n.clus)

ans3 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)

# Notice that now -siena07- perhaps won't stop the cluster for you.
# stopCluster(cl)

# You can create even more complex clusters using several computers. In this
# example we are creating a cluster with 3*8 = 24 processors on three
# different machines.
#cl <- makePSOCKcluster(
#    rep(c('localhost', 'machine2.website.com' , 'machine3.website.com'), 8),
#    user='myusername', rshcmd='ssh -p PORTNUMBER')

#ans4 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)
#stopCluster(cl)

## End(Not run)

# for a continuous behavior variable -------------------------------------------
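# simulate behavior data according to dZ(t) = [-0.1 Z + 1] dt + 1 dW(t)
# (exact one-period transition: mean exp(a)*Z + (exp(a)-1)/a * b,
#  variance (exp(2a)-1)/(2a) * g^2, with a = -0.1, b = 1, g = 1)
set.seed(123)
y1 <- rnorm(50, 0,3)
y2 <- exp(-0.1) * y1 + (exp(-0.1) - 1) / -0.1 * 1 +
      rnorm(50, 0, sqrt((exp(-0.2) - 1) / -0.2 * 1^2))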
friend <- sienaDependent(array(c(s501, s502), dim = c(50,50,2)))
behavior <- sienaDependent(matrix(c(y1,y2), 50,2), type = "continuous")
(mydata <- sienaDataCreate(friend, behavior))
(myeff <- getEffects(mydata, onePeriodSde = TRUE))
algorithmMoM <- sienaAlgorithmCreate(nsub=1, n3=20, seed=321)
(ans <- siena07(algorithmMoM, data = mydata, effects = myeff, batch=TRUE))

# Accessing simulated networks for ML ------------------------------------------
# The following is an example for accessing the simulated networks for ML,
# which makes sense only if there are some missing tie variables;
# observed tie variables are identically simulated
# at the moment of observation,
# missing tie variables are imputed in a model-based way.
mat1 <- matrix(c(0,0,1,1,
                 1,0,0,0,
                 0,0,0,1,
                 0,1,0,0),4,4, byrow=TRUE)
mat2 <- matrix(c(0,1,1,1,
                 1,0,0,0,
                 0,0,0,1,
                 0,0,1,0),4,4, byrow=TRUE)
mat3 <- matrix(c(0,1,0,1,
                 1,0,0,0,
                 0,0,0,0,
                 NA,1,1,0),4,4, byrow=TRUE)
mats <- array(c(mat1,mat2,mat3), dim=c(4,4,3))
net <- sienaDependent(mats, allowOnly=FALSE)
sdat <- sienaDataCreate(net)
alg <- sienaAlgorithmCreate(maxlike=TRUE, nsub=3, n3=100, seed=12534)
effs <- getEffects(sdat)
(ans <- siena07(alg, data=sdat, effects=effs, returnDeps=TRUE, batch=TRUE))
# See manual Section 9.1 for information about the following functions
edges.to.adj <- function(x,n){
# create empty adjacency matrix
    adj <- matrix(0, n, n)
# put edge values in desired places
    adj[x[, 1:2]] <- x[, 3]
    adj
}
the.edge <- function(x,n,h,k){
    edges.to.adj(x,n)[h,k]
}
# Now show the results
n <- 4
ego <- rep.int(1:n,n)
alter <- rep(1:n, each=n)
# Get the average simulated adjacency matrices for wave 3 (period 2):
# (indexing within one run is [[group]][[dependent variable]][[period]])
ones <- sapply(1:n^2, function(i)
    {mean(sapply(ans$sims,
           function(x){the.edge(x[[1]][[1]][[2]],n,ego[i],alter[i])}))})
# Note that for maximum likelihood estimation,
# if there is one group and one period,
# the nesting levels for group and period are dropped from ans$sims.
cbind(ego,alter,ones)
matrix(ones,n,n)

[Package RSiena version 1.4.7 Index]