rem.dyad {relevent}R Documentation

Fit a Relational Event Model to Dyadic Data

Description

Fits a relational event model to dyadic edgelist data, using either the ordinal or temporal likelihood. Maximum likelihood, posterior mode, and posterior importance resampling methods are supported.

Usage

rem.dyad(edgelist, n, effects = NULL, ordinal = TRUE, acl = NULL,
    cumideg = NULL, cumodeg = NULL, rrl = NULL, covar = NULL, ps = NULL,
    tri = NULL, optim.method = "BFGS", optim.control = list(), 
    coef.seed = NULL, hessian = FALSE, sample.size = Inf, verbose = TRUE, 
    fit.method = c("BPM", "MLE", "BSIR"), conditioned.obs = 0, 
    prior.mean = 0, prior.scale = 100, prior.nu = 4, sir.draws = 500, 
    sir.expand = 10, sir.nu = 4, gof = TRUE)
## S3 method for class 'rem.dyad'
print(x, ...)
## S3 method for class 'rem.dyad'
summary(object, ...) 
## S3 method for class 'rem.dyad'
simulate(object, nsim = object$m, seed = NULL, 
    coef = NULL, covar = NULL, edgelist = NULL, redraw.timing = FALSE,
    redraw.events = FALSE, verbose = FALSE, ...)

Arguments

edgelist

a three-column edgelist matrix, with each row containing (in order) the time/order, sender, and receiver for the event in question, or NULL to create a model skeleton (useful for simulation).

n

number of senders/receivers.

effects

a character vector indicating which effects to use; see below for specification.

ordinal

logical; should the ordinal likelihood be used? (If FALSE, the temporal likelihood is used instead.)

acl

optionally, a pre-computed acl structure.

cumideg

optionally, a pre-computed cumulative indegree structure.

cumodeg

optionally, a pre-computed cumulative outdegree stucture.

rrl

optionally, a pre-computed recency-ranked communications list.

covar

an optional list of sender/receiver/event covariates.

ps

optionally, a pre-computed p-shift matrix.

tri

optionally, a pre-computed triad statistic structure.

optim.method

the method to be used by optim.

optim.control

additional control parameters to optim.

coef.seed

an optional vector of coefficients to use as the starting point for the optimization process; if edgelist==NULL, this is the vector of embedded coefficients for the model skeleton.

hessian

logical; compute the hessian of the log-likelihood/posterior surface?

sample.size

sample size to use when estimating the sum of event rates.

verbose

logical; deliver progress reports?

fit.method

method to use when fitting the model.

conditioned.obs

the number of initial observations on which to condition when fitting the model (defaults to 0).

prior.mean

for Bayesian estimation, location vector for prior distribution (multivariate-t). (Can be a single value.)

prior.scale

for Bayesian estimation, scale vector for prior distribution. (Can be a single value.)

prior.nu

for Bayesian estimation, degrees of freedom for prior distribution. (Setting this to Inf results in a Gaussian prior.)

sir.draws

for sampling importance resampling method, the number of posterior draws to take (post-resampling).

sir.expand

for sampling importance resampling method, the expansion factor to use in the initial (pre-resampling) sample; sample size is sir.expand*sir.draws.

sir.nu

for sampling importance resampling method, the degrees of freedom for the t distribution used to obtain initial (pre-resampling) sample.

gof

logical; calculate goodness-of-fit information?

x

an object of class rem.dyad.

object

an object of class rem.dyad.

nsim

number of events to simulate (defaults to the observed sequence length in the fitted model).

seed

random number seed to use for simulation.

coef

optional vector of coefficients to override those in the fitted model object, for simulation purposes.

redraw.timing

logical; should any prespecified events in edgelist have their timings redrawn during simulation?

redraw.events

logical; should any prespecified events in edgelist have their senders and receivers redrawn during simulation?

...

additional arguments.

Details

rem.dyad fits a (dyadic) relational event model to an event sequence, using either the full temporal or ordinal data likelihoods. Three estimation methods are currently supported: maximum likelihood estimation, Bayesian posterior mode estimation, and Bayesian sampling importance resampling. For the Bayesian methods, an adjustable multivariate-t (or, if prior.nu==Inf, Gaussian) prior is employed. In the case of Bayesian sampling importance resampling, the posterior mode (and the hessian of the posterior about it) is used as the basis for a multivariate-t sample, which is then resampled via SIR methods to obtain an approximate set of posterior draws. While this approximation is not guaranteed to work well, it is generally more robust than pure mode approximations (or, in the case of the MLE, estimates of uncertainty derived from the inverse hessian matrix).

Whether Bayesian or frequentist methods are used, the relevant likelihood is either based entirely on the order of events (ordinal=TRUE) or on the realized event times (ordinal=FALSE). In the latter case, all event times are understood to be relative to the onset of observation (i.e., observation starts at time 0), and the last event time given is taken to be the end of the observation period. (If an event is also specified, this event is ignored.)

Effects to be fit by rem.dyad are determined by the eponymous effects argument, a character vector which lists the effects to be used. These are as follows:

Note that not all effects may lead to identified models in all cases - it is up to the user to ensure that the postulated model makes sense.

Data to be used by rem.dyad must consist of an edgelist matrix, whose rows contain information on successive events. This matrix must have three columns, containing (respectively) the event times, sender IDs (as integers from 1 to n), and receiver IDs (also from 1 to n). As already noted, event times should be relative to onset of observation where the temporal likelihood is being used; otherwise, only event order is employed. In the temporal likelihood case, the last row should contain the time for the termination of the observation period – any event on this row is ignored. If conditioned.obs>0, the relevant number of initial observations is taken as fixed, and the likelihood of the remaining sequence is calculated conditional on these values; this can be useful when analyzing an event history with no clear starting point.

If covariates effects are indicated, then appropriate covariate values must be supplied as a list in argument covar. The elements of covar should be given the same name as the effect type to which they correspond (e.g., CovSnd, CovRec, etc.); any other elements will be ignored. The format of a given covariate element depends both on the effect type and on the number of covariates specified. The basic cases are as follows:

Note that “time varying” covariates may only change values when events transpire; thus, they should be regarded as temporally endogenous. (See the reference below for details.)

If called with edgelist==NULL, rem.dyad will produce a “model skeleton” object containing the effects and other information, but no model fit. (The seed coefficients, if given, are entered as the coefficients in the model, or else an uninteresting default set is used.) The main purpose for this object is to set up an ab initio simulation, as described below: once the skeleton is created, the simulate method can be used to generate draws from that model (without fitting to a data set).

A simulate method is provided for rem.dyad objects, which allows simulation of new event sequences from a fitted or skeleton model. By default, a new sequence of length equal to the original sequence to which the model object was fitted is simulated (if applicable), but other lengths may be chosen using nsim. Although the coefficients in the model object are used by default, this may also be altered by specifying coef. Note that any covariates used must be passed to the simulate command via covar (using the same format as in the original model); this is in part because rem.dyad objects do not currently save their input data, and in part because dynamic covariates must always be the length of the simulated sequence (and hence must be factored when a non-default nsim value is used). For models fit using ordinal=TRUE, the overall pacing of events will be arbitrary (more specifically, the simulation will tacitly assume that each event has a unit base hazard), but the relative timing is not. See below for examples of both simulation using a fitted model object and ab initio simulation without fitting a model to data.

For simulation, it is possible to fix the first portion of the event history by passing an event list matrix to the edgelist argument; this must be compatible with the target model (i.e., the vertex IDs must match), and it cannot contain NA values. (Thus, if starting with an exact timing seqence with a last line containing NAs, this must be removed.) If the input event list contains m events, then these are assumed to supply the first m events of the target sequence; if m>nsim, then any excess events are discarded. By default, the input events are taken as fixed. However, specifying redraw.timing=TRUE will lead the event timings to be redrawn, and redraw.events will lead the sender/reciver pairs to be redrawn. This allows e.g. for an observed ordinal time sequence to be given a simulated exact time realization, by setting nsim to the event list length and setting redraw.timing=TRUE. The more obvious use case is to simply extend an observed sequence, in which case one should use nsim greater than the input sequence length (i.e., the input length plus the number of new events to generate) and leave the redraw paraeters set to FALSE.

Value

For rem.dyad, an object of class rem.dyad. For the simulate method, an event list.

Author(s)

Carter T. Butts buttsc@uci.edu

References

Butts, C.T. (2008). “A Relational Event Framework for Social Action.” Sociological Methodology, 38(1).

See Also

rem

Examples

## Not run: 
#Generate some simple sample data based on fixed effects
roweff<-rnorm(10)                                     #Build rate matrix
roweff<-roweff-roweff[1]                   #Adjust for later convenience
coleff<-rnorm(10)
coleff<-coleff-coleff[1]
lambda<-exp(outer(roweff,coleff,"+"))
diag(lambda)<-0
ratesum<-sum(lambda)
esnd<-as.vector(row(lambda))                  #List of senders/receivers
erec<-as.vector(col(lambda))
time<-0
edgelist<-vector()
while(time<15){                   # Observe the system for 15 time units
  drawsr<-sample(1:100,1,prob=as.vector(lambda))        #Draw from model
  time<-time+rexp(1,ratesum)
  if(time<=15)                                             #Censor at 15
    edgelist<-rbind(edgelist,c(time,esnd[drawsr],erec[drawsr]))
  else
    edgelist<-rbind(edgelist,c(15,NA,NA))
}

#Fit the model, ordinal BPM
effects<-c("FESnd","FERec")
fit.ord<-rem.dyad(edgelist,10,effects=effects,hessian=TRUE)
summary(fit.ord)
par(mfrow=c(1,2))                                #Check the coefficients
plot(roweff[-1],fit.ord$coef[1:9],asp=1)
abline(0,1)
plot(coleff[-1],fit.ord$coef[10:18],asp=1)
abline(0,1)

#Now, find the temporal BPM
fit.time<-rem.dyad(edgelist,10,effects=effects,ordinal=FALSE,hessian=TRUE)
summary(fit.time)
plot(fit.ord$coef,fit.time$coef,asp=1)                  #Similar results
abline(0,1)

#Finally, try the BSIR method (note: a much larger expansion factor
#is recommended in practice)
fit.bsir<-rem.dyad(edgelist,10,effects=effects,fit.method="BSIR",
    sir.draws=100,sir.expand=5)
summary(fit.bsir)
par(mfrow=c(3,3))   #Examine the approximate posterior marginals
for(i in 1:9){
  hist(fit.bsir$post[,i],main=names(fit.bsir$coef)[i],prob=TRUE)
  abline(v=roweff[i+1],col=2,lwd=3)
}
for(i in 10:18){
  hist(fit.bsir$post[,i],main=names(fit.bsir$coef)[i],prob=TRUE)
  abline(v=coleff[i-8],col=2,lwd=3)
}

#Simulate an event sequence from the temporal model
sim<-simulate(fit.time,nsim=50000) #Simulate 50000 events
head(sim)                          #Show the event list
par(mfrow=c(1,2))                  #Check the behavior
esnd<-exp(c(0,fit.time$coef[1:9]))
esnd<-esnd/sum(esnd)*5e4           #Expected sending count
erec<-exp(c(0,fit.time$coef[10:18]))
erec<-erec/sum(erec)*5e4           #Expected sending count
plot(esnd,tabulate(sim[,2]),xlab="Expected Out-events",ylab="Out-events")
abline(0,1,col=2)
plot(erec,tabulate(sim[,3]),xlab="Expected In-events",ylab="In-events")
abline(0,1,col=2)

#Keep the first 10 events of the simulated sequence, and produce 10 more
sim.pre<-sim[1:10,]
sim2<-simulate(fit.time,nsim=20,edgelist=sim.pre)
sim.pre                            #See the first 10 events
sim2                               #First 10 events preserved
all(sim2[1:10,]==sim.pre)          #All TRUE

#Repeat, but redrawing part of the input sequence
sim2.t<-simulate(fit.time,nsim=20,edgelist=sim.pre,redraw.timing=TRUE)
sim2.e<-simulate(fit.time,nsim=20,edgelist=sim.pre,redraw.events=TRUE)
sim2.t                             #Events kept, timings not
sim2.t[1:10,]==sim.pre             #Second two columns TRUE
sim2.e                             #Timing kept, events not
sim2.e[1:10,]==sim.pre             #(Note: some events may repeat by chance!)

## End(Not run)

[Package relevent version 1.2-1 Index]