simData {momentuHMM} | R Documentation |
Simulation tool
Description
Simulates data from a (multivariate) hidden Markov model. Movement data are assumed to be in Cartesian coordinates (not longitude/latitude) and can be generated with or without observation error attributable to temporal irregularity or location measurement error.
Usage
simData(
nbAnimals = 1,
nbStates = 2,
dist,
Par,
beta = NULL,
delta = NULL,
formula = ~1,
formulaDelta = NULL,
mixtures = 1,
formulaPi = NULL,
covs = NULL,
nbCovs = 0,
spatialCovs = NULL,
zeroInflation = NULL,
oneInflation = NULL,
circularAngleMean = NULL,
centers = NULL,
centroids = NULL,
angleCovs = NULL,
obsPerAnimal = c(500, 1500),
initialPosition = c(0, 0),
DM = NULL,
userBounds = NULL,
workBounds = NULL,
betaRef = NULL,
mvnCoords = NULL,
stateNames = NULL,
model = NULL,
states = FALSE,
retrySims = 0,
lambda = NULL,
errorEllipse = NULL,
ncores = 1
)
simHierData(
nbAnimals = 1,
hierStates,
hierDist,
Par,
hierBeta = NULL,
hierDelta = NULL,
hierFormula = NULL,
hierFormulaDelta = NULL,
mixtures = 1,
formulaPi = NULL,
covs = NULL,
nbHierCovs = NULL,
spatialCovs = NULL,
zeroInflation = NULL,
oneInflation = NULL,
circularAngleMean = NULL,
centers = NULL,
centroids = NULL,
angleCovs = NULL,
obsPerLevel,
initialPosition = c(0, 0),
DM = NULL,
userBounds = NULL,
workBounds = NULL,
mvnCoords = NULL,
model = NULL,
states = FALSE,
retrySims = 0,
lambda = NULL,
errorEllipse = NULL,
ncores = 1
)
Arguments
nbAnimals |
Number of observed individuals to simulate. |
nbStates |
Number of behavioural states to simulate. |
dist |
A named list indicating the probability distributions of the data streams. Currently
supported distributions are 'bern', 'beta', 'cat', 'exp', 'gamma', 'lnorm', 'logis', 'negbinom', 'norm', 'mvnorm2' (bivariate normal distribution), 'mvnorm3' (trivariate normal distribution),
'pois', 'rw_norm' (normal random walk), 'rw_mvnorm2' (bivariate normal random walk), 'rw_mvnorm3' (trivariate normal random walk), 'vm', 'vmConsensus', 'weibull', and 'wrpcauchy'. For example,
|
Par |
A named list containing vectors of initial state-dependent probability distribution parameters for
each data stream specified in If |
beta |
Matrix of regression parameters for the transition probabilities (more information in "Details"). |
delta |
Initial value for the initial distribution of the HMM. Default: |
formula |
Regression formula for the transition probability covariates. Default: |
formulaDelta |
Regression formula for the initial distribution. Default: |
mixtures |
Number of mixtures for the state transition probabilities (i.e. discrete random effects *sensu* DeRuiter et al. 2017). Default: |
formulaPi |
Regression formula for the mixture distribution probabilities. Default: |
covs |
Covariate values to include in the simulated data, as a dataframe. The names of any covariates specified by |
nbCovs |
Number of covariates to simulate (0 by default). Does not need to be specified if
|
spatialCovs |
List of |
zeroInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be zero-inflated. If |
oneInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be one-inflated. If |
circularAngleMean |
An optional named list indicating whether to use circular-linear (FALSE) or circular-circular (TRUE)
regression on the mean of circular distributions ('vm' and 'wrpcauchy') for turning angles. For example,
Alternatively, |
centers |
2-column matrix providing the x-coordinates (column 1) and y-coordinates (column 2) for any activity centers (e.g., potential
centers of attraction or repulsion) from which distance and angle covariates will be calculated based on the simulated location data. These distance and angle
covariates can be included in |
centroids |
List where each element is a data frame consisting of at least |
angleCovs |
Character vector indicating the names of any circular-circular regression angular covariates in |
obsPerAnimal |
Either the number of observations per animal (if single value) or the bounds of the number of observations per animal (if vector of two values). In the latter case,
the numbers of obervations generated for each animal are uniformously picked from this interval. Alternatively, |
initialPosition |
2-vector providing the x- and y-coordinates of the initial position for all animals. Alternatively, |
DM |
An optional named list indicating the design matrices to be used for the probability distribution parameters of each data
stream. Each element of Design matrices specified using formulas allow standard functions in R formulas
(e.g., |
userBounds |
An optional named list of 2-column matrices specifying bounds on the natural (i.e, real) scale of the probability
distribution parameters for each data stream. For example, for a 2-state model using the wrapped Cauchy ('wrpcauchy') distribution for
a data stream named 'angle' with |
workBounds |
An optional named list of 2-column matrices specifying bounds on the working scale of the probability distribution, transition probability, and initial distribution parameters. For each matrix, the first column pertains to the lower bound and the second column the upper bound.
For data streams, each element of |
betaRef |
Numeric vector of length |
mvnCoords |
Character string indicating the name of location data that are to be simulated using a multivariate normal distribution. For example, if |
stateNames |
Optional character vector of length nbStates indicating state names. |
model |
A |
states |
|
retrySims |
Number of times to attempt to simulate data within the spatial extent of |
lambda |
Observation rate for location data. If |
errorEllipse |
List providing the upper bound for the semi-major axis ( |
ncores |
Number of cores to use for parallel processing. Default: 1 (no parallel processing). |
hierStates |
A hierarchical model structure |
hierDist |
A hierarchical data structure |
hierBeta |
A hierarchical data structure |
hierDelta |
A hierarchical data structure |
hierFormula |
A hierarchical formula structure for the transition probability covariates for each level of the hierarchy ('formula'). Default: |
hierFormulaDelta |
A hierarchical formula structure for the initial distribution covariates for each level of the hierarchy ('formulaDelta'). Default: |
nbHierCovs |
A hierarchical data structure |
obsPerLevel |
A hierarchical data structure |
Details
-
simHierData
is very similar tosimData
except that instead of simply specifying the number of states (nbStates
), distributions (dist
), observations (obsPerAnimal
), covariates (nbCovs
), and a single t.p.m. formula (formula
), thehierStates
argument specifies the hierarchical nature of the states, thehierDist
argument specifies the hierarchical nature of the data streams, theobsPerLevel
argument specifies the number of observations for each level of the hierarchy, thenbHierCovs
argument specifies the number of covariates for each level of the hierarchy, and thehierFormula
argument specifies a t.p.m. formula for each level of the hierarchy. All of the hierarhcial arguments insimHierData
are specified asNode
objects from thedata.tree
package. x- and y-coordinate location data are generated only if valid 'step' and 'angle' data streams are specified. Vaild distributions for 'step' include 'gamma', 'weibull', 'exp', and 'lnorm'. Valid distributions for 'angle' include 'vm' and 'wrpcauchy'. If only a valid 'step' data stream is specified, then only x-coordinates are generated.
If
DM
is specified for a particular data stream, then the initial values are specified on the working (i.e., beta) scale of the parameters. The working scale of each parameter is determined by the link function used. The functiongetParDM
is intended to help with obtaining initial values on the working scale when specifying a design matrix and other parameter constraints.Simulated data that are temporally regular (i.e.,
lambda=NULL
) and without location measurement error (i.e.,errorEllipse=NULL
) are returned as amomentuHMMData
(ormomentuHierHMMData
) object suitable for analysis usingfitHMM
.Simulated location data that are temporally-irregular (i.e.,
lambda>0
) and/or with location measurement error (i.e.,errorEllipse!=NULL
) are returned as a data frame suitable for analysis usingcrawlWrap
.The matrix
beta
of regression coefficients for the transition probabilities has one row for the intercept, plus one row for each covariate, and one column for each non-diagonal element of the transition probability matrix. For example, in a 3-state HMM with 2formula
covariates, the matrixbeta
has three rows (intercept + two covariates) and six columns (six non-diagonal elements in the 3x3 transition probability matrix - filled in row-wise). In a covariate-free model (default),beta
has one row, for the intercept.State-specific formulas can be specified in
DM
using special formula functions. These special functions can take the namespaste0("state",1:nbStates)
(where the integer indicates the state-specific formula). For example,DM=list(step=list(mean=~cov1+state1(cov2),sd=~cov2+state2(cov1)))
includescov1
on the mean parameter for all states,cov2
on the mean parameter for state 1,cov2
on the sd parameter for all states, andcov1
on the sd parameter for state 2.State- and parameter-specific formulas can be specified for transition probabilities in
formula
using special formula functions. These special functions can take the namespaste0("state",1:nbStates)
(where the integer indicates the current state from which transitions occur),paste0("toState",1:nbStates)
(where the integer indicates the state to which transitions occur), orpaste0("betaCol",nbStates*(nbStates-1))
(where the integer indicates the column of thebeta
matrix). For example withnbStates=3
,formula=~cov1+betaCol1(cov2)+state3(cov3)+toState1(cov4)
includescov1
on all transition probability parameters,cov2
on thebeta
column corresponding to the transition from state 1->2,cov3
on transition probabilities from state 3 (i.e.,beta
columns corresponding to state transitions 3->1 and 3->2), andcov4
on transition probabilities to state 1 (i.e.,beta
columns corresponding to state transitions 2->1 and 3->1).Cyclical relationships (e.g., hourly, monthly) may be simulated using the
consinor(x,period)
special formula function for covariatex
and sine curve period of time lengthperiod
. For example, if the data are hourly, a 24-hour cycle can be simulated using~cosinor(cov1,24)
, where the covariatecov1
is a repeating series of integers0,1,...,23,0,1,...,23,0,1,...
(note thatsimData
will not do this for you, the appropriate covariate must be specified using thecovs
argument; see example below). Thecosinor(x,period)
function convertsx
to 2 covariatescosinorCos(x)=cos(2*pi*x/period)
andconsinorSin(x)=sin(2*pi*x/period
for inclusion in the model (i.e., 2 additional parameters per state). The amplitude of the sine wave is thussqrt(B_cos^2 + B_sin^2)
, whereB_cos
andB_sin
are the working parameters correponding tocosinorCos(x)
andcosinorSin(x)
, respectively (e.g., see Cornelissen 2014).When the circular-circular regression model is used, the special function
angleFormula(cov,strength,by)
can be used inDM
for the mean of angular distributions (i.e. 'vm', 'vmConsensus', and 'wrpcauchy'), wherecov
is an angle covariate (e.g. wind direction),strength
is a positive real covariate (e.g. wind speed), andby
is an optional factor variable for individual- or group-level effects (e.g. ID, sex). This allows angle covariates to be weighted based on their strength or importance at time step t as in Rivest et al. (2016).
If the length of covariate values passed (either through 'covs', or 'model') is not the same as the number of observations suggested by 'nbAnimals' and 'obsPerAnimal' (or 'obsPerLevel' for
simHierData
), then the series of covariates is either shortened (removing last values - if too long) or extended (starting over from the first values - if too short).For
simData
, when covariates are not included informulaDelta
(i.e.formulaDelta=NULL
), thendelta
is specified as a vector of lengthnbStates
that sums to 1. When covariates are included informulaDelta
, thendelta
must be specified as a k x (nbStates
-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates
. For example, in a 3-state HMM withformulaDelta=~cov1+cov2
, the matrixdelta
has three rows (intercept + two covariates) and 2 columns (corresponding to states 2 and 3). The initial distribution working parameters are transformed to the real scale asexp(covsDelta*Delta)/rowSums(exp(covsDelta*Delta))
, wherecovsDelta
is the N x k design matrix,Delta=cbind(rep(0,k),delta)
is a k xnbStates
matrix of working parameters, andN=length(unique(data$ID))
.For
simHierData
,delta
must be specified as a k x (nbStates
-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates
.
Value
If the simulated data are temporally regular (i.e., lambda=NULL
) with no measurement error (i.e., errorEllipse=NULL
), an object momentuHMMData
(or momentuHierHMMData
),
i.e., a dataframe of:
ID |
The ID(s) of the observed animal(s) |
... |
Data streams as specified by |
x |
Either easting or longitude (if data streams include valid non-negative distribution for 'step') |
y |
Either norting or latitude (if data streams include valid non-negative distribution for 'step') |
... |
Covariates (if any) |
If simulated location data are temporally irregular (i.e., lambda>0
) and/or include measurement error (i.e., errorEllipse!=NULL
), a dataframe of:
time |
Numeric time of each observed (and missing) observation |
ID |
The ID(s) of the observed animal(s) |
x |
Either easting or longitude observed location |
y |
Either norting or latitude observed location |
... |
Data streams that are not derived from location (if applicable) |
... |
Covariates at temporally-regular true ( |
mux |
Either easting or longitude true location |
muy |
Either norting or latitude true location |
error_semimajor_axis |
error ellipse semi-major axis (if applicable) |
error_semiminor_axis |
error ellipse semi-minor axis (if applicable) |
error_ellipse_orientation |
error ellipse orientation (if applicable) |
ln.sd.x |
log of the square root of the x-variance of bivariate normal error (if applicable; required for error ellipse models in |
ln.sd.y |
log of the square root of the y-variance of bivariate normal error (if applicable; required for error ellipse models in |
error.corr |
correlation term of bivariate normal error (if applicable; required for error ellipse models in |
References
Cornelissen, G. 2014. Cosinor-based rhythmometry. Theoretical Biology and Medical Modelling 11:16.
McClintock BT, London JM, Cameron MF, Boveng PL. 2015. Modelling animal movement using the Argos satellite telemetry location error ellipse. Methods in Ecology and Evolution 6(3):266-277.
Rivest, LP, Duchesne, T, Nicosia, A, Fortin, D, 2016. A general angular regression model for the analysis of data on animal movement in ecology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(3):445-463.
Leos-Barajas, V., Gangloff, E.J., Adam, T., Langrock, R., van Beest, F.M., Nabe-Nielsen, J. and Morales, J.M. 2017. Multi-scale modeling of animal movement and general behavior data using hidden Markov models with hierarchical structures. Journal of Agricultural, Biological and Environmental Statistics, 22 (3), 232-248.
See Also
Examples
# 1. Pass a fitted model to simulate from
# (m is a momentuHMM object - as returned by fitHMM - automatically loaded with the package)
# We keep the default nbAnimals=1.
m <- example$m
obsPerAnimal=c(50,100)
data <- simData(model=m,obsPerAnimal=obsPerAnimal)
## Not run:
# 2. Pass the parameters of the model to simulate from
stepPar <- c(1,10,1,5,0.2,0.3) # mean_1, mean_2, sd_1, sd_2, zeromass_1, zeromass_2
anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2
omegaPar <- c(1,10,10,1) # shape1_1, shape1_2, shape2_1, shape2_2
stepDist <- "gamma"
angleDist <- "vm"
omegaDist <- "beta"
data <- simData(nbAnimals=4,nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist),
Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2,
zeroInflation=list(step=TRUE),
obsPerAnimal=obsPerAnimal)
# 3. Include covariates
# (note that it is useless to specify "nbCovs", which are overruled
# by the number of columns of "cov")
cov <- data.frame(temp=log(rnorm(500,20,5)))
stepPar <- c(log(10),0.1,log(100),-0.1,log(5),log(25)) # working scale parameters for step DM
anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2
stepDist <- "gamma"
angleDist <- "vm"
data <- simData(nbAnimals=2,nbStates=2,dist=list(step=stepDist,angle=angleDist),
Par=list(step=stepPar,angle=anglePar),
DM=list(step=list(mean=~temp,sd=~1)),
covs=cov,
obsPerAnimal=obsPerAnimal)
# 4. Include example 'forest' spatial covariate raster layer
# nbAnimals and obsPerAnimal kept small to reduce example run time
spatialCov<-list(forest=forest)
data <- simData(nbAnimals=1,nbStates=2,dist=list(step=stepDist,angle=angleDist),
Par=list(step=c(100,1000,50,100),angle=c(0,0,0.1,5)),
beta=matrix(c(5,-10,-25,50),nrow=2,ncol=2,byrow=TRUE),
formula=~forest,spatialCovs=spatialCov,
obsPerAnimal=250,states=TRUE,
retrySims=100)
# 5. Specify design matrix for 'omega' data stream
# natural scale parameters for step and angle
stepPar <- c(1,10,1,5) # shape_1, shape_2, scale_1, scale_2
anglePar <- c(pi,0,0.5,0.7) # mean_1, mean_2, concentration_1, concentration_2
# working scale parameters for omega DM
omegaPar <- c(log(1),0.1,log(10),-0.1,log(10),-0.1,log(1),0.1)
stepDist <- "weibull"
angleDist <- "wrpcauchy"
omegaDist <- "beta"
data <- simData(nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist),
Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2,
DM=list(omega=list(shape1=~cov1,shape2=~cov2)),
obsPerAnimal=obsPerAnimal,states=TRUE)
# 6. Include temporal irregularity and location measurement error
lambda <- 2 # expect 2 observations per time step
errorEllipse <- list(M=50,m=25,r=180)
obsData <- simData(model=m,obsPerAnimal=obsPerAnimal,
lambda=lambda, errorEllipse=errorEllipse)
# 7. Cosinor and state-dependent formulas
nbStates<-2
dist<-list(step="gamma")
Par<-list(step=c(100,1000,50,100))
# include 24-hour cycle on all transition probabilities
# include 12-hour cycle on transitions from state 2
formula=~cosinor(hour24,24)+state2(cosinor(hour12,12))
# specify appropriate covariates
covs<-data.frame(hour24=0:23,hour12=0:11)
beta<-matrix(c(-1.5,1,1,NA,NA,-1.5,-1,-1,1,1),5,2)
# row names for beta not required but can be helpful
rownames(beta)<-c("(Intercept)",
"cosinorCos(hour24, 24)",
"cosinorSin(hour24, 24)",
"cosinorCos(hour12, 12)",
"cosinorSin(hour12, 12)")
data.cos<-simData(nbStates=nbStates,dist=dist,Par=Par,
beta=beta,formula=formula,covs=covs)
# 8. Piecewise constant B-spline on step length mean and angle concentration
nObs <- 1000 # length of simulated track
cov <- data.frame(time=1:nObs) # time covariate for splines
dist <- list(step="gamma",angle="vm")
stepDM <- list(mean=~splines2::bSpline(time,df=2,degree=0),sd=~1)
angleDM <- list(mean=~1,concentration=~splines2::bSpline(time,df=2,degree=0))
DM <- list(step=stepDM,angle=angleDM)
Par <- list(step=c(log(1000),1,-1,log(100)),angle=c(0,log(10),2,-5))
data.spline<-simData(obsPerAnimal=nObs,nbStates=1,dist=dist,Par=Par,DM=DM,covs=cov)
# 9. Initial state (delta) based on covariate
nObs <- 100
dist <- list(step="gamma",angle="vm")
Par <- list(step=c(100,1000,50,100),angle=c(0,0,0.01,0.75))
# create sex covariate
cov <- data.frame(sex=factor(rep(c("F","M"),each=nObs))) # sex covariate
formulaDelta <- ~ sex + 0
# Female begins in state 1, male begins in state 2
delta <- matrix(c(-100,100),2,1,dimnames=list(c("sexF","sexM"),"state 2"))
data.delta<-simData(nbAnimals=2,obsPerAnimal=nObs,nbStates=2,dist=dist,Par=Par,
delta=delta,formulaDelta=formulaDelta,covs=cov,
beta=matrix(-1.5,1,2),states=TRUE)
## End(Not run)