| simData {momentuHMM} | R Documentation |
Simulation tool
Description
Simulates data from a (multivariate) hidden Markov model. Movement data are assumed to be in Cartesian coordinates (not longitude/latitude) and can be generated with or without observation error attributable to temporal irregularity or location measurement error.
Usage
simData(
nbAnimals = 1,
nbStates = 2,
dist,
Par,
beta = NULL,
delta = NULL,
formula = ~1,
formulaDelta = NULL,
mixtures = 1,
formulaPi = NULL,
covs = NULL,
nbCovs = 0,
spatialCovs = NULL,
zeroInflation = NULL,
oneInflation = NULL,
circularAngleMean = NULL,
centers = NULL,
centroids = NULL,
angleCovs = NULL,
obsPerAnimal = c(500, 1500),
initialPosition = c(0, 0),
DM = NULL,
userBounds = NULL,
workBounds = NULL,
betaRef = NULL,
mvnCoords = NULL,
stateNames = NULL,
model = NULL,
states = FALSE,
retrySims = 0,
lambda = NULL,
errorEllipse = NULL,
ncores = 1
)
simHierData(
nbAnimals = 1,
hierStates,
hierDist,
Par,
hierBeta = NULL,
hierDelta = NULL,
hierFormula = NULL,
hierFormulaDelta = NULL,
mixtures = 1,
formulaPi = NULL,
covs = NULL,
nbHierCovs = NULL,
spatialCovs = NULL,
zeroInflation = NULL,
oneInflation = NULL,
circularAngleMean = NULL,
centers = NULL,
centroids = NULL,
angleCovs = NULL,
obsPerLevel,
initialPosition = c(0, 0),
DM = NULL,
userBounds = NULL,
workBounds = NULL,
mvnCoords = NULL,
model = NULL,
states = FALSE,
retrySims = 0,
lambda = NULL,
errorEllipse = NULL,
ncores = 1
)
Arguments
nbAnimals |
Number of observed individuals to simulate. |
nbStates |
Number of behavioural states to simulate. |
dist |
A named list indicating the probability distributions of the data streams. Currently
supported distributions are 'bern', 'beta', 'cat', 'exp', 'gamma', 'lnorm', 'logis', 'negbinom', 'norm', 'mvnorm2' (bivariate normal distribution), 'mvnorm3' (trivariate normal distribution),
'pois', 'rw_norm' (normal random walk), 'rw_mvnorm2' (bivariate normal random walk), 'rw_mvnorm3' (trivariate normal random walk), 'vm', 'vmConsensus', 'weibull', and 'wrpcauchy'. For example,
|
Par |
A named list containing vectors of initial state-dependent probability distribution parameters for
each data stream specified in If |
beta |
Matrix of regression parameters for the transition probabilities (more information in "Details"). |
delta |
Initial value for the initial distribution of the HMM. Default: |
formula |
Regression formula for the transition probability covariates. Default: |
formulaDelta |
Regression formula for the initial distribution. Default: |
mixtures |
Number of mixtures for the state transition probabilities (i.e. discrete random effects *sensu* DeRuiter et al. 2017). Default: |
formulaPi |
Regression formula for the mixture distribution probabilities. Default: |
covs |
Covariate values to include in the simulated data, as a dataframe. The names of any covariates specified by |
nbCovs |
Number of covariates to simulate (0 by default). Does not need to be specified if
|
spatialCovs |
List of |
zeroInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be zero-inflated. If |
oneInflation |
A named list of logicals indicating whether the probability distributions of the data streams should be one-inflated. If |
circularAngleMean |
An optional named list indicating whether to use circular-linear (FALSE) or circular-circular (TRUE)
regression on the mean of circular distributions ('vm' and 'wrpcauchy') for turning angles. For example,
Alternatively, |
centers |
2-column matrix providing the x-coordinates (column 1) and y-coordinates (column 2) for any activity centers (e.g., potential
centers of attraction or repulsion) from which distance and angle covariates will be calculated based on the simulated location data. These distance and angle
covariates can be included in |
centroids |
List where each element is a data frame consisting of at least |
angleCovs |
Character vector indicating the names of any circular-circular regression angular covariates in |
obsPerAnimal |
Either the number of observations per animal (if single value) or the bounds of the number of observations per animal (if vector of two values). In the latter case,
the numbers of obervations generated for each animal are uniformously picked from this interval. Alternatively, |
initialPosition |
2-vector providing the x- and y-coordinates of the initial position for all animals. Alternatively, |
DM |
An optional named list indicating the design matrices to be used for the probability distribution parameters of each data
stream. Each element of Design matrices specified using formulas allow standard functions in R formulas
(e.g., |
userBounds |
An optional named list of 2-column matrices specifying bounds on the natural (i.e, real) scale of the probability
distribution parameters for each data stream. For example, for a 2-state model using the wrapped Cauchy ('wrpcauchy') distribution for
a data stream named 'angle' with |
workBounds |
An optional named list of 2-column matrices specifying bounds on the working scale of the probability distribution, transition probability, and initial distribution parameters. For each matrix, the first column pertains to the lower bound and the second column the upper bound.
For data streams, each element of |
betaRef |
Numeric vector of length |
mvnCoords |
Character string indicating the name of location data that are to be simulated using a multivariate normal distribution. For example, if |
stateNames |
Optional character vector of length nbStates indicating state names. |
model |
A |
states |
|
retrySims |
Number of times to attempt to simulate data within the spatial extent of |
lambda |
Observation rate for location data. If |
errorEllipse |
List providing the upper bound for the semi-major axis ( |
ncores |
Number of cores to use for parallel processing. Default: 1 (no parallel processing). |
hierStates |
A hierarchical model structure |
hierDist |
A hierarchical data structure |
hierBeta |
A hierarchical data structure |
hierDelta |
A hierarchical data structure |
hierFormula |
A hierarchical formula structure for the transition probability covariates for each level of the hierarchy ('formula'). Default: |
hierFormulaDelta |
A hierarchical formula structure for the initial distribution covariates for each level of the hierarchy ('formulaDelta'). Default: |
nbHierCovs |
A hierarchical data structure |
obsPerLevel |
A hierarchical data structure |
Details
-
simHierDatais very similar tosimDataexcept that instead of simply specifying the number of states (nbStates), distributions (dist), observations (obsPerAnimal), covariates (nbCovs), and a single t.p.m. formula (formula), thehierStatesargument specifies the hierarchical nature of the states, thehierDistargument specifies the hierarchical nature of the data streams, theobsPerLevelargument specifies the number of observations for each level of the hierarchy, thenbHierCovsargument specifies the number of covariates for each level of the hierarchy, and thehierFormulaargument specifies a t.p.m. formula for each level of the hierarchy. All of the hierarhcial arguments insimHierDataare specified asNodeobjects from thedata.treepackage. x- and y-coordinate location data are generated only if valid 'step' and 'angle' data streams are specified. Vaild distributions for 'step' include 'gamma', 'weibull', 'exp', and 'lnorm'. Valid distributions for 'angle' include 'vm' and 'wrpcauchy'. If only a valid 'step' data stream is specified, then only x-coordinates are generated.
If
DMis specified for a particular data stream, then the initial values are specified on the working (i.e., beta) scale of the parameters. The working scale of each parameter is determined by the link function used. The functiongetParDMis intended to help with obtaining initial values on the working scale when specifying a design matrix and other parameter constraints.Simulated data that are temporally regular (i.e.,
lambda=NULL) and without location measurement error (i.e.,errorEllipse=NULL) are returned as amomentuHMMData(ormomentuHierHMMData) object suitable for analysis usingfitHMM.Simulated location data that are temporally-irregular (i.e.,
lambda>0) and/or with location measurement error (i.e.,errorEllipse!=NULL) are returned as a data frame suitable for analysis usingcrawlWrap.The matrix
betaof regression coefficients for the transition probabilities has one row for the intercept, plus one row for each covariate, and one column for each non-diagonal element of the transition probability matrix. For example, in a 3-state HMM with 2formulacovariates, the matrixbetahas three rows (intercept + two covariates) and six columns (six non-diagonal elements in the 3x3 transition probability matrix - filled in row-wise). In a covariate-free model (default),betahas one row, for the intercept.State-specific formulas can be specified in
DMusing special formula functions. These special functions can take the namespaste0("state",1:nbStates)(where the integer indicates the state-specific formula). For example,DM=list(step=list(mean=~cov1+state1(cov2),sd=~cov2+state2(cov1)))includescov1on the mean parameter for all states,cov2on the mean parameter for state 1,cov2on the sd parameter for all states, andcov1on the sd parameter for state 2.State- and parameter-specific formulas can be specified for transition probabilities in
formulausing special formula functions. These special functions can take the namespaste0("state",1:nbStates)(where the integer indicates the current state from which transitions occur),paste0("toState",1:nbStates)(where the integer indicates the state to which transitions occur), orpaste0("betaCol",nbStates*(nbStates-1))(where the integer indicates the column of thebetamatrix). For example withnbStates=3,formula=~cov1+betaCol1(cov2)+state3(cov3)+toState1(cov4)includescov1on all transition probability parameters,cov2on thebetacolumn corresponding to the transition from state 1->2,cov3on transition probabilities from state 3 (i.e.,betacolumns corresponding to state transitions 3->1 and 3->2), andcov4on transition probabilities to state 1 (i.e.,betacolumns corresponding to state transitions 2->1 and 3->1).Cyclical relationships (e.g., hourly, monthly) may be simulated using the
consinor(x,period)special formula function for covariatexand sine curve period of time lengthperiod. For example, if the data are hourly, a 24-hour cycle can be simulated using~cosinor(cov1,24), where the covariatecov1is a repeating series of integers0,1,...,23,0,1,...,23,0,1,...(note thatsimDatawill not do this for you, the appropriate covariate must be specified using thecovsargument; see example below). Thecosinor(x,period)function convertsxto 2 covariatescosinorCos(x)=cos(2*pi*x/period)andconsinorSin(x)=sin(2*pi*x/periodfor inclusion in the model (i.e., 2 additional parameters per state). The amplitude of the sine wave is thussqrt(B_cos^2 + B_sin^2), whereB_cosandB_sinare the working parameters correponding tocosinorCos(x)andcosinorSin(x), respectively (e.g., see Cornelissen 2014).When the circular-circular regression model is used, the special function
angleFormula(cov,strength,by)can be used inDMfor the mean of angular distributions (i.e. 'vm', 'vmConsensus', and 'wrpcauchy'), wherecovis an angle covariate (e.g. wind direction),strengthis a positive real covariate (e.g. wind speed), andbyis an optional factor variable for individual- or group-level effects (e.g. ID, sex). This allows angle covariates to be weighted based on their strength or importance at time step t as in Rivest et al. (2016).
If the length of covariate values passed (either through 'covs', or 'model') is not the same as the number of observations suggested by 'nbAnimals' and 'obsPerAnimal' (or 'obsPerLevel' for
simHierData), then the series of covariates is either shortened (removing last values - if too long) or extended (starting over from the first values - if too short).For
simData, when covariates are not included informulaDelta(i.e.formulaDelta=NULL), thendeltais specified as a vector of lengthnbStatesthat sums to 1. When covariates are included informulaDelta, thendeltamust be specified as a k x (nbStates-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates. For example, in a 3-state HMM withformulaDelta=~cov1+cov2, the matrixdeltahas three rows (intercept + two covariates) and 2 columns (corresponding to states 2 and 3). The initial distribution working parameters are transformed to the real scale asexp(covsDelta*Delta)/rowSums(exp(covsDelta*Delta)), wherecovsDeltais the N x k design matrix,Delta=cbind(rep(0,k),delta)is a k xnbStatesmatrix of working parameters, andN=length(unique(data$ID)).For
simHierData,deltamust be specified as a k x (nbStates-1) matrix of working parameters, where k is the number of regression coefficients and the columns correspond to states 2:nbStates.
Value
If the simulated data are temporally regular (i.e., lambda=NULL) with no measurement error (i.e., errorEllipse=NULL), an object momentuHMMData (or momentuHierHMMData),
i.e., a dataframe of:
ID |
The ID(s) of the observed animal(s) |
... |
Data streams as specified by |
x |
Either easting or longitude (if data streams include valid non-negative distribution for 'step') |
y |
Either norting or latitude (if data streams include valid non-negative distribution for 'step') |
... |
Covariates (if any) |
If simulated location data are temporally irregular (i.e., lambda>0) and/or include measurement error (i.e., errorEllipse!=NULL), a dataframe of:
time |
Numeric time of each observed (and missing) observation |
ID |
The ID(s) of the observed animal(s) |
x |
Either easting or longitude observed location |
y |
Either norting or latitude observed location |
... |
Data streams that are not derived from location (if applicable) |
... |
Covariates at temporally-regular true ( |
mux |
Either easting or longitude true location |
muy |
Either norting or latitude true location |
error_semimajor_axis |
error ellipse semi-major axis (if applicable) |
error_semiminor_axis |
error ellipse semi-minor axis (if applicable) |
error_ellipse_orientation |
error ellipse orientation (if applicable) |
ln.sd.x |
log of the square root of the x-variance of bivariate normal error (if applicable; required for error ellipse models in |
ln.sd.y |
log of the square root of the y-variance of bivariate normal error (if applicable; required for error ellipse models in |
error.corr |
correlation term of bivariate normal error (if applicable; required for error ellipse models in |
References
Cornelissen, G. 2014. Cosinor-based rhythmometry. Theoretical Biology and Medical Modelling 11:16.
McClintock BT, London JM, Cameron MF, Boveng PL. 2015. Modelling animal movement using the Argos satellite telemetry location error ellipse. Methods in Ecology and Evolution 6(3):266-277.
Rivest, LP, Duchesne, T, Nicosia, A, Fortin, D, 2016. A general angular regression model for the analysis of data on animal movement in ecology. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(3):445-463.
Leos-Barajas, V., Gangloff, E.J., Adam, T., Langrock, R., van Beest, F.M., Nabe-Nielsen, J. and Morales, J.M. 2017. Multi-scale modeling of animal movement and general behavior data using hidden Markov models with hierarchical structures. Journal of Agricultural, Biological and Environmental Statistics, 22 (3), 232-248.
See Also
Examples
# 1. Pass a fitted model to simulate from
# (m is a momentuHMM object - as returned by fitHMM - automatically loaded with the package)
# We keep the default nbAnimals=1.
m <- example$m
obsPerAnimal=c(50,100)
data <- simData(model=m,obsPerAnimal=obsPerAnimal)
## Not run:
# 2. Pass the parameters of the model to simulate from
stepPar <- c(1,10,1,5,0.2,0.3) # mean_1, mean_2, sd_1, sd_2, zeromass_1, zeromass_2
anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2
omegaPar <- c(1,10,10,1) # shape1_1, shape1_2, shape2_1, shape2_2
stepDist <- "gamma"
angleDist <- "vm"
omegaDist <- "beta"
data <- simData(nbAnimals=4,nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist),
Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2,
zeroInflation=list(step=TRUE),
obsPerAnimal=obsPerAnimal)
# 3. Include covariates
# (note that it is useless to specify "nbCovs", which are overruled
# by the number of columns of "cov")
cov <- data.frame(temp=log(rnorm(500,20,5)))
stepPar <- c(log(10),0.1,log(100),-0.1,log(5),log(25)) # working scale parameters for step DM
anglePar <- c(pi,0,0.5,2) # mean_1, mean_2, concentration_1, concentration_2
stepDist <- "gamma"
angleDist <- "vm"
data <- simData(nbAnimals=2,nbStates=2,dist=list(step=stepDist,angle=angleDist),
Par=list(step=stepPar,angle=anglePar),
DM=list(step=list(mean=~temp,sd=~1)),
covs=cov,
obsPerAnimal=obsPerAnimal)
# 4. Include example 'forest' spatial covariate raster layer
# nbAnimals and obsPerAnimal kept small to reduce example run time
spatialCov<-list(forest=forest)
data <- simData(nbAnimals=1,nbStates=2,dist=list(step=stepDist,angle=angleDist),
Par=list(step=c(100,1000,50,100),angle=c(0,0,0.1,5)),
beta=matrix(c(5,-10,-25,50),nrow=2,ncol=2,byrow=TRUE),
formula=~forest,spatialCovs=spatialCov,
obsPerAnimal=250,states=TRUE,
retrySims=100)
# 5. Specify design matrix for 'omega' data stream
# natural scale parameters for step and angle
stepPar <- c(1,10,1,5) # shape_1, shape_2, scale_1, scale_2
anglePar <- c(pi,0,0.5,0.7) # mean_1, mean_2, concentration_1, concentration_2
# working scale parameters for omega DM
omegaPar <- c(log(1),0.1,log(10),-0.1,log(10),-0.1,log(1),0.1)
stepDist <- "weibull"
angleDist <- "wrpcauchy"
omegaDist <- "beta"
data <- simData(nbStates=2,dist=list(step=stepDist,angle=angleDist,omega=omegaDist),
Par=list(step=stepPar,angle=anglePar,omega=omegaPar),nbCovs=2,
DM=list(omega=list(shape1=~cov1,shape2=~cov2)),
obsPerAnimal=obsPerAnimal,states=TRUE)
# 6. Include temporal irregularity and location measurement error
lambda <- 2 # expect 2 observations per time step
errorEllipse <- list(M=50,m=25,r=180)
obsData <- simData(model=m,obsPerAnimal=obsPerAnimal,
lambda=lambda, errorEllipse=errorEllipse)
# 7. Cosinor and state-dependent formulas
nbStates<-2
dist<-list(step="gamma")
Par<-list(step=c(100,1000,50,100))
# include 24-hour cycle on all transition probabilities
# include 12-hour cycle on transitions from state 2
formula=~cosinor(hour24,24)+state2(cosinor(hour12,12))
# specify appropriate covariates
covs<-data.frame(hour24=0:23,hour12=0:11)
beta<-matrix(c(-1.5,1,1,NA,NA,-1.5,-1,-1,1,1),5,2)
# row names for beta not required but can be helpful
rownames(beta)<-c("(Intercept)",
"cosinorCos(hour24, 24)",
"cosinorSin(hour24, 24)",
"cosinorCos(hour12, 12)",
"cosinorSin(hour12, 12)")
data.cos<-simData(nbStates=nbStates,dist=dist,Par=Par,
beta=beta,formula=formula,covs=covs)
# 8. Piecewise constant B-spline on step length mean and angle concentration
nObs <- 1000 # length of simulated track
cov <- data.frame(time=1:nObs) # time covariate for splines
dist <- list(step="gamma",angle="vm")
stepDM <- list(mean=~splines2::bSpline(time,df=2,degree=0),sd=~1)
angleDM <- list(mean=~1,concentration=~splines2::bSpline(time,df=2,degree=0))
DM <- list(step=stepDM,angle=angleDM)
Par <- list(step=c(log(1000),1,-1,log(100)),angle=c(0,log(10),2,-5))
data.spline<-simData(obsPerAnimal=nObs,nbStates=1,dist=dist,Par=Par,DM=DM,covs=cov)
# 9. Initial state (delta) based on covariate
nObs <- 100
dist <- list(step="gamma",angle="vm")
Par <- list(step=c(100,1000,50,100),angle=c(0,0,0.01,0.75))
# create sex covariate
cov <- data.frame(sex=factor(rep(c("F","M"),each=nObs))) # sex covariate
formulaDelta <- ~ sex + 0
# Female begins in state 1, male begins in state 2
delta <- matrix(c(-100,100),2,1,dimnames=list(c("sexF","sexM"),"state 2"))
data.delta<-simData(nbAnimals=2,obsPerAnimal=nObs,nbStates=2,dist=dist,Par=Par,
delta=delta,formulaDelta=formulaDelta,covs=cov,
beta=matrix(-1.5,1,2),states=TRUE)
## End(Not run)