crm {marked} | R Documentation |
Capture-recapture model fitting function
Description
Fits user specified models to some types of capture-recapture wholly in R and not with MARK. A single function that processes data, creates the design data, makes the crm model and runs it
Usage
crm(
data,
ddl = NULL,
begin.time = 1,
model = "CJS",
title = "",
model.parameters = list(),
design.parameters = list(),
initial = NULL,
groups = NULL,
time.intervals = NULL,
debug = FALSE,
method = NULL,
hessian = FALSE,
accumulate = TRUE,
chunk_size = 1e+07,
control = list(),
refit = 1,
itnmax = 5000,
scale = NULL,
run = TRUE,
burnin = 100,
iter = 1000,
use.admb = FALSE,
use.tmb = FALSE,
crossed = NULL,
reml = FALSE,
compile = FALSE,
extra.args = NULL,
strata.labels = NULL,
clean = NULL,
save.matrices = FALSE,
simplify = FALSE,
getreals = FALSE,
real.ids = NULL,
check = FALSE,
prior = FALSE,
prior.list = NULL,
useHess = FALSE,
optimize = TRUE,
unit_scale = TRUE,
...
)
Arguments
data |
Either the raw data which is a dataframe with at least one column named ch (a character field containing the capture history) or a processed dataframe |
ddl |
Design data list which contains a list element for each parameter type; if NULL it is created |
begin.time |
Time of first capture(release) occasion |
model |
Type of c-r model (eg, "cjs", "js") |
title |
Optional title; not used at present |
model.parameters |
List of model parameter specifications |
design.parameters |
Specification of any grouping variables for design data for each parameter |
initial |
Optional list of named vectors of initial values for beta parameters or a previously run model |
groups |
Vector of names factor variables for creating groups |
time.intervals |
Intervals of time between the capture occasions |
debug |
if TRUE, shows optimization output |
method |
optimization method |
hessian |
if TRUE, computes v-c matrix using hessian |
accumulate |
if TRUE, like capture-histories are accumulated to reduce computation |
chunk_size |
specifies amount of memory to use in accumulating capture histories and design matrices; amount used is 8*chunk_size/1e6 MB (default 80MB) |
control |
control string for optimization functions |
refit |
non-zero entry to refit |
itnmax |
maximum number of iterations for optimization |
scale |
vector of scale values for parameters |
run |
if TRUE, it runs model; otherwise if FALSE can be used to test model build components |
burnin |
number of iterations for mcmc burnin; specified default not realistic for actual use |
iter |
number of iterations after burnin for mcmc (not realistic default) |
use.admb |
if TRUE uses ADMB for cjs, mscjs or mvms models |
use.tmb |
if TRUE runs TMB for cjs, mscjs or mvms models |
crossed |
if TRUE it uses cjs.tpl or cjs_reml.tpl if reml=FALSE or TRUE respectively; if FALSE, then it uses cjsre which can use Gauss-Hermite integration |
reml |
if TRUE uses restricted maximum likelihood |
compile |
if TRUE forces re-compilation of tpl file |
extra.args |
optional character string that is passed to admb if use.admb==TRUE |
strata.labels |
labels for strata used in capture history; they are converted to numeric in the order listed. Only needed to specify unobserved strata. For any unobserved strata p=0.. |
clean |
if TRUE, deletes the tpl and executable files for amdb if use.admb=T |
save.matrices |
for HMM models this option controls whether the gamma,dmat and delta matrices are saved in the model object |
simplify |
if TRUE, design matrix is simplified to unique valus including fixed values |
getreals |
if TRUE, compute real values and std errors for TMB models; may want to set as FALSE until model selection is complete |
real.ids |
vector of id values for which real parameters should be output with std error information for TMB models; if NULL all ids used |
check |
if TRUE values of gamma, dmat and delta are checked to make sure the values are valid with initial parameter values. |
prior |
if TRUE will expect vectors of prior values in list prior.list; currently only implemented for cjsre_tmb |
prior.list |
which contains list of prior parameters that will be model dependent |
useHess |
if TRUE, the TMB hessian function is used for optimization; using hessian is typically slower with many parameters but can result in a better solution |
optimize |
if TRUE, optimizes to get parameter estimates; set to FALSE to extract estimates of ADREPORTed values only |
unit_scale |
default TRUE, if FALSE any time scaled parameter (e.g. Phi,S) is scaled when computing real value such that it represents the length of the interval rather than a unit interval |
... |
optional arguments passed to js or cjs and optimx |
Details
This function is operationally similar to the function mark in RMark
in that is is a shell that calls several other functions to perform the
following steps: 1) process.data
to setup data and parameters
and package them into a list (processed data),2)
make.design.data
to create the design data for each parameter
in the specified model, 3) create.dm
to create the design
matrices for each parameter based on the formula provided for each
parameter, 4) call to the specific function for model fitting. As with mark
the calling
arguments for crm
are a compilation of the calling arguments for each
of the functions it calls (with some arguments renamed to avoid conflicts).
expects to find a value for ddl
. Likewise, if the data have not been
processed, then ddl
should be NULL. This dual calling structure
allows either a single call approach for each model or alternatively the preferred method
where the data area processed and the design data (ddl
) created once and
then a whole series of models can be analyzed without repeating those steps.
There are some optional arguments that can be used to set initial values and
control other aspects of the optimization. The optimization is done with
the R package/function optimx
and the arguments method
and
hessian
are described with the help for that function. In addition,
any arguments not matching those in the fitting functions (eg cjs_admb
) are passed to
optimx
allowing any of the other parameters to be set. If you set
debug=TRUE
, then at each function evaluation (cjs.lnl
the current values of the parameters and -2*log-likelihood value are output.
In the current implementation, a logit link is used to constrain the
parameters in the unit interval (0,1) except for probability of entry which
uses an mlogit and N which uses a log link. For the probitCJS model, a probit link is
used for the parameters. These could be generalized to
use other link functions. Following the notation of MARK, the parameters in
the link space are referred to as beta
and those in the actual
parameter space of Phi
and p
as reals.
Initial values can be set in 2 ways. 1) Define a list of named vectors with
the initial beta parameter values (eg logit link) in initial
.
The names of the vectors should be the parameter names in the model. Any unspecified
values are set to 0. 2) Specify a previously run model for initial. The code will match the names of the
current design matrix to the names in beta
and use the appropriate
initial values. Any non-specified values are set to 0. If no value is specified for initial,
all beta are started at a value of 0, except for the CJS model which attempts to use a glm approach to
setting starting values. If the glm fails then they are set to 0.
If you have a study with unequal time intervals between capture occasions,
then these can be specified with the argument time.intervals
.
The argument accumulate
defaults to TRUE
. When it is
TRUE
it will accumulate common capture histories that also have
common design and common fixed values (see below) for the parameters. This
will speed up the analysis because in the calculation of the likelihood
(cjs.lnl
it loops over the unique values. In general the
default will be the best unless you have many capture histories and are
using many individual covariate(s) in the formula that would make each entry
unique. In that case there will be no effect of accumulation but the code
will still try to accumulate. In that particular case by setting
accumulate=FALSE
you can skip the code run for accumulation.
Most of the arguments controlling the fitted model are contained in lists in
the arguments model.parameters
and design.parameters
which are
similar to their counterparts in mark inb RMark
. Each is a named list
with the names being the parameters in the model (e.g., Phi and p in "cjs"
and "Phi","p","pent","N" in "js"). Each named element is also a list
containing various values defining the design data and model for the
parameter. The elements of model.parameters
can include
formula
which is an R formula to create the design matrix for the
parameter and fixed
is a matrix of fixed values as described below.
The elements of design.parameters
can include time.varying
,
fields
, time.bins
,age.bins
, and cohort.bins
. See
create.dmdf
for a description of the first 2 and
create.dm
for a description of the last 3.
Real parameters can be set to fixed values using fixed=x
where x is a
matrix with 3 columns and any number of rows. The first column specifies
the particular animal (capture history) as the row number in the dataframe
x
. The second specifies the capture occasion number for the real
parameter to be fixed. For Phi
and pent
these are 1 to
nocc
-1 and for p
they are 2 to nocc
for "cjs" and 1 to
nocc
for "js". This difference is due to the parameter labeling by
the beginning of the interval for Phi (e.g., survival from occasion 1 to 2)
and by the occasion for p
. For "cjs" p
is not estimated for
occasion 1. The third element in the row is the real value in the closed
unit interval [0,1] for the fixed parameter. This approach is completely
general allowing you to fix a particular real parameter for a specific
animal and occasion but it is a bit kludgy. Alternatively, you can set fixed
values by specifying values for a field called fix in the design data for a parameter.
If the value of fix is NA the parameter is estimated and if it is not NA then the real
parameter is fixed at that value. If you also specify fixed as decribed above, they will over-ride any
values you have also set with fix in the design data. To set all of the real values for a
particular occasion you can use the following example with the dipper data
as a template:
model.parameters=list(Phi=list(formula=~1,
fixed=cbind(1:nrow(dipper),rep(2,nrow(dipper)),rep(1,nrow(dipper)))))
The above sets Phi
to 1 for the interval between occasions 2 and 3
for all animals.
Alternatively, you could do as follows:
data(dipper) dp=process.data(dipper) ddl=make.design.data(dp) ddl$Phi$fix=ifelse(ddl$Phi$time==2,1,NA)
At present there is no modification of the parameter count to address fixing of real parameters except that if by fixing reals, a beta is not needed in the design it will be dropped. For example, if you were to use ~time for Phi with survival fixed to 1 for time 2, then then beta for that time would not be included.
To use ADMB (use.admb=TRUE), you need to install: 1) the R package R2admb, 2) ADMB, and 3) a C++ compiler (I recommend gcc compiler). The following are instructions for installation with Windows. For other operating systems see (http://www.admb-project.org/) and (https://www.admb-project.org/tools/gcc/).
Windows Instructions:
1) In R use install.packages function or choose Packages/Install Packages from menu and select R2admb.
2) Install ADMB 11: https://www.admb-project.org/downloads/. Put the software in C:/admb to avoid problems with spaces in directory name and for the function below to work.
3) Install gcc compiler from: https://www.admb-project.org/tools/gcc/. Put in c:/MinGW
I use the following function in R to setup R2admb to access ADMB rather than adding to my path so gcc versions with Rtools don't conflict.
prepare_admb=function() { Sys.setenv(PATH = paste("c:/admb/bin;c:admb/utilities;c:/MinGW/bin;", Sys.getenv("PATH"), sep = ";")) Sys.setenv(ADMB_HOME = "c:/admb") invisible() }
To use different locations you'll need to change the values used above
Before running crm with use.admb=T, execute the function prepare_admb(). You could put this function or the code it contains in your .First or .Rprofile so it runs each time you start R.
Value
crm model object with class=("crm",submodel), eg "CJS".
Author(s)
Jeff Laake
See Also
cjs_admb
, js
,
make.design.data
,process.data
Examples
{
# cormack-jolly-seber model
# fit cjs models with crm
data(dipper)
dipper.proc=process.data(dipper,model="cjs",begin.time=1)
dipper.ddl=make.design.data(dipper.proc)
mod.Phit.pt=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~time),p=list(formula=~time)))
mod.Phit.pt
mod.Phisex.pdot=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~sex),p=list(formula=~1)))
mod.Phisex.pdot
# demo initial value setting
mod.Phisex.ptime=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~sex),p=list(formula=~time)),initial=mod.Phit.pt)
mod.Phisex.ptime
mod.Phisex.ptime=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~sex),p=list(formula=~time)),initial=list(Phi=0,p=0))
mod.Phisex.ptime
## if you have RMark installed you can use this code to run the same models
## by removing the comment symbol
#library(RMark)
#data(dipper)
#mod0=mark(dipper,
#model.parameters=list(Phi=list(formula=~time),p=list(formula=~time)),output=FALSE)
#summary(mod0,brief=TRUE)
#mod1=mark(dipper,
#model.parameters=list(Phi=list(formula=~1),p=list(formula=~1)),output=FALSE)
#summary(mod1,brief=TRUE)
#mod2<-mark(dipper,groups="sex",
#model.parameters=list(Phi=list(formula=~sex),p=list(formula=~1)),output=FALSE)
#summary(mod2,brief=TRUE)
# jolly seber model
crm(dipper,model="js",groups="sex",
model.parameters=list(pent=list(formula=~sex),N=list(formula=~sex)),accumulate=FALSE)
# examples showing use of unit.scale
dipper.proc=process.data(dipper,model="cjs",begin.time=1,time.intervals=c(.1,.2,.3,.4,.5,.6))
dipper.ddl=make.design.data(dipper.proc)
mod.Phit.p=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~time),p=list(formula=~1)),
hessian=TRUE,unit_scale=TRUE)
mod.Phit.p
mod.Phit.p$results$reals
dipper.proc=process.data(dipper,model="cjs",begin.time=1,time.intervals=c(.1,.2,.3,.4,.5,.6))
dipper.ddl=make.design.data(dipper.proc)
mod.Phit.p=crm(dipper.proc,dipper.ddl,
model.parameters=list(Phi=list(formula=~time),p=list(formula=~1)),
hessian=TRUE,unit_scale=FALSE)
mod.Phit.p
mod.Phit.p$results$reals
# This example is excluded from testing to reduce package check time
# if you have RMark installed you can use this code to run the same models
# by removing the comment
#data(dipper)
#data(mstrata)
#mark(dipper,model.parameters=list(p=list(formula=~time)),output=FALSE)$results$beta
#mark(mstrata,model="Multistrata",model.parameters=list(p=list(formula=~1),
# S=list(formula=~1),Psi=list(formula=~-1+stratum:tostratum)),
# output=FALSE)$results$beta
#mod=mark(dipper,model="POPAN",groups="sex",
# model.parameters=list(pent=list(formula=~sex),N=list(formula=~sex)))
#summary(mod)
#CJS example with hmm
crm(dipper,model="hmmCJS",model.parameters = list(p = list(formula = ~time)))
##MSCJS example with hmm
data(mstrata)
ms=process.data(mstrata,model="hmmMSCJS",strata.labels=c("A","B","C"))
ms.ddl=make.design.data(ms)
ms.ddl$Psi$fix=NA
ms.ddl$Psi$fix[ms.ddl$Psi$stratum==ms.ddl$Psi$tostratum]=1
crm(ms,ms.ddl,model.parameters=list(Psi=list(formula=~-1+stratum:tostratum)))
}