R: Density and abundance estimates and variances

dht {mrds}

R Documentation

Density and abundance estimates and variances

Description

Compute density and abundance estimates and variances based on a Horvitz-Thompson-like estimator.

Usage

dht(
  model,
  region.table,
  sample.table,
  obs.table = NULL,
  subset = NULL,
  se = TRUE,
  options = list()
)

Arguments

`model`	ddf model object
`region.table`	`data.frame` of region records. Two columns: `Region.Label` and `Area`. If only density is required, one can set `Area=0` for all regions.
`sample.table`	`data.frame` of sample records. Three columns: `Region.Label`, `Sample.Label`, `Effort`.
`obs.table`	`data.frame` of observation records with fields `object`, `Region.Label`, and `Sample.Label`, which give links to `sample.table`, `region.table` and the data records used in `model`. Not necessary if the `data.frame` used to create the model contains `Region.Label` and `Sample.Label` columns.
`subset`	subset statement to create `obs.table`
`se`	if `TRUE` computes standard errors, coefficient of variation and confidence intervals (based on log-normal approximation). See "Uncertainty" below.
`options`	a list of options that can be set, see "`dht` options", below.

Details

Density and abundance within the sampled region are computed with a Horvitz-Thompson-like estimator for groups and individuals (if the population is clustered), and these are extrapolated to the entire survey region based on any defined regional stratification. The variance is based on replicate samples within any regional stratification. For clustered populations, the expected cluster size E(s) and its standard error are also output.

Abundance is estimated with a Horvitz-Thompson-like estimator (Huggins 1989, 1991; Borchers et al. 1998; Borchers and Burnham 2004). The abundance in the sampled region is simply 1/p_1 + 1/p_2 + ... + 1/p_n, where p_i is the estimated detection probability for the ith of n total observations. It is not strictly a Horvitz-Thompson estimator because the p_i are estimated and not known. For animals observed in tight clusters, that estimator gives the abundance of groups (group=TRUE in options), and the abundance of individuals is estimated as s_1/p_1 + s_2/p_2 + ... + s_n/p_n, where s_i is the size (e.g., number of animals in the group) of each observation (group=FALSE in options).
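The two sums above can be illustrated with a few lines of base R. This is only a sketch: the vectors `p` and `s` are invented values standing in for detection probabilities and group sizes that would normally come from a fitted model, and the ratio of the two estimates is used here as one simple summary of mean group size.

```r
# Hypothetical detections: estimated detection probabilities and group sizes.
p <- c(0.8, 0.5, 0.4, 0.25)   # p_i, assumed already estimated by a fitted model
s <- c(1, 3, 2, 5)            # s_i, observed group sizes (invented)

n_groups      <- sum(1 / p)   # 1/p_1 + ... + 1/p_n
n_individuals <- sum(s / p)   # s_1/p_1 + ... + s_n/p_n
expected_s    <- n_individuals / n_groups  # ratio of the two estimates

print(c(groups = n_groups, individuals = n_individuals, E_s = expected_s))
```

Detections with a small p_i contribute most to the estimate, which is why poorly estimated detection probabilities can inflate the variance.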

Extrapolation of abundance to the entire survey region is based on either a random sampling design or a stratified random sampling design. Replicate samples (lines) are specified within the regional strata, if any, listed in region.table. If there is no stratification, region.table should contain only a single record with the Area for the entire survey region. The sample.table is linked to the region.table with the Region.Label. The obs.table is linked to the sample.table with the Sample.Label and Region.Label. Abundance can be restricted to a subset of the population (e.g., a particular species) by limiting the observations listed in obs.table to those in the desired subset. Alternatively, if Sample.Label and Region.Label are in the data.frame used to fit the model, then a subset argument can be given in place of the obs.table. To use the subset argument but include all of the observations, use subset=1==1 to avoid creating an obs.table.
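As a rough illustration of how the three tables link together, the following base R sketch builds a small two-stratum layout and checks the links. All labels and numbers are invented, and the helper `key` is hypothetical, not part of mrds.

```r
# Hypothetical two-stratum survey; all labels and values are invented.
region.table <- data.frame(Region.Label = c("North", "South"),
                           Area = c(100, 250))
sample.table <- data.frame(Region.Label = c("North", "North", "South"),
                           Sample.Label = c("T1", "T2", "T3"),
                           Effort = c(5, 4, 10))
obs.table <- data.frame(object = 1:4,
                        Region.Label = c("North", "North", "South", "South"),
                        Sample.Label = c("T1", "T2", "T3", "T3"))

# Every sample must belong to a region listed in region.table ...
stopifnot(all(sample.table$Region.Label %in% region.table$Region.Label))

# ... and every observation to a (region, sample) pair in sample.table.
key <- function(d) paste(d$Region.Label, d$Sample.Label, sep = "/")
stopifnot(all(key(obs.table) %in% key(sample.table)))

# Restricting abundance to a subset amounts to keeping only those rows.
obs.subset <- obs.table[obs.table$object %in% c(1, 3), ]
print(obs.subset)
```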

In extrapolating to the entire survey region it is important that the units of measurement be consistent or converted for consistency. A conversion factor can be specified with the convert.units variable in the options list. The values of Area in region.table must be made consistent with the units for Effort in sample.table and the units of distance in the data.frame that was analyzed. This is easiest if the units of Area are the square of the units of Effort, as it is then only necessary to convert the units of distance to the units of Effort. For example, if Effort was entered in kilometres, Area in square kilometres and distance in metres, then using options=list(convert.units=0.001) would convert metres to kilometres, and density would be expressed per square kilometre, which is consistent with the units for Area. However, they can all be in different units as long as the appropriate composite value for convert.units is chosen. Abundance for a survey region can be expressed as A*N/a, where A is Area for the survey region, N is the abundance in the covered (sampled) region, and a is the area of the sampled region, in units of Effort * distance. The sampled area a is multiplied by convert.units, so it should be chosen such that the result is in the same units as Area. For example, if Effort was entered in kilometres, Area in hectares (100m x 100m) and distance in metres, then a is in units of km x m = 1000 square metres = 0.1 hectares, so options=list(convert.units=0.1) will convert a to hectares (0.01 to convert metres to 100m units for distance, times 10 to convert km to 100m units for Effort).
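The expression A*N/a and the role of convert.units can be checked by hand. The sketch below uses invented numbers for the kilometre / square kilometre / metre case and assumes line transects surveyed on both sides, so that the covered area is taken as 2 * width * Effort; that form of the covered area is an assumption of this example.

```r
# Invented values: Effort in km, Area in km^2, distances in metres.
A <- 100                 # Area of the survey region (km^2)
effort <- 9              # total line length (km)
width <- 50              # truncation distance (m)
N_covered <- 30          # estimated abundance in the covered region
convert.units <- 0.001   # metres -> kilometres

# Covered area, assuming two-sided line transects: 2 * width * Effort.
a <- 2 * width * effort * convert.units   # now in km^2
N_total <- A * N_covered / a              # abundance in the survey region
D <- N_covered / a                        # density per km^2

print(c(covered_area = a, N = N_total, D = D))
```

Without the conversion factor, `a` would be 900 (in km x m) and both density and abundance would be too small by a factor of 1000.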

The argument options is a list of variable=value pairs that set options for the analysis. All but two of these have been described above. pdelta should not need to be changed but is included for completeness; it controls the precision of the first derivative calculation for the delta method variance. If the option areas.supplied is TRUE, then the covered area is assumed to be supplied in the CoveredArea column of the sample data.frame (sample.table).

Value

list object of class dht with elements:

`clusters`	result list for object clusters
`individuals`	result list for individuals
`Expected.S`	`data.frame` of estimates of expected cluster size with fields `Region`, `Expected.S` and `se.Expected.S`. If every cluster has `size=1`, the result includes only `individuals`, and `clusters` and `Expected.S` are omitted.

The list structure of clusters and individuals is the same:

`bysample`	`data.frame` giving results for each sample; `Nchat` is the estimated abundance within the sample and `Nhat` is scaled by surveyed area/covered area within that region
`summary`	`data.frame` of summary statistics for each region and total
`N`	`data.frame` of estimates of abundance for each region and total
`D`	`data.frame` of estimates of density for each region and total
`average.p`	average detection probability estimate
`cormat`	correlation matrix of regional abundance/density estimates and total (if more than one region)
`vc`	list of 3: total variance-covariance matrix, detection function component of variance and encounter rate component of variance. For detection the v-c matrix and partial vector are returned
`Nhat.by.sample`	another summary of `Nhat` by sample used by `dht.se`

Uncertainty

If the argument se=TRUE, standard errors for density and abundance are computed. Coefficients of variation and log-normal confidence intervals are constructed using a Satterthwaite approximation for degrees of freedom (Buckland et al. 2001 p. 90). The function dht.se computes the variance and interval estimates.
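A log-normal interval of this kind can be sketched in base R as follows. The function `lognormal_ci` is a hypothetical helper written for illustration, not the code used by dht.se; it assumes the usual form in which the estimate is divided and multiplied by C = exp(t * sqrt(log(1 + cv^2))), with the t quantile taken at degrees of freedom supplied by the caller.

```r
# Hypothetical helper: log-normal confidence interval for an estimate.
lognormal_ci <- function(estimate, se, df = Inf, ci.width = 0.95) {
  cv <- se / estimate
  alpha <- 1 - ci.width
  # t quantile at the supplied degrees of freedom; df = Inf gives the normal quantile.
  q <- qt(1 - alpha / 2, df)
  C <- exp(q * sqrt(log(1 + cv^2)))
  c(lower = estimate / C, upper = estimate * C)
}

# Invented estimate, standard error and degrees of freedom.
ci <- lognormal_ci(estimate = 1000, se = 200, df = 12)
print(ci)
```

The interval is asymmetric around the estimate and cannot include negative values, which is the main reason for preferring it to a symmetric normal interval for abundance.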

The variance has two components:

variation due to uncertainty from estimation of the detection function parameters;
variation in abundance due to random sample selection.

The first component (model parameter uncertainty) is computed using a delta method estimate of variance (Huggins 1989, 1991, Borchers et al. 1998) in which the first derivatives of the abundance estimator with respect to the parameters in the detection function are computed numerically (see DeltaMethod).
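The idea behind the delta method component can be sketched generically: differentiate the estimator numerically with respect to each parameter and combine the gradient with the parameter variance-covariance matrix. The function `delta_var` below is a hypothetical illustration, not the mrds DeltaMethod function; in particular the central-difference scheme with a step relative to each parameter value is an assumption of this example.

```r
# Hypothetical helper: delta-method variance of f(par) given vcov of par.
delta_var <- function(f, par, vcov, delta = 0.001) {
  grad <- vapply(seq_along(par), function(i) {
    h <- delta * par[i]               # relative step (assumes par[i] != 0)
    up <- par;   up[i] <- par[i] + h
    down <- par; down[i] <- par[i] - h
    (f(up) - f(down)) / (2 * h)       # central first difference
  }, numeric(1))
  # Quadratic form: gradient' %*% vcov %*% gradient.
  as.numeric(t(grad) %*% vcov %*% grad)
}

# Toy estimator with known gradient (2 * th1, 3) at th = (2, 1).
f <- function(th) th[1]^2 + 3 * th[2]
v <- delta_var(f, par = c(2, 1), vcov = diag(c(0.1, 0.2)))
print(v)
```

In a distance sampling setting `f` would be the abundance estimator viewed as a function of the detection function parameters, and `vcov` their estimated variance-covariance matrix.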

The second component (encounter rate variance) can be computed in one of several ways depending on the form taken for the encounter rate and the estimator used. To begin with, there are three possible values for varflag to calculate the encounter rate variance:

0 uses a binomial variance for the number of observations (equation 13 of Borchers et al. 1998). This estimator is only useful if the sampled region is the survey region and the objects are not clustered; this situation will not occur very often;
1 uses the encounter rate n/L (objects observed per unit transect) from Buckland et al. (2001) pp. 78-79 (equation 3.78) for line transects (see also Fewster et al. 2009, estimator R2). This variance estimator is not appropriate if size or a derivative of size is used in the detection function;
2 is the default and uses the encounter rate estimator \hat{N}/L (estimated abundance per unit transect) suggested by Innes et al. (2002) and Marques & Buckland (2004).

In general, if any covariates are used in the models, the default varflag=2 is preferable as the estimated abundance will take into account variability due to covariate effects. If the population is clustered, the mean group size and its standard error are also reported.

For options 1 and 2, it is then possible to choose one of the estimator forms given in Fewster et al. (2009). For line transects, specify "R2", "R3", "R4", "S1", "S2", "O1", "O2" or "O3" with the ervar= option (default "R2"). For points, either the "P2" or "P3" estimator can be selected (default "P2" for mrds >= 2.3.0; default "P3" for mrds <= 2.2.9). See varn and Fewster et al. (2009) for further details on these estimators.
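To give a feel for what an encounter rate variance estimator looks like, here is a base R sketch of the "R2" form as it is commonly written: K / (L^2 (K - 1)) times the sum over lines of l_k^2 (n_k/l_k - n/L)^2. The function `er_var_R2` is a hypothetical helper for illustration and should be checked against varn and Fewster et al. (2009) before being relied on; the per-line counts and lengths are invented.

```r
# Hypothetical helper: R2-type variance of the encounter rate n/L.
# nvec: count (or estimated abundance) per line; lvec: line lengths.
er_var_R2 <- function(nvec, lvec) {
  K <- length(lvec)   # number of replicate lines
  L <- sum(lvec)      # total effort
  n <- sum(nvec)      # total count
  K * sum(lvec^2 * (nvec / lvec - n / L)^2) / (L^2 * (K - 1))
}

# Invented data: counts and lengths for four lines.
counts  <- c(3, 7, 2, 8)
lengths <- c(1.0, 2.5, 1.5, 2.0)
print(er_var_R2(counts, lengths))
```

With varflag=1 the per-line values would be observed counts; with varflag=2 they would be the per-line estimated abundances, following the description above.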

`dht` options

Several options are available to control calculations and output:

ci.width: confidence interval width, expressed as a decimal between 0 and 1 (Default: 0.95, giving a 95% CI)
pdelta: delta value for computing numerical first derivatives (Default: 0.001)
varflag: 0,1,2 (see "Uncertainty") (Default: 2)
convert.units: multiplier for width to convert to units of length (Default: 1)
ervar: encounter rate variance type (see "Uncertainty" and type argument of varn). (Default: "R2" for lines and "P2" for points)
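For reference, an options list that spells out the settings listed above might look like the sketch below. The object name `dht_options` is arbitrary, and the value of convert.units assumes distances in metres with Effort in kilometres and Area in square kilometres; the other values are the documented defaults.

```r
# Options list using the documented defaults, except convert.units,
# which here assumes metres for distance and kilometres for Effort.
dht_options <- list(
  ci.width = 0.95,        # 95% confidence intervals
  pdelta = 0.001,         # step for numerical first derivatives
  varflag = 2,            # encounter rate based on estimated abundance
  convert.units = 0.001,  # metres -> kilometres
  ervar = "R2"            # encounter rate variance form for lines
)
str(dht_options)
```

Such a list would be passed to the function as `options = dht_options`.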

Author(s)

Jeff Laake, David L Miller

References

Borchers, D.L., S.T. Buckland, P.W. Goedhart, E.D. Clarke, and S.L. Hedley. 1998. Horvitz-Thompson estimators for double-platform line transect surveys. Biometrics 54: 1221-1237.

Borchers, D.L. and K.P. Burnham. 2004. General formulation for distance sampling pp 10-11 In: Advanced Distance Sampling, eds. S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. Oxford University Press.

Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. 2001. Introduction to Distance Sampling: Estimating Abundance of Biological Populations. Oxford University Press.

Fewster, R.M., S.T. Buckland, K.P. Burnham, D.L. Borchers, P.E. Jupp, J.L. Laake and L. Thomas. 2009. Estimating the encounter rate variance in distance sampling. Biometrics 65: 225-236.

Huggins, R.M. 1989. On the statistical analysis of capture experiments. Biometrika 76:133-140.

Huggins, R.M. 1991. Some practical aspects of a conditional likelihood approach to capture experiments. Biometrics 47: 725-732.

Innes, S., M.P. Heide-Jorgensen, J.L. Laake, K.L. Laidre, H.J. Cleator, P. Richard, and R.E.A. Stewart. 2002. Surveys of belugas and narwhals in the Canadian High Arctic in 1996. NAMMCO Scientific Publications 4: 169-190.

Marques, F.F.C. and S.T. Buckland. 2004. Covariate models for the detection function. In: Advanced Distance Sampling, eds. S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. Oxford University Press.