dfuncSmu {Rdistance}

R Documentation

Estimate a non-parametric smooth detection function from distance-sampling data

Description

Estimates a smooth detection function for line-transect perpendicular distances or point-transect radial distances.

Usage

dfuncSmu(
  formula,
  detectionData,
  siteData,
  bw = "SJ-dpi",
  adjust = 1,
  kernel = "gaussian",
  pointSurvey = FALSE,
  w.lo = units::set_units(0, "m"),
  w.hi = NULL,
  x.scl = "max",
  g.x.scl = 1,
  observer = "both",
  warn = TRUE,
  transectID = NULL,
  pointID = "point",
  outputUnits = NULL,
  length = "length",
  control = RdistanceControls()
)

Arguments

`formula`	A formula object (e.g., dist ~ 1). The left-hand side (before ~) is the name of the vector containing distances (perpendicular or radial). The right-hand side (after ~) must be the intercept-only model as `Rdistance` does not currently allow covariates in smoothed distance functions. If names in `formula` do not appear in `detectionData`, the normal scoping rules for model fitting routines (e.g., `lm` and `glm`) apply.
`detectionData`	A data frame containing detection distances (perpendicular for line-transect designs, radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information: Detection Distances: A single column containing detection distances, named on the left-hand side of `formula`. Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments `transectID` and `pointID`) must identify the site (transect or point) so that this data frame can be merged with `siteData`. Optionally, this data frame can contain the following variables: Group Sizes: The number of individuals in the group associated with each detection. If unspecified, `Rdistance` assumes all detections are of single individuals (i.e., all group sizes are 1). When `Rdistance` allows detection-level covariates in some version after 2.1.1, detection-level covariates will appear in this data frame. See the example data set `sparrowDetectionData`. See also Input data frames below for information on when `detectionData` and `siteData` are required inputs.
`siteData`	A data.frame containing site (transect or point) IDs and any site level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments `transectID` and `pointID` for an explanation of site and transect ID's. If sites are transects, this data frame must also contain transect length. By default, transect length is assumed to be in column 'length' but can be specified using argument `length`. The total number of sites surveyed is `nrow(siteData)`. Duplicate site-level IDs are not allowed in `siteData`. See Input data frames for when `detectionData` and `siteData` are required inputs.
`bw`	Bandwidth of the smooth, which controls smoothness. Smoothing is done by `stats::density`, and `bw` is passed straight to its `bw` argument. `bw` can be numeric, in which case it is the standard deviation of the Gaussian smoothing kernel. Or, `bw` can be a character string specifying the bandwidth selection rule. Valid character string values of `bw` are the following: "nrd0" : Silverman's 'rule-of-thumb' equal to `0.9 \min(s, \mathrm{IQR}/1.34)\, n^{-0.2}`, where `s` is the standard deviation of the distances, `IQR` is their interquartile range, and `n` is the number of distances. See `bw.nrd0`. "nrd" : The more common 'rule-of-thumb' variation given by Scott (1992). This rule uses 1.06 in place of 0.9 in the "nrd0" bandwidth. See `bw.nrd`. "bcv" : The biased cross-validation method. See `bcv`. "ucv" : The unbiased cross-validation method. See `ucv`. "SJ" or "SJ-ste" : The 'solve-the-equation' bandwidth of Sheather & Jones (1991). See `bw.SJ` or `width.SJ`. "SJ-dpi" (default) : The 'direct-plug-in' bandwidth of Sheather & Jones (1991). See `bw.SJ` or `width.SJ`.
`adjust`	Bandwidth adjustment for the amount of smoothing. Smoothing is done by `stats::density`, and this parameter is passed straight to its `adjust` argument. In `stats::density`, the bandwidth actually used is `adjust*bw`, so this parameter makes it easy to specify values like 'half the default' bandwidth.
`kernel`	Character string specifying the smoothing kernel function. This parameter is passed unmodified to `stats::density`. Valid values are: "gaussian" : Gaussian (normal) kernel, the default "rectangular" : Uniform or flat kernel "triangular" : Equilateral triangular kernel "epanechnikov" : the Epanechnikov kernel "biweight" : the biweight kernel "cosine" : the S version of the cosine kernel "optcosine" : the optimal cosine kernel which is the usual one reported in the literature Values of `kernel` may be abbreviated to the first letter of each string. The numeric value of `bw` used in the smooth is stored in the `$fit` component of the returned object (i.e., in `returned$fit$bw`).
`pointSurvey`	A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE). Point surveys (TRUE) have not been implemented yet.
`w.lo`	Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0.
`w.hi`	Upper or right-truncation limit of the distances in `dist`. This is the maximum off-transect distance that could be observed. If left unspecified (i.e., at the default of NULL), right-truncation is set to the maximum of the observed distances.
`x.scl`	This parameter is passed to `F.gx.estim`. See `F.gx.estim` documentation for definition.
`g.x.scl`	This parameter is passed to `F.gx.estim`. See `F.gx.estim` documentation for definition.
`observer`	This parameter is passed to `F.gx.estim`. See `F.gx.estim` documentation for definition.
`warn`	A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, `warn` should generally be left at its default value of `TRUE`. When computing bootstrap confidence intervals, setting `warn = FALSE` turns off annoying warnings when an iteration does not converge. Regardless of `warn`, messages about convergence and boundary conditions are printed by `print.dfunc`, `print.abund`, and `plot.dfunc`, so there should be little harm in setting `warn = FALSE`.
`transectID`	A character vector naming the transect ID column(s) in `detectionData` and `siteData`. Transects can be the basic sampling unit (when `pointSurvey`=FALSE) or contain multiple sampling units (e.g., when `pointSurvey`=TRUE). For line-transects, the `transectID` column(s) alone is sufficient to specify unique sample sites. For point-transects, the amalgamation of `transectID` and `pointID` specify unique sampling sites. See Input data frames.
`pointID`	When point-transects are used, this is the ID of points on a transect. When `pointSurvey`=TRUE, the amalgamation of `transectID` and `pointID` specify unique sampling sites. See Input data frames. If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set `transectID` equal to the point's ID and set `pointID` equal to 1 for all points.
`outputUnits`	A string giving the symbolic measurement units that results should be reported in. Any distance measurement unit in `units::valid_udunits()` will work. The strings for common distance symbolic units are: "m" for meters, "ft" for feet, "cm" for centimeters, "mm" for millimeters, "mi" for miles, "nmile" for nautical miles ("nm" is nanometers), "in" for inches, "yd" for yards, "km" for kilometers, "fathom" for fathoms, "chains" for chains, and "furlong" for furlongs. If `outputUnits` is unspecified (NULL), output units are the same as the distance measurement units in `detectionData`.
`length`	Character string specifying the (single) column in `siteData` that contains transect length. This is ignored if `pointSurvey` = TRUE.
`control`	A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the `RdistanceControls` function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.
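
The bandwidth rules described under `bw` and `adjust` can be checked directly against the `stats` functions they name. The following is a minimal sketch using base R only; the distances are simulated toy values (an assumption for illustration, not the sparrow data), and the variable names are hypothetical.

```r
# Toy distances (simulated; hypothetical values for illustration only)
set.seed(42)
d <- abs(rnorm(200, mean = 0, sd = 40))
n <- length(d)

# Rule-of-thumb bandwidths computed by hand
spread <- min(sd(d), IQR(d) / 1.34)
bwNrd0 <- 0.9  * spread * n^(-0.2)   # Silverman's rule ("nrd0")
bwNrd  <- 1.06 * spread * n^(-0.2)   # Scott's variation ("nrd")

# The same values from the stats functions
c(byHand = bwNrd0, stats = bw.nrd0(d))
c(byHand = bwNrd,  stats = bw.nrd(d))

# Sheather-Jones selectors named in the bw argument
bwDpi <- bw.SJ(d, method = "dpi")    # "SJ-dpi"
bwSte <- bw.SJ(d, method = "ste")    # "SJ" or "SJ-ste"

# adjust multiplies the selected bandwidth inside stats::density
fit <- density(d, bw = "SJ-dpi", adjust = 0.5, kernel = "gaussian")
all.equal(fit$bw, 0.5 * bwDpi)
```

Setting `adjust = 0.5` as above corresponds to the 'half the default' bandwidth case mentioned under `adjust`.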

Details

Distances are reflected about w.lo before being passed to density. Distances exactly equal to w.lo are not reflected. Reflection around w.lo greatly improves performance of the kernel methods near the w.lo boundary where substantial non-zero probability of sighting typically exists.
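
The effect of reflection can be illustrated with base R alone. This is a sketch of the idea under stated assumptions, not the package's internal code: the distances are simulated, the same bandwidth is used for both smooths so that only the reflection differs, and all object names are hypothetical.

```r
# Simulated perpendicular distances (hypothetical toy data)
set.seed(1)
w.lo <- 0
w.hi <- 150
d <- abs(rnorm(300, mean = 0, sd = 50))
d <- d[d <= w.hi]

# Reflect distances about w.lo; values exactly equal to w.lo are not duplicated
reflected <- c(d, 2 * w.lo - d[d != w.lo])

# Use one bandwidth for both smooths so only the reflection differs
h <- bw.SJ(d, method = "dpi")
fitRaw <- density(d,         bw = h, from = w.lo, to = w.hi)
fitRef <- density(reflected, bw = h, from = w.lo, to = w.hi)

# Height at w.lo relative to the maximum height of each smooth.
# Without reflection the smooth dips near the boundary.
ratioRaw <- fitRaw$y[1] / max(fitRaw$y)
ratioRef <- fitRef$y[1] / max(fitRef$y)
c(raw = ratioRaw, reflected = ratioRef)
```

Without reflection, roughly half of each kernel near `w.lo` falls below the boundary, which pulls the smooth down exactly where detection probability is usually highest.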

Value

An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:

`parameters`	A data frame containing the $x and $y components of the smooth. $x is a vector of 512 (the default for `density`) evenly spaced points between `w.lo` and `w.hi`.
`loglik`	The value of the log likelihood. Specifically, the sum of the negative log heights of the smooth at observed distances, after the smoothed function has been scaled to integrate to one.
`w.lo`	Left-truncation value used during the fit.
`w.hi`	Right-truncation value used during the fit.
`dist`	The input vector of observed distances.
`covars`	NULL. Covariates are not allowed in the smoothed distance function (yet).
`call`	The original call of this function.
`call.x.scl`	The distance at which the distance function is scaled. This is the x at which g(x) = `g.x.scl`. Normally, `call.x.scl` = 0.
`call.g.x.scl`	The value of the distance function at distance `call.x.scl`. Normally, `call.g.x.scl` = 1.
`call.observer`	The value of input parameter `observer`.
`fit`	The smoothed object returned by `stats::density`. All information returned by `stats::density` is preserved, and in particular the numeric value of the bandwidth used during the smooth is returned in `fit$bw`.
`pointSurvey`	The input value of `pointSurvey`. This is TRUE if distances are radial from a point and FALSE if distances are perpendicular off-transect.
`formula`	The formula specified for the detection function.
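
The quantities described for `parameters`, `loglik`, `call.x.scl`, and `call.g.x.scl` can be sketched in base R. The code below is an illustration under assumptions: the data are simulated, and the trapezoid integration and linear interpolation are choices made here for clarity, since this page does not state which numerical methods `dfuncSmu` uses internally.

```r
# Simulated distances, reflected about w.lo as described in Details
set.seed(7)
w.lo <- 0
w.hi <- 150
d <- abs(rnorm(250, mean = 0, sd = 45))
d <- d[d <= w.hi]
reflected <- c(d, 2 * w.lo - d[d != w.lo])

# Smooth evaluated at 512 evenly spaced points between w.lo and w.hi
fit <- density(reflected, bw = "SJ-dpi", from = w.lo, to = w.hi, n = 512)

# Scale the smooth to integrate to one on [w.lo, w.hi] (trapezoid rule)
dx <- diff(fit$x)
area <- sum(dx * (head(fit$y, -1) + tail(fit$y, -1)) / 2)
fScaled <- fit$y / area

# Heights of the scaled smooth at the observed distances
# (linear interpolation is an assumption of this sketch)
heights <- approx(fit$x, fScaled, xout = d)$y

# Sum of the negative log heights, as described for `loglik`
negLogLik <- sum(-log(heights))

# Detection function scaled so that g(x.scl) = g.x.scl with x.scl = 0, g.x.scl = 1
g <- fit$y / fit$y[1]
```

The bandwidth actually used is available as `fit$bw`, matching the `fit` component described above.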

Input data frames

To save space and to easily specify sites without detections, all site ID's, regardless of whether a detection occurred there, and site level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.

Data frame requirements

The following explains conditions under which various combinations of the input data frames are required.

Detection data and site data both required:
Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed.
Detection data only required:
The detectionData data frame alone can be specified if no covariates are included in the distance function (i.e., right-hand side of formula is "~1"). Note that this routine (dfuncSmu) does not need to know about sites where zero targets were detected, hence siteData can be missing when no covariates are involved.
Neither detection data nor site data required:
Neither detectionData nor siteData are required if all variables specified in formula are within the scope of this routine (e.g., in the global working environment). Scoping rules here work the same as for other modeling routines in R such as lm and glm. Like other modeling routines, it is possible to mix and match the location of variables in the model. Some variables can be in the .GlobalEnv while others are in either detectionData or siteData.
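
The scoping behavior can be seen with `stats::model.frame`, which follows the same lookup rules as `lm` and `glm`. This is a minimal base R sketch with hypothetical toy values; it does not call any `Rdistance` function.

```r
# A distance vector in the calling environment (hypothetical values)
dist <- c(5, 10, 20)

# A data frame that also has a column named dist
toyDetections <- data.frame(dist = c(1, 2, 3))

# Variables are looked up in the data frame first
mfData <- model.frame(dist ~ 1, data = toyDetections)

# With no data frame, the variable is taken from the formula's environment
mfEnv <- model.frame(dist ~ 1)

mfData$dist
mfEnv$dist
```

Because a column in the data frame takes precedence over an object of the same name in the environment, it is worth checking which `dist` a model actually used.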

Relationship between data frames (transect and point ID's)

The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site ID's (i.e., transect ID's) are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData,siteData,by=transectID). For point-transects, site ID's (i.e., point ID's) are unique values of the combination paste(transectID,pointID). In this case, the following merge must work: merge(detectionData,siteData,by=c(transectID, pointID)).

By default, transectID is NULL and the merge is done on all common columns. That is, when transectID is NULL, this routine assumes unique transects are specified by unique combinations of the common variables (i.e., unique values of intersect(names(detectionData), names(siteData))).

An error occurs if there are no common column names between detectionData and siteData. Duplicate site IDs are not allowed in siteData. If the same site is surveyed in multiple years, specify another transect ID column (e.g., transectID = c("year","transectID")). Duplicate site ID's are allowed in detectionData.
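
The merge requirements above can be checked before fitting. The following base R sketch uses small hypothetical data frames (the column names `transectID`, `length`, `dist`, and `groupsize` are illustrative assumptions) to show the line-transect merge, the duplicate-ID check, and the common-column default.

```r
# Hypothetical site data: one row per surveyed transect
siteData <- data.frame(
  transectID = c("A", "B", "C"),
  length     = c(500, 500, 250),
  stringsAsFactors = FALSE
)

# Hypothetical detection data: one row per detected group
detectionData <- data.frame(
  transectID = c("A", "A", "C"),
  dist       = c(12.5, 40.1, 3.2),
  groupsize  = c(1, 2, 1),
  stringsAsFactors = FALSE
)

transectID <- "transectID"

# Duplicate site IDs are not allowed in siteData
noDuplicateSites <- !anyDuplicated(siteData[, transectID, drop = FALSE])

# The merge that must work for line-transects
merged <- merge(detectionData, siteData, by = transectID)

# Columns used for the merge when transectID is NULL
commonCols <- intersect(names(detectionData), names(siteData))

# Keeping sites without detections (transect B has none)
allSites <- merge(siteData, detectionData, by = transectID, all.x = TRUE)
```

Transect `B` appears in `allSites` with missing `dist`, which is how a surveyed site with zero detections remains represented.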

To help explain the relationship between data frames, bear in mind that during bootstrap estimation of variance in abundEstim, unique transects (i.e., unique values of the transect ID column(s)), not detections or points, are resampled with replacement.
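
Resampling transects rather than detections can be sketched as follows. This is an illustration in base R with hypothetical data frames and a hypothetical `bsIndex` column; it is not the code used by abundEstim.

```r
set.seed(123)

# Hypothetical site and detection data
siteData <- data.frame(
  transectID = c("A", "B", "C", "D"),
  length     = c(500, 500, 250, 250),
  stringsAsFactors = FALSE
)
detectionData <- data.frame(
  transectID = c("A", "A", "C", "D", "D", "D"),
  dist       = c(12, 40, 3, 25, 8, 60),
  stringsAsFactors = FALSE
)

# Resample transects (not detections) with replacement
bsIDs <- sample(siteData$transectID, size = nrow(siteData), replace = TRUE)

# One row per drawn transect; bsIndex distinguishes repeated draws
bsSites <- data.frame(bsIndex = seq_along(bsIDs), transectID = bsIDs,
                      stringsAsFactors = FALSE)
bsSites <- merge(bsSites, siteData, by = "transectID")

# Each drawn transect brings along all of its detections
bsDetections <- merge(bsSites[, c("bsIndex", "transectID")],
                      detectionData, by = "transectID")
```

A transect drawn twice contributes its detections twice, and a drawn transect with no detections contributes only its length.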

References

Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.

Scott, D. W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley.

Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society series B, 53, 683-690.

Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.

Examples

# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
data(sparrowSiteData)

# Compare smoothed and half-normal detection functions
# (object names differ from the function name to avoid masking dfuncSmu)
smuFit <- dfuncSmu(dist ~ 1, sparrowDetectionData, w.hi = units::set_units(150, "m"))
hnFit  <- dfuncEstim(formula = dist ~ 1, sparrowDetectionData, w.hi = units::set_units(150, "m"))

# Print and plot results
smuFit
hnFit
plot(smuFit, main = "", nbins = 50)

# Overlay the half-normal curve, scaled so that g(0) = 1
x <- seq(0, 150, length.out = 200)
y <- dnorm(x, 0, predict(hnFit)[1])
y <- y / y[1]
lines(x, y, col = "orange", lwd = 2)
legend("topright", legend = c("Smooth", "Halfnorm"),
  col = c("red", "orange"), lwd = 2)

[Package Rdistance version 3.0.0 Index]