R: Jointly optimal bandwidth selection for the spatial relative...

LSCV.risk {sparr}

R Documentation

Jointly optimal bandwidth selection for the spatial relative risk function

Description

Methods to find a jointly optimal, common case-control isotropic bandwidth for use in estimation of the fixed or adaptive kernel-smoothed relative risk function.

Usage

LSCV.risk(
  f,
  g = NULL,
  hlim = NULL,
  hseq = NULL,
  type = c("fixed", "adaptive"),
  method = c("kelsall-diggle", "hazelton", "davies"),
  resolution = 64,
  edge = TRUE,
  hp = NULL,
  pilot.symmetry = c("none", "f", "g", "pooled"),
  auto.optim = TRUE,
  seqres = 30,
  parallelise = NA,
  verbose = TRUE,
  ...
)

Arguments

`f`	Either a pre-calculated object of class `bivden` representing the ‘case’ (numerator) density estimate, or an object of class `ppp` giving the observed case data. Alternatively, if `f` is `ppp` object with dichotomous factor-valued `marks`, the function treats the first level as the case data, and the second as the control data, obviating the need to supply `g`.
`g`	As for `f`, for the ‘control’ (denominator) density; this object must be of the same class as `f`. Ignored if, as stated above, `f` contains both case and control observations.
`hlim`	An optional vector of length 2 giving the limits of the optimisation routine with respect to the bandwidth. If unspecified, the function attempts to choose this automatically.
`hseq`	An optional increasing sequence of bandwidth values at which to manually evaluate the optimisation criterion. Used only in the case `(!auto.optim && is.null(hlim))`.
`type`	A character string; `"fixed"` (default) performs classical leave-one-out cross-validation for a jointly optimal fixed bandwidth. Alternatively, `"adaptive"` utilises multiscale adaptive kernel estimation (Davies & Baddeley, 2018) to run the cross-validation in an effort to find a suitable jointly optimal, common global bandwidth for the adaptive relative risk function. See ‘Details’.
`method`	A character string controlling the selector to use. There are three types, based on either the mean integrated squared error (MISE) (Kelsall and Diggle, 1995; default – `method = "kelsall-diggle"`); a weighted MISE (Hazelton, 2008 – `method = "hazelton"`); or an approximation to the asymptotic MISE (Davies, 2013 – `method = "davies"`). See ‘Details’.
`resolution`	Spatial grid size; the optimisation will be based on a [`resolution` `\times` `resolution`] density estimate.
`edge`	Logical value indicating whether to edge-correct the density estimates used.
`hp`	A single numeric value or a vector of length 2 giving the pilot bandwidth(s) to be used for estimation of the pilot densities for adaptive risk surfaces. Ignored if `type = "fixed"`.
`pilot.symmetry`	A character string used to control the type of symmetry, if any, to use for the bandwidth factors when computing an adaptive relative risk surface. See ‘Details’. Ignored if `type = "fixed"`.
`auto.optim`	Logical value indicating whether to automate the numerical optimisation using `optimise`. If `FALSE`, the optimisation criterion is evaluated over `hseq` (if supplied), or over a seqence of values controlled by `hlim` and `seqres`.
`seqres`	Optional resolution of an increasing sequence of bandwidth values. Only used if `(!auto.optim && is.null(hseq))`.
`parallelise`	Numeric argument to invoke parallel processing, giving the number of CPU cores to use when `!auto.optim`. Experimental. Test your system first using `parallel::detectCores()` to identify the number of cores available to you.
`verbose`	Logical value indicating whether to provide function progress commentary.
`...`	Additional arguments such as `dimz` and `trim` to be passed to the internal calls to `multiscale.density`.

Details

Given the established preference of using a common bandwidth for both case and control density estimates when constructing a relative risk surface, This function calculates a ‘jointly optimal’, common isotropic LSCV bandwidth for the (Gaussian) kernel-smoothed relative risk function (case-control density-ratio). It can be shown that choosing a bandwidth that is equal for both case and control density estimates is preferable to computing ‘separately optimal’ bandwidths (Kelsall and Diggle, 1995). The user can choose to either calculate a common smoothing parameter for a fixed-bandwidth relative risk surface (type = "fixed"; default), or a common global bandwidth for an adaptive risk surface (type = "adaptive"). See further comments below.

method = "kelsall-diggle": the function computes the common bandwidth which minimises the approximate mean integrated squared error (MISE) of the log-transformed risk surface (Kelsall and Diggle, 1995).
method = "hazelton": the function minimises a weighted-by-control MISE of the (raw) relative risk function (Hazelton, 2008).
method = "davies": the optimal bandwidth is one that minimises a crude plug-in approximation to the asymptotic MISE (Davies, 2013). Only possible for type = "fixed".

For jointly optimal, common global bandwidth selection when type = "adaptive", the optimisation routine utilises multiscale.density. Like LSCV.density, the leave-one-out procedure does not affect the pilot density, for which additional control is offered via the hp and pilot.symmetry arguments. The user has the option of obtaining a so-called symmetric estimate (Davies et al. 2016) via pilot.symmetry. This amounts to choosing the same pilot density for both case and control densities. By choosing "none" (default), the result uses the case and control data separately for the fixed-bandwidth pilots, providing the original asymmetric density-ratio of Davies and Hazelton (2010). By selecting either of "f", "g", or "pooled", the pilot density is calculated based on the case, control, or pooled case/control data respectively (using hp[1] as the fixed bandwidth). Davies et al. (2016) noted some beneficial practical behaviour of the symmetric adaptive surface over the asymmetric. (The pilot bandwidth(s), if not supplied in hp, are calculated internally via default use of LSCV.density, using the requested symmetric-based data set, or separately with respect to the case and control datasets f and g if pilot.symmetry = "none".)

Value

A single numeric value of the estimated bandwidth (if auto.optim = TRUE). Otherwise, a list of two numeric vectors of equal length giving the bandwidth sequence (as hs) and corresponding CV function value (as CV).

Warning

The jointly optimal bandwidth selector can be computationally expensive for large data sets and fine evaluation grid resolutions. The user may need to experiment with adjusting hlim to find a suitable minimum.

Author(s)

T. M. Davies

References

Davies, T. M. (2013), Jointly optimal bandwidth selection for the planar kernel-smoothed density-ratio, Spatial and Spatio-temporal Epidemiology, 5, 51-65.

Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.

Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.

Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.

Hazelton, M. L. (2008), Letter to the editor: Kernel estimation of risk surfaces without the need for edge correction, Statistics in Medicine, 27, 2269-2272.

Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.

Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.

Wand, M.P. and Jones, C.M., 1995. Kernel Smoothing, Chapman & Hall, London.

Examples




data(pbc)
pbccas <- split(pbc)$case
pbccon <- split(pbc)$control

# FIXED (for common h)

LSCV.risk(pbccas,pbccon)
LSCV.risk(pbccas,pbccon,method="hazelton")
hcv <- LSCV.risk(pbccas,pbccon,method="davies",auto.optim=FALSE)
plot(hcv[,1],log(hcv[,2]));abline(v=hcv[which.min(hcv[,2]),1],col=2,lty=2)


# ADAPTIVE (for common h0)

LSCV.risk(pbccas,pbccon,type="adaptive")

# change pilot bandwidths used
LSCV.risk(pbccas,pbccon,type="adaptive",hp=c(OS(pbccas)/2,OS(pbccon)/2))

# specify pooled-data symmetric relative risk estimator 
LSCV.risk(pbccas,pbccon,type="adaptive",hp=OS(pbc),pilot.symmetry="pooled")

# as above, for Hazelton selector
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton")
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton",hp=c(OS(pbccas)/2,OS(pbccon)/2))
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton",hp=OS(pbc),pilot.symmetry="pooled")

[Package sparr version 2.3-15 Index]