R: Hypervolume construction via Gaussian kernel density...

hypervolume_gaussian {hypervolume}

R Documentation

Hypervolume construction via Gaussian kernel density estimation

Description

Constructs a hypervolume by building a Gaussian kernel density estimate on an adaptive grid of random points wrapping around the original data points. The bandwidth vector reflects the axis-aligned standard deviations of a hyperelliptical kernel.

Because Gaussian kernel density estimates do not decay to zero in a finite distance, the algorithm evaluates the kernel density in hyperelliptical regions out to a distance set by sd.count.
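
To see why the evaluation distance matters, the following base-R sketch computes the share of a single Gaussian kernel's probability mass that lies within sd.count bandwidths of its center. It assumes the evaluation region around each data point is a hyperellipse with semi-axes sd.count * kde.bandwidth, in which case the scaled squared distance follows a chi-squared distribution with one degree of freedom per dimension. The helper name mass_within_sd_count is hypothetical and not part of the package.

```r
# Fraction of an axis-aligned Gaussian kernel's mass inside the hyperellipse
# sum(((x - center) / bandwidth)^2) <= sd.count^2, in n_dim dimensions.
mass_within_sd_count <- function(sd.count, n_dim) {
  pchisq(sd.count^2, df = n_dim)
}

dims <- c(1, 2, 3, 5, 10)
coverage <- data.frame(
  n_dim = dims,
  sd3 = mass_within_sd_count(3, dims),
  sd4 = mass_within_sd_count(4, dims)
)
print(round(coverage, 4))
```

For a fixed sd.count the enclosed share shrinks as dimensionality increases, so a value that is adequate in one or two dimensions can truncate a noticeable part of each kernel in higher dimensions.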

After delineating the probability density, the function calls hypervolume_threshold to determine a boundary. The default behavior ensures that 95 percent of the estimated probability density is enclosed by the chosen boundary. Note, however, that the accuracy of the total probability density depends on having set a large value of sd.count.
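
The idea of a boundary that encloses a requested share of the probability can be illustrated with a minimal base-R sketch. This is not the implementation used by hypervolume_threshold; it assumes uniformly distributed random points, so that each point represents the same volume and sums of density values are proportional to probability. A one-dimensional standard normal stands in for the kernel density estimate, and density_threshold is a hypothetical helper.

```r
set.seed(1)

# Uniform random points over a box, with a density value at each point
# (a one-dimensional standard normal stands in for the KDE).
x <- runif(1e5, min = -5, max = 5)
dens <- dnorm(x)

# Hypothetical helper: the lowest density value such that the points at or
# above it hold at least `quantile` of the total estimated probability.
density_threshold <- function(dens, quantile = 0.95) {
  sorted <- sort(dens, decreasing = TRUE)
  share <- cumsum(sorted) / sum(sorted)
  sorted[which(share >= quantile)[1]]
}

thr <- density_threshold(dens, 0.95)
inside <- dens >= thr
range(x[inside])  # approximate boundary of the enclosed region
```

In this toy case the retained points should span roughly the central 95 percent interval of the standard normal, i.e. close to qnorm(c(0.025, 0.975)).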

Most use cases should not require modification of any parameters except kde.bandwidth.

Optionally, the data can be weighted (e.g. for abundance-weighting). By default, the function estimates the probability density of the observations via Gaussian kernel functions, assuming each data point contributes equally. When a weight parameter is set, the algorithm instead takes a weighted average of the kernel functions centered on each observation. Code for weighting data written by Yuanzhi Li (Yuanzhi.Li@usherbrooke.ca).
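
As a rough illustration of what weighting means, the base-R sketch below evaluates a Gaussian kernel density estimate with an axis-aligned bandwidth as a weighted average of per-observation kernels. It is a conceptual sketch rather than the package's code: weighted_gaussian_kde is a hypothetical helper, the abundances are made up, and normalizing the weights to sum to 1 is an assumption.

```r
# Hypothetical helper: weighted Gaussian KDE with an axis-aligned bandwidth.
weighted_gaussian_kde <- function(query, data, bandwidth, weight = NULL) {
  data <- as.matrix(data)
  query <- as.matrix(query)
  if (is.null(weight)) weight <- rep(1 / nrow(data), nrow(data))
  weight <- weight / sum(weight)  # assumption: weights normalized to sum to 1
  norm_const <- prod(bandwidth * sqrt(2 * pi))
  vapply(seq_len(nrow(query)), function(i) {
    z <- sweep(data, 2, query[i, ], "-")  # offsets from the query point
    z <- sweep(z, 2, bandwidth, "/")      # scale each axis by its bandwidth
    kernels <- exp(-0.5 * rowSums(z^2)) / norm_const
    sum(weight * kernels)                 # weighted average of the kernels
  }, numeric(1))
}

obs <- cbind(x = c(0, 1, 4), y = c(0, 2, 1))
bw <- c(0.5, 0.8)
abundance <- c(10, 1, 1)  # made-up abundances, for illustration only
query <- rbind(c(0, 0), c(4, 1))

weighted_gaussian_kde(query, obs, bw)                      # equal weights
weighted_gaussian_kde(query, obs, bw, weight = abundance)  # abundance-weighted
```

With the abundance weights, density shifts toward the first observation and away from the others, which is the effect the weight argument is meant to provide.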

Usage

hypervolume_gaussian(data, name = NULL, 
                      weight = NULL,
                      samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)),
                      kde.bandwidth = estimate_bandwidth(data), 
                      sd.count = 3, 
                      quantile.requested = 0.95, 
                      quantile.requested.type = "probability", 
                      chunk.size = 1000, 
                      verbose = TRUE, 
                      ...)
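
The default for samples.per.point in the signature above can be read as fixing the total number of random points at roughly 10^(3 + sqrt(n)), where n is the number of dimensions, and dividing that total among the observations. The sketch below simply restates the expression as a standalone function; default_samples_per_point is a hypothetical name used for illustration.

```r
# Default from the usage signature, written as a standalone function.
default_samples_per_point <- function(n_obs, n_dim) {
  ceiling(10^(3 + sqrt(n_dim)) / n_obs)
}

n_obs <- 100
n_dim <- c(1, 2, 4, 9)
per_point <- default_samples_per_point(n_obs, n_dim)
data.frame(n_dim, per_point, total_points = per_point * n_obs)
```

The total grows quickly with dimensionality, which is why the example at the end of this page lowers samples.per.point for a fast demonstration.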

Arguments

`data`	An m x n matrix or data frame, where m is the number of observations and n is the dimensionality.
`name`	A string to assign to the hypervolume for later output and plotting. Defaults to the name of the variable if NULL.
`weight`	An optional vector of weights for the kernel density estimation. Defaults to even weighting (`rep(1/nrow(data),nrow(data))`) if `NULL`.
`samples.per.point`	Number of random points to be evaluated per data point in `data`.
`kde.bandwidth`	A bandwidth vector obtained by running `estimate_bandwidth`. Note that previous package versions (<3.0.0) allowed a scalar or vector value to be supplied here directly; this is now handled through the `estimate_bandwidth` interface.
`sd.count`	The number of standard deviations (converted to actual units by multiplying by `kde.bandwidth`) at which the 'edge' of the hypervolume should be evaluated. Larger values of `sd.count` will come closer to a true estimate of the Gaussian density over a larger region of hyperspace, but require rapidly increasing computational resources (see Details section). It is generally better to use a large/default value for this parameter. A warning is generated if the value is less than 3.
`quantile.requested`	The quantile value used to delineate the boundary of the kernel density estimate. See `hypervolume_threshold`.
`quantile.requested.type`	The type of quantile (volume or probability) used for the boundary delineation. See `hypervolume_threshold`.
`chunk.size`	Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function; only how this output is internally assembled.
`verbose`	Logical value; print diagnostic output if `TRUE`.
`...`	Other arguments to pass to `hypervolume_threshold`.
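
The statement that `chunk.size` affects only how the output is assembled can be illustrated with a generic base-R sketch of chunked evaluation. This is not the package's internal code; evaluate_in_chunks and sq_norm are hypothetical helpers, and a simple row-wise function stands in for the kernel density evaluation.

```r
# Hypothetical helper: apply `f` to the rows of `points` one chunk at a time
# and concatenate the per-chunk results in order.
evaluate_in_chunks <- function(points, f, chunk.size = 1000) {
  starts <- seq(1, nrow(points), by = chunk.size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk.size - 1, nrow(points))
    f(points[rows, , drop = FALSE])
  })
  unlist(pieces)
}

set.seed(42)
pts <- matrix(rnorm(5000 * 3), ncol = 3)
sq_norm <- function(m) rowSums(m^2)  # stand-in for a per-point evaluation

small_chunks <- evaluate_in_chunks(pts, sq_norm, chunk.size = 100)
large_chunks <- evaluate_in_chunks(pts, sq_norm, chunk.size = 2500)
all.equal(small_chunks, large_chunks)
```

Because each point is evaluated independently, the two chunk sizes should give the same values; only the peak memory use and the number of internal steps differ.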

Value

A Hypervolume-class object corresponding to the inferred hypervolume.

Examples

data(penguins,package='palmerpenguins')
penguins_no_na = as.data.frame(na.omit(penguins))
penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie",
                    c("bill_length_mm","bill_depth_mm","flipper_length_mm")]
  
# low samples per point for CRAN demo
hv = hypervolume_gaussian(penguins_adelie,name='Adelie',samples.per.point=100)
summary(hv)