weight_data {hypervolume}    R Documentation

Abundance weighting and prior of data for hypervolume input

Description

Resamples input data for hypervolume construction, so that some data points can be weighted more strongly than others in kernel density estimation. Also allows a multidimensional normal prior distribution to be placed on each data point to enable simulation of uncertainty or variation within each observed data point.

Note that this algorithm will change the number of data points and may thus lead to changes in the inferred hypervolume if the selected algorithm (e.g. for bandwidth selection) depends on sample size.
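
The sample-size effect can be seen with a one-dimensional stand-in. The sketch below uses base R's bw.nrd0 (a rule-of-thumb bandwidth from the stats package) rather than the hypervolume package's own bandwidth estimator, and the data are simulated. It only illustrates that repeating rows can shrink a sample-size-dependent bandwidth even though no new information has been added.

```r
# Illustration only: bw.nrd0() is base R's rule-of-thumb bandwidth,
# used here as a stand-in for any sample-size-dependent selector.
set.seed(1)
x <- rnorm(50)                      # simulated one-dimensional data

# Repeat every observation three times, as integer weights of 3 would do
x_repeated <- rep(x, times = 3)

bw_original <- bw.nrd0(x)
bw_repeated <- bw.nrd0(x_repeated)

# The repeated sample has nearly the same spread but three times as many
# points, so the rule-of-thumb bandwidth comes out smaller.
print(c(original = bw_original, repeated = bw_repeated))
```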

A direct weighting approach (which does not artificially change the sample size, and thus the kernel bandwidth estimate) is available for Gaussian hypervolumes within hypervolume_gaussian.

Usage

weight_data(data, weights, jitter.sd = matrix(0, nrow = nrow(data), ncol = ncol(data)))

Arguments

data

A data frame or matrix of unweighted data. Must contain only numeric values.

weights

A vector of weights with length equal to the number of rows in data. All weights must be positive integers.

jitter.sd

A matrix of the same size as data giving, for each value, the standard deviation of a normal distribution centered on the observed value. If a vector of length 1 or of length equal to the number of columns of data is supplied instead, it is repeated for all observations.
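
As a rough illustration of the recycling rule for jitter.sd, the sketch below expands a scalar or a per-column vector into a full matrix. The helper expand_sd is hypothetical and written only to mirror the documented behaviour; it is not taken from the package source, and the toy data are invented.

```r
# Sketch of how a scalar or per-column jitter.sd could be expanded to a
# full matrix. This mirrors the documented behaviour, not the package source.
data <- matrix(1:12, nrow = 4, ncol = 3)   # toy data: 4 observations, 3 columns

expand_sd <- function(sd, data) {          # hypothetical helper
  if (is.matrix(sd)) {
    stopifnot(all(dim(sd) == dim(data)))   # a matrix must match data exactly
    return(sd)
  }
  stopifnot(length(sd) %in% c(1, ncol(data)))
  # byrow = TRUE places one value per column and repeats it down the rows
  matrix(sd, nrow = nrow(data), ncol = ncol(data), byrow = TRUE)
}

sd_scalar <- expand_sd(0.5, data)               # same sd for every value
sd_by_col <- expand_sd(c(0.1, 0.2, 0.3), data)  # one sd per column
```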

Details

Each row of the resampled output is jittered a single time. To sample many points from a distribution around each observed data point, multiply all weights by a large number.
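
The resampling and jitter steps can be sketched in base R as follows. The helper resample_sketch is hypothetical and assumes a single scalar standard deviation; it is a minimal approximation of the documented behaviour, not the package's implementation, and the two-row data set is invented.

```r
# Base-R sketch of the documented behaviour (not the package source):
# repeat each row by its weight, then add one normal draw per repeated value.
set.seed(42)
data <- data.frame(x = c(0, 10), y = c(0, 10))  # two toy observations
weights <- c(1, 2)
jitter_sd <- 0.5                                 # assumed scalar sd

resample_sketch <- function(data, weights, sd) { # hypothetical helper
  idx <- rep(seq_len(nrow(data)), times = weights)
  m <- as.matrix(data)[idx, , drop = FALSE]      # rows repeated by weight
  rownames(m) <- NULL
  m <- m + rnorm(length(m), mean = 0, sd = sd)   # independent noise per value
  as.data.frame(m)
}

few  <- resample_sketch(data, weights, jitter_sd)        # sum(weights) rows
many <- resample_sketch(data, weights * 100, jitter_sd)  # 100 times as many
```

Multiplying the weights by 100 keeps the relative weighting between observations while drawing 100 times as many jittered points around each one.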

Value

A data frame with the rows of data repeated by weights, potentially with noise added. The output has the same columns as the input but sum(weights) total rows.

See Also

hypervolume_gaussian

Examples

data(penguins,package='palmerpenguins')
penguins_no_na = as.data.frame(na.omit(penguins))
penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie",
                    c("bill_length_mm","bill_depth_mm","flipper_length_mm")]

weighted_data <- weight_data(penguins_adelie,
  weights=1+rpois(n=nrow(penguins_adelie),lambda=3))
# use a semi-transparent color so that overlapping (repeated) points appear darker
pairs(weighted_data,col=rgb(1,0,0,alpha=0.15))

weighted_noisy_data <- weight_data(penguins_adelie,
  weights=1+rpois(n=nrow(penguins_adelie),lambda=3),jitter.sd=0.5)
# use a semi-transparent color so that overlapping points appear darker
pairs(weighted_noisy_data,col=rgb(1,0,0,alpha=0.15))

[Package hypervolume version 3.1.4 Index]