R: Nonparametric Density Estimation

estimatePDF {PDFEstimator}

R Documentation

Nonparametric Density Estimation

Description

Estimates the probability density function for a data sample.

Usage

estimatePDF(sample, pdfLength = NULL, estimationPoints = NULL, 
lowerBound = NULL, upperBound = NULL, target = 70, lagrangeMin = 1, 
lagrangeMax = 200, debug = 0, outlierCutoff = 7, smooth = TRUE)

Arguments

`sample`	the data sample from which to calculate the density estimate. If the sample has more than one column, the multivariate estimation function, estimatePDFmv(), is called instead.
`pdfLength`	the desired length of the estimate returned. Default value is calculated based on sample length. Overriding this calculation can increase or decrease the resolution of the estimate.
`estimationPoints`	a vector containing the points to estimate. If not specified, this is calculated automatically to span the entire sample data.
`lowerBound`	the lower bound of the PDF, if known. Default value is calculated based on the range of the data sample.
`upperBound`	the upper bound of the PDF, if known. Default value is calculated based on the range of the data sample.
`target`	a value from 1 to 100 representing the desired confidence percentage for the estimate score. The default of 70% represents the most likely score based on empirical simulations. A lower value may smooth estimates. A higher value tends to overfit to the sample and is not recommended.
`lagrangeMin`	minimum number of Lagrange multipliers
`lagrangeMax`	maximum number of Lagrange multipliers
`debug`	enables verbose output printed to the console. The default of 0 disables it.
`outlierCutoff`	outliers are automatically detected and removed: any value below Q1 - outlierCutoff * IQR or above Q3 + outlierCutoff * IQR is treated as an outlier, where Q1, Q3, and IQR are the first quartile, third quartile, and interquartile range of the sample, respectively. Setting outlierCutoff = 0 turns off outlier detection.
`smooth`	if TRUE, reduces noise in the estimate, particularly in areas of low data density
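
The outlier rule described for outlierCutoff can be sketched in base R. The helper below, outlier_bounds, is hypothetical and not part of PDFEstimator. It assumes quartiles as computed by R's default quantile() method, which may differ from the package's internal calculation.

```r
# Hypothetical helper, not part of PDFEstimator: computes the bounds
# Q1 - outlierCutoff * IQR and Q3 + outlierCutoff * IQR from the documentation.
outlier_bounds <- function(x, outlierCutoff = 7) {
  # outlierCutoff = 0 is documented as turning outlier detection off
  if (outlierCutoff == 0) {
    return(c(lower = -Inf, upper = Inf))
  }
  # Assumption: quartiles from R's default quantile() method
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - outlierCutoff * iqr, upper = q[2] + outlierCutoff * iqr)
}

# Nine ordinary values and one extreme value
x <- c(1:9, 1000)
bounds <- outlier_bounds(x)

# Keep only the values inside the bounds
kept <- x[x >= bounds[["lower"]] & x <= bounds[["upper"]]]
```

With the default cutoff of 7, only values far outside the bulk of the data fall beyond the bounds, so the rule is deliberately conservative.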

Details

estimatePDF is a nonparametric density estimator based on the maximum-entropy method. It accurately predicts a probability density function (PDF) for random data, using a novel iterative scoring function to determine the best fit without overfitting to the sample.

Value

`failedSolution`	TRUE if the calculated PDF is not considered an acceptable estimate of the data according to the scoring function.
`threshold`	represents the quality of the solution returned. Values of 40 to 70 indicate high confidence in the estimate. Values less than 5 are considered to be of poor quality. For more information on scoring, see the referenced publication.
`x`	points at which the density is estimated
`pdf`	estimated probability density function
`cdf`	estimated cumulative distribution function
`sqr`	scaled quantile residual. Provides a sample-size invariant measure of the fluctuations in the estimate.
`sqrSize`	length of the returned scaled quantile residual. In most cases, this is the size of the input sample. Exceptions are if outliers are detected and/or if the failedSolution flag is true.
`lagrange`	values of the Lagrange multipliers. Can be used to reproduce the expansions for an analytical solution.
`r`	inverse of the CDF for the sample.
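
One possible sanity check on a returned estimate is to confirm that the density integrates to roughly 1 over x. The sketch below is illustrative only: trapezoid_area is a hypothetical helper, and est is a stand-in list built from the exact normal density with the same field names as the values above, so that the code runs without the package. In practice est would be the result of estimatePDF().

```r
# Hypothetical helper: integrate a tabulated density with the trapezoidal rule
trapezoid_area <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Stand-in for an estimatePDF() result (assumption: a list with these fields)
x <- seq(-5, 5, length.out = 501)
est <- list(x = x, pdf = dnorm(x), cdf = pnorm(x), failedSolution = FALSE)

# Only inspect the estimate if the scoring function accepted it
if (!est$failedSolution) {
  area <- trapezoid_area(est$x, est$pdf)
}
```

An area well below 1 would suggest that the estimation points do not cover the full support of the data.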

Author(s)

Jenny Farmer, Donald Jacobs

References

Farmer, J. and D. Jacobs (2018). "High throughput nonparametric probability density estimation." PLoS One 13(5): e0196937.

Examples

# Estimate a standard normal distribution from 1000 sample points using default parameters

library(PDFEstimator)
sampleSize <- 1000
sample <- rnorm(sampleSize, 0, 1)
dist <- estimatePDF(sample)
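
A further sketch, assuming the data are known to be non-negative, overrides some of the defaults listed under Arguments. The variable names and the choice of target = 50 are illustrative, not recommendations from the package authors. The call is guarded so the code still runs when PDFEstimator is not installed.

```r
set.seed(42)
# Exponential data are never negative, so the lower bound is known to be 0
bounded <- rexp(1000, rate = 1)

if (requireNamespace("PDFEstimator", quietly = TRUE)) {
  # Supply the known lower bound and a lower target than the default of 70
  est <- PDFEstimator::estimatePDF(bounded, lowerBound = 0, target = 50)

  # failedSolution flags an estimate rejected by the scoring function
  if (isTRUE(est$failedSolution)) {
    message("The estimate did not pass the scoring function.")
  }
}
```

Supplying a known bound avoids relying on the range of the sample alone to decide where the density ends.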

[Package PDFEstimator version 4.5 Index]