estimatePDF {PDFEstimator}    R Documentation

Nonparametric Density Estimation

Description

Estimates the probability density function for a data sample.

Usage

estimatePDF(sample, pdfLength = NULL, estimationPoints = NULL, 
lowerBound = NULL, upperBound = NULL, target = 70, lagrangeMin = 1, 
lagrangeMax = 200, debug = 0, outlierCutoff = 7, smooth = TRUE)

Arguments

sample

the data sample from which to calculate the density estimate. If the sample has more than one column, the multivariate estimation function estimatePDFmv() is called instead.

pdfLength

the desired length of the returned estimate. By default it is calculated from the sample size; overriding this value increases or decreases the resolution of the estimate.

estimationPoints

a vector containing the points to estimate. If not specified, this is calculated automatically to span the entire sample data.

lowerBound

the lower bound of the PDF, if known. Default value is calculated based on the range of the data sample.

upperBound

the upper bound of the PDF, if known. Default value is calculated based on the range of the data sample.

target

a value from 1 to 100 giving the desired confidence percentage for the estimate score. The default of 70 represents the most likely score based on empirical simulations. A lower value may produce smoother estimates; a higher value tends to overfit the sample and is not recommended.

lagrangeMin

minimum number of Lagrange multipliers

lagrangeMax

maximum number of Lagrange multipliers

debug

controls verbose output printed to the console

outlierCutoff

outliers are detected and removed automatically: a sample value x is treated as an outlier if x < Q1 - outlierCutoff * IQR or x > Q3 + outlierCutoff * IQR, where Q1, Q3, and IQR are the first quartile, third quartile, and interquartile range, respectively. Setting outlierCutoff = 0 turns off outlier detection.

smooth

if TRUE, minimizes noise in the estimate, particularly in regions of low data density
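The outlier rule described under outlierCutoff can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name flagOutliers is hypothetical, and the quartiles come from base R's default quantile() definition (type 7), which is an assumption; PDFEstimator may compute quartiles differently.

```r
# Hypothetical helper (not exported by PDFEstimator): applies the documented
# rule x < Q1 - outlierCutoff * IQR or x > Q3 + outlierCutoff * IQR.
# Quartiles use base R's default quantile() definition (type 7), which is an
# assumption; the package may compute quartiles differently.
flagOutliers <- function(sample, outlierCutoff = 7) {
  if (outlierCutoff == 0) {
    # outlierCutoff = 0 turns outlier detection off
    return(rep(FALSE, length(sample)))
  }
  quartiles <- quantile(sample, probs = c(0.25, 0.75), names = FALSE)
  iqr <- quartiles[2] - quartiles[1]
  lowerFence <- quartiles[1] - outlierCutoff * iqr
  upperFence <- quartiles[2] + outlierCutoff * iqr
  sample < lowerFence | sample > upperFence
}

x <- c(1:20, 1000)
which(flagOutliers(x))   # 21: only the appended value 1000 lies outside the fences
```

Here Q1 = 6, Q3 = 16, and IQR = 10, so with the default cutoff of 7 the fences are -64 and 86. The special case for outlierCutoff = 0 matters: without it, the formula would flag every value outside [Q1, Q3].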

Details

A nonparametric density estimator based on the maximum-entropy method. Accurately predicts a probability density function (PDF) for random data using a novel iterative scoring function to determine the best fit without overfitting to the sample.

Value

failedSolution

TRUE if the calculated PDF is not considered an acceptable estimate of the data according to the scoring function.

threshold

represents the quality of the solution returned. Values of 40 to 70 indicate high confidence in the estimate. Values less than 5 are considered to be of poor quality. For more information on scoring see the referenced publication.

x

the points at which the density is estimated

pdf

estimated probability density function

cdf

estimated cumulative distribution function

sqr

scaled quantile residual. Provides a sample-size invariant measure of the fluctuations in the estimate.

sqrSize

length of the returned scaled quantile residual. In most cases this equals the size of the input sample; it differs when outliers are removed or when failedSolution is TRUE.

lagrange

values of the Lagrange multipliers. These can be used to reproduce the expansion as an analytical solution.

r

inverse of cdf for the sample.
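To make the sqr field more concrete, the following base-R sketch computes a scaled quantile residual for a sample under a given CDF. It assumes the definition SQR_k = sqrt(N + 2) * (u_k - k / (N + 1)), where u_k = F(x_(k)) and x_(k) is the k-th smallest sample value; see the referenced publication for the authoritative form. The helper name scaledQuantileResidual is hypothetical and not part of PDFEstimator. The factor sqrt(N + 2) comes from the variance of the k-th of N uniform order statistics, mu_k(1 - mu_k)/(N + 2), so rescaling keeps the spread of the residuals comparable across sample sizes.

```r
# Hypothetical helper (not part of PDFEstimator). Assumes the definition
#   SQR_k = sqrt(N + 2) * (u_k - k / (N + 1)),  u_k = F(x_(k)),
# where x_(k) is the k-th smallest sample value and F is a CDF.
scaledQuantileResidual <- function(sample, cdf) {
  n <- length(sample)
  u <- cdf(sort(sample))        # probability integral transform of sorted data
  mu <- seq_len(n) / (n + 1)    # mean of the k-th order statistic of n uniforms
  sqrt(n + 2) * (u - mu)        # rescale so the spread does not shrink with n
}

# Data placed exactly at the expected normal quantiles gives zero residuals.
x <- qnorm(seq_len(99) / 100)
max(abs(scaledQuantileResidual(x, pnorm)))   # 0 up to rounding error
```

For an estimate returned by estimatePDF, a CDF function could be built with approxfun(dist$x, dist$cdf) and passed as cdf; whether the result matches dist$sqr depends on the assumed definition above.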

Author(s)

Jenny Farmer, Donald Jacobs

References

Farmer, J. and D. Jacobs (2018). "High throughput nonparametric probability density estimation." PLoS One 13(5): e0196937.

Examples

# Estimates a normal distribution from 1000 sample points using default parameters

sampleSize = 1000
sample = rnorm(sampleSize, 0, 1)
dist = estimatePDF(sample)
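The pdf and cdf fields of the returned object can be cross-checked by integrating the density numerically. The sketch below defines a hypothetical helper, cumulativeTrapezoid (not part of PDFEstimator), that applies the trapezoid rule to density values at increasing points, and validates it on the standard normal density, where the exact answer is known.

```r
# Hypothetical helper (not part of PDFEstimator): cumulative integral of
# density values y over increasing points x, using the trapezoid rule.
cumulativeTrapezoid <- function(x, y) {
  stopifnot(length(x) == length(y), !is.unsorted(x))
  areas <- diff(x) * (head(y, -1) + tail(y, -1)) / 2
  c(0, cumsum(areas))
}

# Check against a known density: integrate dnorm on a fine grid.
grid <- seq(-8, 8, length.out = 2001)
approxCdf <- cumulativeTrapezoid(grid, dnorm(grid))
max(abs(approxCdf - (pnorm(grid) - pnorm(-8))))   # trapezoid-rule discretization error
```

Applied to the example above, cumulativeTrapezoid(dist$x, dist$pdf) can be compared with dist$cdf - dist$cdf[1]. Close agreement is expected only if dist$x is sorted and finely spaced, which is an assumption about the returned fields rather than a documented guarantee.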


[Package PDFEstimator version 4.5 Index]