estimatePDF {PDFEstimator}    R Documentation
Nonparametric Density Estimation
Description
Estimates the probability density function for a data sample.
Usage
estimatePDF(sample, pdfLength = NULL, estimationPoints = NULL,
lowerBound = NULL, upperBound = NULL, target = 70, lagrangeMin = 1,
lagrangeMax = 200, debug = 0, outlierCutoff = 7, smooth = TRUE)
Arguments
sample
the data sample from which to calculate the density estimate. If the sample has more than one column, the multivariate estimation function, estimatePDFmv(), is called instead.
pdfLength
the desired length of the returned estimate. By default this is calculated from the sample size; overriding it increases or decreases the resolution of the estimate.
estimationPoints
a vector of points at which to estimate the density. If not specified, these points are calculated automatically to span the entire range of the sample data.
lowerBound
the lower bound of the PDF, if known. By default this is calculated from the range of the data sample.
upperBound
the upper bound of the PDF, if known. By default this is calculated from the range of the data sample.
target
a value from 1 to 100 giving the desired confidence percentage for the estimate score. The default of 70% corresponds to the most likely score based on empirical simulations. A lower value may produce smoother estimates. A higher value tends to overfit the sample and is not recommended.
lagrangeMin
the minimum number of Lagrange multipliers.
lagrangeMax
the maximum number of Lagrange multipliers.
debug
controls verbose output printed to the console.
outlierCutoff
outliers are detected and removed automatically: values below Q1 - outlierCutoff * IQR or above Q3 + outlierCutoff * IQR are treated as outliers, where Q1, Q3, and IQR are the first quartile, third quartile, and interquartile range, respectively. Setting outlierCutoff = 0 turns off outlier detection.
smooth
if TRUE, reduces noise in the estimate, particularly in regions of low data density.
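The outlier rule described under outlierCutoff can be sketched in base R as below. This is a minimal illustration of the documented formula only, not PDFEstimator's internal code. The helper name flagOutliers is hypothetical, and the quartiles are computed with R's default quantile() method (type 7), which is an assumption and may differ from the quartile definition the package uses internally.

```r
# Hypothetical helper (not part of PDFEstimator) illustrating the documented
# outlierCutoff rule. Returns TRUE for each value that would be treated as
# an outlier.
flagOutliers <- function(sample, outlierCutoff = 7) {
  # outlierCutoff = 0 turns detection off, as documented; without this
  # special case the formula would flag everything outside [Q1, Q3]
  if (outlierCutoff == 0) {
    return(rep(FALSE, length(sample)))
  }
  # First and third quartiles (R's default quantile type 7; assumed)
  q <- quantile(sample, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sample < q[1] - outlierCutoff * iqr | sample > q[2] + outlierCutoff * iqr
}

# Example: 1000 standard normal values plus one extreme value
set.seed(1)
x <- c(rnorm(1000), 50)
sum(flagOutliers(x))       # 1: only the value 50 is flagged
sum(flagOutliers(x, 0))    # 0: detection disabled
```

With the default cutoff of 7, the fences for standard normal data sit near plus or minus 10, so only gross outliers are removed; smaller cutoffs remove progressively more of the tails.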
Details
A nonparametric density estimator based on the maximum-entropy method. It estimates the probability density function (PDF) of random data, using a novel iterative scoring function to determine the best fit without overfitting to the sample.
Value
failedSolution
TRUE if the calculated PDF is not considered an acceptable estimate of the data according to the scoring function.
threshold
the quality of the returned solution. Values of 40 to 70 indicate high confidence in the estimate; values below 5 indicate a poor-quality estimate. For more information on scoring, see the referenced publication.
x
the points at which the density is estimated, spanning the estimated range of the data.
pdf
the estimated probability density function.
cdf
the estimated cumulative distribution function.
sqr
the scaled quantile residual, a sample-size-invariant measure of the fluctuations in the estimate.
sqrSize
the length of the returned scaled quantile residual. In most cases this equals the size of the input sample; the exceptions are when outliers are detected and/or when failedSolution is TRUE.
lagrange
the values of the Lagrange multipliers. These can be used to reproduce the expansion as an analytical solution.
r
the inverse of the CDF for the sample.
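The scaled quantile residual can be illustrated as follows, assuming the definition used in the referenced publication: for the sorted sample x_(1) <= ... <= x_(N), let u_k = F(x_(k)) be the CDF evaluated at the k-th order statistic; then SQR_k = sqrt(N + 2) * (u_k - k / (N + 1)). If F is the true CDF, u_k behaves like the k-th order statistic of N uniform values, with mean k / (N + 1) and variance k(N - k + 1) / ((N + 1)^2 (N + 2)). Multiplying by sqrt(N + 2) leaves a variance of p(1 - p) with p = k / (N + 1), which does not grow or shrink with N. The helper scaledQuantileResidual is hypothetical and uses the exact normal CDF (pnorm) in place of an estimated CDF; it shows how the quantity is formed, not how PDFEstimator computes sqr internally.

```r
# Hypothetical helper (not part of PDFEstimator): scaled quantile residuals
# of a sample under a given CDF function, assuming
#   SQR_k = sqrt(N + 2) * (F(x_(k)) - k / (N + 1))
scaledQuantileResidual <- function(sample, cdf) {
  n <- length(sample)
  u <- cdf(sort(sample))   # CDF values at the order statistics
  k <- seq_len(n)          # ranks 1..N
  sqrt(n + 2) * (u - k / (n + 1))
}

set.seed(1)
s <- rnorm(1000)
sqr <- scaledQuantileResidual(s, pnorm)  # true CDF used for illustration
length(sqr)                              # one residual per sample point
# Under the true CDF each residual has standard deviation at most 0.5,
# so the values stay small regardless of sample size
range(sqr)
```

Repeating the example with rnorm(100) or rnorm(10000) gives residuals of a similar magnitude, which is the sample-size invariance described above.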
Author(s)
Jenny Farmer, Donald Jacobs
References
Farmer, J. and D. Jacobs (2018). "High throughput nonparametric probability density estimation." PLoS One 13(5): e0196937.
Examples
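#Estimates a normal distribution with 1000 sample points using default parameters
library(PDFEstimator)
sampleSize = 1000
sample = rnorm(sampleSize, 0, 1)
dist = estimatePDF(sample)
#Check whether the scoring function rejected the estimate, then plot it
dist$failedSolution
plot(dist$x, dist$pdf, type = "l")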