kedd-package {kedd}		R Documentation
Kernel Estimator and Bandwidth Selection for Density and Its Derivatives
Description
Smoothing techniques and bandwidth selectors for the r'th derivative of a probability density, for one-dimensional data.
Details
Package:  kedd
Type:     Package
Version:  1.0.4
Date:     2024-01-27
License:  GPL (>= 2)
There are four main types of functions in this package:
Compute the derivatives and convolutions of a kernel function (1-d).
Compute the kernel estimators for a density and its derivatives (1-d).
Compute the bandwidth selectors (1-d).
Display the kernel estimators.
Main Features
Convolutions and derivatives of a kernel function:
In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques.
The kernel functions K(x) used in the derivative kernel density estimator \hat{f}^{(r)}_{h}(x) must satisfy the following three requirements:
- \int_{R} K(x) dx = 1
- \int_{R} x K(x) dx = 0
- \mu_{2}(K) = \int_{R} x^{2} K(x) dx < \infty
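For example, these three conditions are easy to verify numerically for the Gaussian kernel with base R's integrate; this check is only illustrative and is not part of kedd:
R> integrate(dnorm, -Inf, Inf)$value                       # integral of K(x)  = 1
R> integrate(function(x) x * dnorm(x), -Inf, Inf)$value    # first moment      = 0
R> integrate(function(x) x^2 * dnorm(x), -Inf, Inf)$value  # mu_2(K) = 1 < Inf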
Several types of kernel functions K(x) are commonly used in this package: Gaussian, Epanechnikov, Uniform (rectangular), Triangular, Triweight, Tricube, Biweight (quartic), Cosine.
The function kernel.fun computes the kernel derivative K^{(r)}(x) and kernel.conv the kernel convolution K^{(r)} \ast K^{(r)}(x), which are defined formally as:
K^{(r)}(x) = \frac{d^{r}}{d x^{r}} K(x)
K^{(r)} \ast K^{(r)} (x) = \int_{-\infty}^{+\infty} K^{(r)}(y) K^{(r)}(x-y) dy
for r = 0, 1, 2, \dots
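As a quick illustration, both functions can be evaluated on a grid of points. The argument names below follow the dkde examples later in this page; help(kernel.fun) and help(kernel.conv) document the exact interfaces and return values:
R> kernel.fun(x = seq(-3, 3, length = 7), deriv.order = 1, kernel = "gaussian")   # K^(1)(x) on a grid
R> kernel.conv(x = seq(-3, 3, length = 7), deriv.order = 1, kernel = "gaussian")  # (K^(1) * K^(1))(x) on the same grid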
Estimators of r'th derivative of a density function:
A natural estimator of the r'th derivative of a density function f(x) is:
\hat{f}^{(r)}_{h}(x) = \frac{d^{r}}{d x^{r}} \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x-X_{i}}{h}\right) = \frac{1}{nh^{r+1}} \sum_{i=1}^{n} K^{(r)}\left(\frac{x-X_{i}}{h}\right)
Here, X_{1}, X_{2}, \dots, X_{n} is an i.i.d. sample of size n from the distribution with density f(x), K(x) is the kernel function, which we take to be a symmetric probability density with at least r non-zero derivatives when estimating f^{(r)}(x), and h is the bandwidth, the parameter that controls the degree of smoothing applied to the data.
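For intuition, the right-hand side of this formula can be written out directly in base R for the first derivative (r = 1) with a Gaussian kernel, for which K^{(1)}(u) = -u\phi(u). This is only an illustrative sketch with an arbitrary fixed bandwidth, not the kedd implementation; use dkde in practice:
R> set.seed(1)
R> X <- rnorm(200)                                  # i.i.d. sample of size n = 200
R> h <- 0.4                                         # arbitrary fixed bandwidth
R> K1 <- function(u) -u * dnorm(u)                  # first derivative of the Gaussian kernel
R> fhat1 <- function(x) sapply(x, function(x0) sum(K1((x0 - X) / h)) / (length(X) * h^2))
R> fhat1(c(-1, 0, 1))                               # estimate of f'(x) at a few points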
The case r = 0 is the standard kernel density estimator (e.g. Silverman 1986, Wolfgang 1991, Scott 1992, Wand and Jones 1995, Jeffrey 1996, Bowman and Azzalini 1997, Alexandre 2009); the properties of such estimators are well known (e.g. Sheather and Jones 1991, Jones and Kappenman 1991, Wolfgang 1991). The case r > 0 gives the derivative of the kernel density estimator (e.g. Bhattacharya 1967, Schuster 1969, Alekseev 1972, Wolfgang et al. 1990, Jones 1992, Stoker 1993); applications that require the estimation of density derivatives can be found in Singh (1977).
For the r'th derivative of the one-dimensional kernel density estimator, the main function is dkde. For display, its plot method calls plot.dkde, and lines.dkde adds the estimate to an existing plot.
R> data(trimodal)
R> dkde(x = trimodal, deriv.order = 0, kernel = "gaussian")

Data: trimodal (200 obs.);	Kernel: gaussian
Derivative order: 0;	Bandwidth 'h' = 0.1007
  eval.points          est.fx
 Min.   :-2.91274   Min.   :0.0000066
 1st Qu.:-1.46519   1st Qu.:0.0669750
 Median :-0.01765   Median :0.1682045
 Mean   :-0.01765   Mean   :0.1723692
 3rd Qu.: 1.42989   3rd Qu.:0.2484626
 Max.   : 2.87743   Max.   :0.4157340

R> dkde(x = trimodal, deriv.order = 1, kernel = "gaussian")

Data: trimodal (200 obs.);	Kernel: gaussian
Derivative order: 1;	Bandwidth 'h' = 0.09094
  eval.points          est.fx
 Min.   :-2.87358   Min.   :-1.740447
 1st Qu.:-1.44562   1st Qu.:-0.343952
 Median :-0.01765   Median : 0.009057
 Mean   :-0.01765   Mean   : 0.000000
 3rd Qu.: 1.41031   3rd Qu.: 0.415343
 Max.   : 2.83828   Max.   : 1.256891
Bandwidth selectors:
The most important factor in the r'th derivative kernel density estimate is the choice of the bandwidth h for one-dimensional observations. Because of its role in controlling both the amount and the direction of smoothing, this choice is particularly important. The popular bandwidth selection methods available in this package are (see the references for details):
- Optimal Bandwidth (AMISE), with deriv.order >= 0. The function is h.amise. For display, its plot method calls plot.h.amise, and lines.h.amise adds to an existing plot.
- Maximum-likelihood cross-validation (MLCV), with deriv.order = 0. The function is h.mlcv. For display, its plot method calls plot.h.mlcv, and lines.h.mlcv adds to an existing plot.
- Unbiased cross-validation (UCV), with deriv.order >= 0. The function is h.ucv. For display, its plot method calls plot.h.ucv, and lines.h.ucv adds to an existing plot.
- Biased cross-validation (BCV), with deriv.order >= 0. The function is h.bcv. For display, its plot method calls plot.h.bcv, and lines.h.bcv adds to an existing plot.
- Complete cross-validation (CCV), with deriv.order >= 0. The function is h.ccv. For display, its plot method calls plot.h.ccv, and lines.h.ccv adds to an existing plot.
- Modified cross-validation (MCV), with deriv.order >= 0. The function is h.mcv. For display, its plot method calls plot.h.mcv, and lines.h.mcv adds to an existing plot.
- Trimmed cross-validation (TCV), with deriv.order >= 0. The function is h.tcv. For display, its plot method calls plot.h.tcv, and lines.h.tcv adds to an existing plot.
R> data(trimodal)
R> h.bcv(x = trimodal, whichbcv = 1, deriv.order = 0, kernel = "gaussian")

Call:	Biased Cross-Validation 1
Derivative order = 0
Data: trimodal (200 obs.);	Kernel: gaussian
Min BCV = 0.004511636;	Bandwidth 'h' = 0.4357812

R> h.ccv(x = trimodal, deriv.order = 1, kernel = "gaussian")

Call:	Complete Cross-Validation
Derivative order = 1
Data: trimodal (200 obs.);	Kernel: gaussian
Min CCV = 0.01985078;	Bandwidth 'h' = 0.5828336

R> h.tcv(x = trimodal, deriv.order = 2, kernel = "gaussian")

Call:	Trimmed Cross-Validation
Derivative order = 2
Data: trimodal (200 obs.);	Kernel: gaussian
Min TCV = -295.563;	Bandwidth 'h' = 0.08908582

R> h.ucv(x = trimodal, deriv.order = 3, kernel = "gaussian")

Call:	Unbiased Cross-Validation
Derivative order = 3
Data: trimodal (200 obs.);	Kernel: gaussian
Min UCV = -63165.18;	Bandwidth 'h' = 0.1067236
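A selected bandwidth is typically passed back to dkde. The sketch below assumes, as documented in help(h.ucv) and help(dkde), that the selector stores its bandwidth in the h component and that dkde accepts it through its h argument:
R> data(trimodal)
R> hucv <- h.ucv(x = trimodal, deriv.order = 1, kernel = "gaussian")            # select h by UCV
R> fit  <- dkde(x = trimodal, deriv.order = 1, h = hucv$h, kernel = "gaussian")
R> plot(fit)                                                                    # dispatches to plot.dkde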
For an overview of this package, see vignette("kedd").
Requirements
R version >= 2.15.0
Licence
This package and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.
References
Alekseev, V. G. (1972). Estimation of a probability density function and its derivatives. Mathematical notes of the Academy of Sciences of the USSR. 12(5), 808–811.
Alexandre, B. T. (2009). Introduction to Nonparametric Estimation. Springer-Verlag, New York.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of kernel density estimates. Biometrika, 71, 353–360.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: the Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford.
Bowman, A.W. and Azzalini, A. (2003). Computational aspects of nonparametric smoothing with illustrations from the sm library. Computational Statistics and Data Analysis, 42, 545–560.
Bowman, A.W. and Azzalini, A. (2013). sm: Smoothing methods for nonparametric regression and density estimation. R package version 2.2-5.3. Ported to R by B. D. Ripley.
Bhattacharya, P. K. (1967). Estimation of a probability density function and Its derivatives. Sankhya: The Indian Journal of Statistics, Series A, 29, 373–382.
Duin, R. P. W. (1976). On the choice of smoothing parameters of Parzen estimators of probability density functions. IEEE Transactions on Computers, C-25, 1175–1179.
Feluch, W. and Koronacki, J. (1992). A note on modified cross-validation in density estimation. Computational Statistics and Data Analysis, 13, 143–151.
George, R. T. (1990). The maximal smoothing principle in density estimation. Journal of the American Statistical Association, 85, 470–477.
George, R. T. and Scott, D. W. (1985). Oversmoothed nonparametric density estimates. Journal of the American Statistical Association, 80, 209–214.
Habbema, J. D. F., Hermans, J., and Van den Broek, K. (1974) A stepwise discrimination analysis program using density estimation. Compstat 1974: Proceedings in Computational Statistics. Physica Verlag, Vienna.
Heidenreich, N. B., Schindler, A. and Sperlich, S. (2013). Bandwidth selection for kernel density estimation: a review of fully automatic selectors. Advances in Statistical Analysis.
Jeffrey, S. S. (1996). Smoothing Methods in Statistics. Springer-Verlag, New York.
Jones, M. C. (1992). Differences and derivatives in kernel estimation. Metrika, 39, 335–340.
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401–407.
Jones, M. C. and Kappenman, R. F. (1991). On a class of kernel density estimate bandwidth selectors. Scandinavian Journal of Statistics, 19, 337–349.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Olver, F. W., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (2010). NIST Handbook of Mathematical Functions. Cambridge University Press, New York, USA.
Peter, H. and Marron, J.S. (1987). Estimation of integrated squared density derivatives. Statistics and Probability Letters, 6, 109–115.
Peter, H. and Marron, J.S. (1991). Local minima in cross-validation functions. Journal of the Royal Statistical Society, Series B, 53, 245–252.
Radhey, S. S. (1987). MISE of kernel estimates of a density and its derivatives. Statistics and Probability Letters, 5, 153–159.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78.
Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Scott, D.W. and George, R. T. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Schuster, E. F. (1969) Estimation of a probability density function and its derivatives. The Annals of Mathematical Statistics, 40 (4), 1187–1195.
Sheather, S. J. (2004). Density estimation. Statistical Science, 19, 588–597.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683–690.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. London.
Singh, R. S. (1977). Applications of estimators of a density and its derivatives to certain statistical problems. Journal of the Royal Statistical Society, Series B, 39(3), 357–363.
Stoker, T. M. (1993). Smoothing bias in density derivative estimation. Journal of the American Statistical Association, 88, 855–863.
Stute, W. (1992). Modified cross validation in density estimation. Journal of Statistical Planning and Inference, 30, 293–305.
Tarn, D. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.
Tristen, H. and Jeffrey, S. R. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5).
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Wand, M.P. and Ripley, B. D. (2013). KernSmooth: Functions for Kernel Smoothing for Wand and Jones (1995). R package version 2.23-10.
Wolfgang, H. (1991). Smoothing Techniques, With Implementation in S. Springer-Verlag, New York.
Wolfgang, H., Marlene, M., Stefan, S. and Axel, W. (2004). Nonparametric and Semiparametric Models. Springer-Verlag, Berlin Heidelberg.
Wolfgang, H., Marron, J. S. and Wand, M. P. (1990). Bandwidth choice for density derivatives. Journal of the Royal Statistical Society, Series B, 223–232.
See Also
ks, KernSmooth, sm, np, locfit, feature, GenKern.