kedd-package {kedd}		R Documentation
Kernel Estimator and Bandwidth Selection for Density and Its Derivatives
Description
Smoothing techniques and bandwidth selectors for the r'th derivative of a probability density, for one-dimensional data.
Details
Package:  kedd
Type:     Package
Version:  1.0.4
Date:     2024-01-27
License:  GPL (>= 2)
There are four main types of functions in this package:
Compute the derivatives and convolutions of a kernel function (1-d).
Compute the kernel estimators for a density and its derivatives (1-d).
Compute the bandwidth selectors (1-d).
Display the kernel estimators.
Main Features
Convolutions and derivatives of a kernel function:
In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques.
The kernel functions K(x) used in the derivative kernel density estimator \hat{f}^{(r)}_{h}(x) must satisfy the following three requirements:
- \int_{R} K(x) dx = 1
- \int_{R} x K(x) dx = 0
- \mu_{2}(K) = \int_{R} x^{2} K(x) dx < \infty
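For example, these three conditions are easy to verify numerically for the Gaussian kernel with base R's integrate; this check is only illustrative and is not part of kedd:
R> integrate(dnorm, -Inf, Inf)$value                       # integral of K(x)  = 1
R> integrate(function(x) x * dnorm(x), -Inf, Inf)$value    # first moment      = 0
R> integrate(function(x) x^2 * dnorm(x), -Inf, Inf)$value  # mu_2(K) = 1 < Inf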
Several types of kernel functions K(x) are commonly used in this package: Gaussian, Epanechnikov, Uniform (rectangular), Triangular, Triweight, Tricube, Biweight (quartic), Cosine.
The function kernel.fun computes the kernel derivative K^{(r)}(x) and kernel.conv the kernel convolution K^{(r)} \ast K^{(r)}(x), which are defined formally as:
K^{(r)}(x) = \frac{d^{r}}{d x^{r}} K(x)
K^{(r)} \ast K^{(r)} (x) = \int_{-\infty}^{+\infty} K^{(r)}(y) K^{(r)}(x-y) dy
for r = 0, 1, 2, \dots
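As a quick illustration, both functions can be evaluated on a grid of points. The argument names below follow the dkde examples later in this page; help(kernel.fun) and help(kernel.conv) document the exact interfaces and return values:
R> kernel.fun(x = seq(-3, 3, length = 7), deriv.order = 1, kernel = "gaussian")   # K^(1)(x) on a grid
R> kernel.conv(x = seq(-3, 3, length = 7), deriv.order = 1, kernel = "gaussian")  # (K^(1) * K^(1))(x) on the same grid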
Estimators of r'th derivative of a density function:
A natural estimator of the r'th derivative of a density function f(x) is:
\hat{f}^{(r)}_{h}(x) = \frac{d^{r}}{d x^{r}} \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x-X_{i}}{h}\right) = \frac{1}{nh^{r+1}} \sum_{i=1}^{n} K^{(r)}\left(\frac{x-X_{i}}{h}\right)
Here, X_{1}, X_{2}, \dots, X_{n} is an i.i.d. sample of size n from the distribution with density f(x), K(x) is the kernel function, which we take to be a symmetric probability density with at least r non-zero derivatives when estimating f^{(r)}(x), and h is the bandwidth, the parameter that controls the degree of smoothing applied to the data.
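For intuition, the right-hand side of this formula can be written out directly in base R for the first derivative (r = 1) with a Gaussian kernel, for which K^{(1)}(u) = -u\phi(u). This is only an illustrative sketch with an arbitrary fixed bandwidth, not the kedd implementation; use dkde in practice:
R> set.seed(1)
R> X <- rnorm(200)                                  # i.i.d. sample of size n = 200
R> h <- 0.4                                         # arbitrary fixed bandwidth
R> K1 <- function(u) -u * dnorm(u)                  # first derivative of the Gaussian kernel
R> fhat1 <- function(x) sapply(x, function(x0) sum(K1((x0 - X) / h)) / (length(X) * h^2))
R> fhat1(c(-1, 0, 1))                               # estimate of f'(x) at a few points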
The case r = 0 is the standard kernel density estimator (e.g. Silverman 1986, Wolfgang 1991, Scott 1992, Wand and Jones 1995, Jeffrey 1996, Bowman and Azzalini 1997, Alexandre 2009); the properties of such estimators are well known (e.g. Sheather and Jones 1991, Jones and Kappenman 1991, Wolfgang 1991). The case r > 0 gives the derivative of the kernel density estimator (e.g. Bhattacharya 1967, Schuster 1969, Alekseev 1972, Wolfgang et al. 1990, Jones 1992, Stoker 1993); applications that require the estimation of density derivatives can be found in Singh (1977).
For the r'th derivative of the one-dimensional kernel density estimator, the main function is dkde. For display, its plot method calls plot.dkde, and lines.dkde adds the estimate to an existing plot.
R> data(trimodal)
R> dkde(x = trimodal, deriv.order = 0, kernel = "gaussian")

Data: trimodal (200 obs.);	Kernel: gaussian
Derivative order: 0;	Bandwidth 'h' = 0.1007
  eval.points          est.fx
 Min.   :-2.91274   Min.   :0.0000066
 1st Qu.:-1.46519   1st Qu.:0.0669750
 Median :-0.01765   Median :0.1682045
 Mean   :-0.01765   Mean   :0.1723692
 3rd Qu.: 1.42989   3rd Qu.:0.2484626
 Max.   : 2.87743   Max.   :0.4157340

R> dkde(x = trimodal, deriv.order = 1, kernel = "gaussian")

Data: trimodal (200 obs.);	Kernel: gaussian
Derivative order: 1;	Bandwidth 'h' = 0.09094
  eval.points          est.fx
 Min.   :-2.87358   Min.   :-1.740447
 1st Qu.:-1.44562   1st Qu.:-0.343952
 Median :-0.01765   Median : 0.009057
 Mean   :-0.01765   Mean   : 0.000000
 3rd Qu.: 1.41031   3rd Qu.: 0.415343
 Max.   : 2.83828   Max.   : 1.256891
Bandwidth selectors:
The most important factor in the r'th derivative kernel density estimate is the choice of the bandwidth h for one-dimensional observations. Because of its role in controlling both the amount and the direction of smoothing, this choice is particularly important. The popular bandwidth selection methods available in this package are (see the references for details):
- Optimal Bandwidth (AMISE), with deriv.order >= 0. The function is h.amise. For display, its plot method calls plot.h.amise, and lines.h.amise adds to an existing plot.
- Maximum-likelihood cross-validation (MLCV), with deriv.order = 0. The function is h.mlcv. For display, its plot method calls plot.h.mlcv, and lines.h.mlcv adds to an existing plot.
- Unbiased cross-validation (UCV), with deriv.order >= 0. The function is h.ucv. For display, its plot method calls plot.h.ucv, and lines.h.ucv adds to an existing plot.
- Biased cross-validation (BCV), with deriv.order >= 0. The function is h.bcv. For display, its plot method calls plot.h.bcv, and lines.h.bcv adds to an existing plot.
- Complete cross-validation (CCV), with deriv.order >= 0. The function is h.ccv. For display, its plot method calls plot.h.ccv, and lines.h.ccv adds to an existing plot.
- Modified cross-validation (MCV), with deriv.order >= 0. The function is h.mcv. For display, its plot method calls plot.h.mcv, and lines.h.mcv adds to an existing plot.
- Trimmed cross-validation (TCV), with deriv.order >= 0. The function is h.tcv. For display, its plot method calls plot.h.tcv, and lines.h.tcv adds to an existing plot.
R> data(trimodal)
R> h.bcv(x = trimodal, whichbcv = 1, deriv.order = 0, kernel = "gaussian")

Call:	Biased Cross-Validation 1
Derivative order = 0
Data: trimodal (200 obs.);	Kernel: gaussian
Min BCV = 0.004511636;	Bandwidth 'h' = 0.4357812

R> h.ccv(x = trimodal, deriv.order = 1, kernel = "gaussian")

Call:	Complete Cross-Validation
Derivative order = 1
Data: trimodal (200 obs.);	Kernel: gaussian
Min CCV = 0.01985078;	Bandwidth 'h' = 0.5828336

R> h.tcv(x = trimodal, deriv.order = 2, kernel = "gaussian")

Call:	Trimmed Cross-Validation
Derivative order = 2
Data: trimodal (200 obs.);	Kernel: gaussian
Min TCV = -295.563;	Bandwidth 'h' = 0.08908582

R> h.ucv(x = trimodal, deriv.order = 3, kernel = "gaussian")

Call:	Unbiased Cross-Validation
Derivative order = 3
Data: trimodal (200 obs.);	Kernel: gaussian
Min UCV = -63165.18;	Bandwidth 'h' = 0.1067236
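A selected bandwidth is typically passed back to dkde. The sketch below assumes, as documented in help(h.ucv) and help(dkde), that the selector stores its bandwidth in the h component and that dkde accepts it through its h argument:
R> data(trimodal)
R> hucv <- h.ucv(x = trimodal, deriv.order = 1, kernel = "gaussian")            # select h by UCV
R> fit  <- dkde(x = trimodal, deriv.order = 1, h = hucv$h, kernel = "gaussian")
R> plot(fit)                                                                    # dispatches to plot.dkde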
For an overview of this package, see vignette("kedd").
Requirements
R version >= 2.15.0
Licence
This package and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.
References
Alekseev, V. G. (1972). Estimation of a probability density function and its derivatives. Mathematical notes of the Academy of Sciences of the USSR. 12(5), 808–811.
Alexandre, B. T. (2009). Introduction to Nonparametric Estimation. Springer-Verlag, New York.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of kernel density estimates. Biometrika, 71, 353–360.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: the Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford.
Bowman, A.W. and Azzalini, A. (2003). Computational aspects of nonparametric smoothing with illustrations from the sm library. Computational Statistics and Data Analysis, 42, 545–560.
Bowman, A.W. and Azzalini, A. (2013). sm: Smoothing methods for nonparametric regression and density estimation. R package version 2.2-5.3. Ported to R by B. D. Ripley.
Bhattacharya, P. K. (1967). Estimation of a probability density function and Its derivatives. Sankhya: The Indian Journal of Statistics, Series A, 29, 373–382.
Duin, R. P. W. (1976). On the choice of smoothing parameters of Parzen estimators of probability density functions. IEEE Transactions on Computers, C-25, 1175–1179.
Feluch, W. and Koronacki, J. (1992). A note on modified cross-validation in density estimation. Computational Statistics and Data Analysis, 13, 143–151.
George, R. T. (1990). The maximal smoothing principle in density estimation. Journal of the American Statistical Association, 85, 470–477.
George, R. T. and Scott, D. W. (1985). Oversmoothed nonparametric density estimates. Journal of the American Statistical Association, 80, 209–214.
Habbema, J. D. F., Hermans, J., and Van den Broek, K. (1974) A stepwise discrimination analysis program using density estimation. Compstat 1974: Proceedings in Computational Statistics. Physica Verlag, Vienna.
Heidenreich, N. B., Schindler, A. and Sperlich, S. (2013). Bandwidth selection for kernel density estimation: a review of fully automatic selectors. Advances in Statistical Analysis.
Jeffrey, S. S. (1996). Smoothing Methods in Statistics. Springer-Verlag, New York.
Jones, M. C. (1992). Differences and derivatives in kernel estimation. Metrika, 39, 335–340.
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401–407.
Jones, M. C. and Kappenman, R. F. (1991). On a class of kernel density estimate bandwidth selectors. Scandinavian Journal of Statistics, 19, 337–349.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Olver, F. W., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (2010). NIST Handbook of Mathematical Functions. Cambridge University Press, New York, USA.
Peter, H. and Marron, J.S. (1987). Estimation of integrated squared density derivatives. Statistics and Probability Letters, 6, 109–115.
Peter, H. and Marron, J.S. (1991). Local minima in cross-validation functions. Journal of the Royal Statistical Society, Series B, 53, 245–252.
Radhey, S. S. (1987). MISE of kernel estimates of a density and its derivatives. Statistics and Probability Letters, 5, 153–159.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78.
Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Scott, D.W. and George, R. T. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Schuster, E. F. (1969) Estimation of a probability density function and its derivatives. The Annals of Mathematical Statistics, 40 (4), 1187–1195.
Sheather, S. J. (2004). Density estimation. Statistical Science, 19, 588–597.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683–690.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. London.
Singh, R. S. (1977). Applications of estimators of a density and its derivatives to certain statistical problems. Journal of the Royal Statistical Society, Series B, 39(3), 357–363.
Stoker, T. M. (1993). Smoothing bias in density derivative estimation. Journal of the American Statistical Association, 88, 855–863.
Stute, W. (1992). Modified cross validation in density estimation. Journal of Statistical Planning and Inference, 30, 293–305.
Tarn, D. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.
Tristen, H. and Jeffrey, S. R. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5).
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Wand, M.P. and Ripley, B. D. (2013). KernSmooth: Functions for Kernel Smoothing for Wand and Jones (1995). R package version 2.23-10.
Wolfgang, H. (1991). Smoothing Techniques, With Implementation in S. Springer-Verlag, New York.
Wolfgang, H., Marlene, M., Stefan, S. and Axel, W. (2004). Nonparametric and Semiparametric Models. Springer-Verlag, Berlin Heidelberg.
Wolfgang, H., Marron, J. S. and Wand, M. P. (1990). Bandwidth choice for density derivatives. Journal of the Royal Statistical Society, Series B, 223–232.
See Also
ks, KernSmooth, sm, np, locfit, feature, GenKern.