kedd-package {kedd}    R Documentation

Kernel Estimator and Bandwidth Selection for Density and Its Derivatives


Smoothing techniques and bandwidth selectors for the r'th derivative of a probability density, for one-dimensional data.


Package: kedd
Type: Package
Version: 1.0.4
Date: 2024-01-27
License: GPL (>= 2)

There are four main types of functions in this package:

  1. Computing the derivatives and convolutions of a kernel function (1-d).

  2. Computing the kernel estimators for a density and its derivatives (1-d).

  3. Computing the bandwidth selectors (1-d).

  4. Displaying the kernel estimators.

Main Features

Convolutions and derivatives in kernel function:

In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. The kernel functions $K(x)$ used in the derivative of the kernel density estimator to estimate $\hat{f}^{(r)}_{h}(x)$ must satisfy the following three requirements:

  1. $\int_{\mathbb{R}} K(x)\,dx = 1$

  2. $\int_{\mathbb{R}} x K(x)\,dx = 0$

  3. $\mu_{2}(K) = \int_{\mathbb{R}} x^{2} K(x)\,dx < \infty$

The following kernel functions $K(x)$ are available in this package: Gaussian, Epanechnikov, Uniform (rectangular), Triangular, Triweight, Tricube, Biweight (quartic), and Cosine.
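As an illustration of the three requirements above, the base-R sketch below writes down one common parameterization of each listed kernel (supported on [-1, 1], except the Gaussian) and checks the conditions numerically with integrate(). The list kernels and the helpers in_support() and check_kernel() are hypothetical, and the package's own definitions (for instance of the Cosine kernel) may be scaled differently.

```r
# Indicator of the support [-1, 1] used by the compactly supported kernels
in_support <- function(u) as.numeric(abs(u) <= 1)

# One common parameterization of each kernel (assumed; the package may scale some differently)
kernels <- list(
  gaussian     = function(u) dnorm(u),
  epanechnikov = function(u) 3 / 4 * (1 - u^2) * in_support(u),
  uniform      = function(u) 1 / 2 * in_support(u),
  triangular   = function(u) (1 - abs(u)) * in_support(u),
  triweight    = function(u) 35 / 32 * (1 - u^2)^3 * in_support(u),
  tricube      = function(u) 70 / 81 * (1 - abs(u)^3)^3 * in_support(u),
  biweight     = function(u) 15 / 16 * (1 - u^2)^2 * in_support(u),
  cosine       = function(u) pi / 4 * cos(pi * u / 2) * in_support(u)
)

# Numerically evaluate requirements 1-3 for a kernel K on [-lim, lim]
check_kernel <- function(K, lim) {
  c(integral = integrate(K, -lim, lim)$value,                      # requirement 1: should be 1
    mean     = integrate(function(u) u * K(u), -lim, lim)$value,   # requirement 2: should be 0
    mu2      = integrate(function(u) u^2 * K(u), -lim, lim)$value) # requirement 3: must be finite
}

# One row per kernel, one column per requirement
checks <- t(sapply(names(kernels), function(nm) {
  lim <- if (nm == "gaussian") Inf else 1  # only the Gaussian has unbounded support
  check_kernel(kernels[[nm]], lim)
}))
print(round(checks, 6))
```

Each row should show an integral of 1, a mean of 0 and a finite mu_2(K); for example, mu_2(K) is 1 for the Gaussian and 1/5 for the Epanechnikov kernel.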

The package provides a function for the kernel derivative $K^{(r)}(x)$, and kernel.conv for the kernel convolution $K^{(r)} \ast K^{(r)}(x)$, written formally as:

$$K^{(r)}(x) = \frac{d^{r}}{d x^{r}} K(x)$$

$$K^{(r)} \ast K^{(r)} (x) = \int_{-\infty}^{+\infty} K^{(r)}(y) K^{(r)}(x-y)\,dy$$

for $r = 0, 1, 2, \dots$
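For the Gaussian kernel both quantities are easy to compute: $K^{(r)}(x) = (-1)^{r} He_{r}(x)\phi(x)$, where $He_{r}$ is the probabilists' Hermite polynomial, and the convolution can be evaluated by numerical integration. The sketch below is plain base R, not the package's kernel.conv; the helper names hermite_he(), gauss_deriv() and gauss_conv() are hypothetical, and only the Gaussian case is covered.

```r
# Probabilists' Hermite polynomial He_r(x), built with He_{n+1} = x He_n - n He_{n-1}
hermite_he <- function(x, r) {
  h_prev <- rep(1, length(x))  # He_0(x) = 1
  if (r == 0) return(h_prev)
  h_curr <- x                  # He_1(x) = x
  if (r >= 2) {
    for (n in 1:(r - 1)) {
      h_next <- x * h_curr - n * h_prev
      h_prev <- h_curr
      h_curr <- h_next
    }
  }
  h_curr
}

# r'th derivative of the Gaussian kernel: K^(r)(x) = (-1)^r He_r(x) phi(x)
gauss_deriv <- function(x, r = 0) (-1)^r * hermite_he(x, r) * dnorm(x)

# Convolution (K^(r) * K^(r))(x) = int K^(r)(y) K^(r)(x - y) dy, by numerical integration
gauss_conv <- function(x, r = 0) {
  sapply(x, function(xi) {
    integrand <- function(y) gauss_deriv(y, r) * gauss_deriv(xi - y, r)
    integrate(integrand, -Inf, Inf, rel.tol = 1e-8)$value
  })
}

gauss_deriv(c(-1, 0, 1), r = 2)  # second derivative of the kernel at three points
gauss_conv(c(0, 1), r = 1)       # convolution of the first derivatives at x = 0 and x = 1
```

As a sanity check, gauss_conv(x, 0) should match dnorm(x, sd = sqrt(2)), since the convolution of two standard normal densities is the N(0, 2) density.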

Estimators of the r'th derivative of a density function:

A natural estimator of the r'th derivative of a density function $f(x)$ is:

$$\hat{f}^{(r)}_{h}(x) = \frac{d^{r}}{d x^{r}} \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x-X_{i}}{h}\right) = \frac{1}{nh^{r+1}} \sum_{i=1}^{n} K^{(r)}\left(\frac{x-X_{i}}{h}\right)$$

Here, $X_{1}, X_{2}, \dots, X_{n}$ is an i.i.d. sample of size $n$ from the distribution with density $f(x)$; $K(x)$ is the kernel function, which we take to be a symmetric probability density with at least $r$ non-zero derivatives when estimating $f^{(r)}(x)$; and $h$ is the bandwidth, a crucial parameter that controls the degree of smoothing applied to the data.
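The second equality in the estimator follows from the chain rule: each differentiation of $K((x-X_{i})/h)$ with respect to $x$ contributes a factor $1/h$, giving $h^{-r}$ on top of the $1/(nh)$ normalization. A minimal base-R sketch of the estimator for the Gaussian kernel and $r \le 2$ follows; dkde_sketch() and gauss_kr() are hypothetical names, and unlike dkde this sketch requires the bandwidth h to be supplied.

```r
# Gaussian kernel derivatives K^(r)(u) for r = 0, 1, 2 (closed forms)
gauss_kr <- function(u, r) {
  switch(as.character(r),
         "0" = dnorm(u),
         "1" = -u * dnorm(u),
         "2" = (u^2 - 1) * dnorm(u),
         stop("this sketch only handles r = 0, 1, 2"))
}

# fhat^(r)_h(x) = 1 / (n h^(r+1)) * sum_i K^(r)((x - X_i) / h), evaluated at each x
dkde_sketch <- function(x, data, h, r = 0) {
  n <- length(data)
  sapply(x, function(x0) sum(gauss_kr((x0 - data) / h, r)) / (n * h^(r + 1)))
}

# Usage: first-derivative estimate on a small grid for simulated data
set.seed(42)
x_obs <- rnorm(100)
grid <- seq(-3, 3, length.out = 7)
dkde_sketch(grid, x_obs, h = 0.4, r = 1)
```

Increasing h flattens both the density estimate and its derivatives, which is why the choice of bandwidth matters even more for r > 0.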

The case $r = 0$ is the standard kernel density estimator (e.g. Silverman 1986, Wolfgang 1991, Scott 1992, Wand and Jones 1995, Jeffrey 1996, Bowman and Azzalini 1997, Alexandre 2009), and the properties of such estimators are well known (e.g. Sheather and Jones 1991, Jones and Kappenman 1991, Wolfgang 1991). The case $r > 0$ is the derivative of the kernel density estimator (e.g. Bhattacharya 1967, Schuster 1969, Alekseev 1972, Wolfgang et al. 1990, Jones 1992, Stoker 1993); applications that require the estimation of density derivatives can be found in Singh (1977).

For the r'th derivative of a one-dimensional kernel density estimator, the main function is dkde. For display, its plot method calls plot.dkde; to add to an existing plot, use lines.dkde.

  R> data(trimodal)
  R> dkde(x = trimodal, deriv.order = 0, kernel = "gaussian")
    Data: trimodal (200 obs.);      Kernel: gaussian
    Derivative order: 0;    Bandwidth 'h' = 0.1007
          eval.points           est.fx         
    Min.   :-2.91274   Min.   :0.0000066  
    1st Qu.:-1.46519   1st Qu.:0.0669750  
    Median :-0.01765   Median :0.1682045  
    Mean   :-0.01765   Mean   :0.1723692  
    3rd Qu.: 1.42989   3rd Qu.:0.2484626  
    Max.   : 2.87743   Max.   :0.4157340 
  R> dkde(x = trimodal, deriv.order = 1, kernel = "gaussian")
    Data: trimodal (200 obs.);      Kernel: gaussian
    Derivative order: 1;    Bandwidth 'h' = 0.09094
          eval.points           est.fx         
    Min.   :-2.87358   Min.   :-1.740447  
    1st Qu.:-1.44562   1st Qu.:-0.343952  
    Median :-0.01765   Median : 0.009057  
    Mean   :-0.01765   Mean   : 0.000000  
    3rd Qu.: 1.41031   3rd Qu.: 0.415343  
    Max.   : 2.83828   Max.   : 1.256891  

Bandwidth selectors:

The most important factor in the r'th derivative kernel density estimate is the choice of the bandwidth $h$ for one-dimensional observations. Because the bandwidth controls both the amount and the direction of smoothing, this choice is particularly important. This package provides the following popular bandwidth selection methods (see the references for more details):

  R> data(trimodal)
  R> h.bcv(x = trimodal, whichbcv = 1, deriv.order = 0, kernel = "gaussian")
    Call:           Biased Cross-Validation 1
    Derivative order = 0
    Data: trimodal (200 obs.);      Kernel: gaussian
    Min BCV = 0.004511636;  Bandwidth 'h' = 0.4357812 
  R> h.ccv(x = trimodal, deriv.order = 1, kernel = "gaussian")
    Call:           Complete Cross-Validation
    Derivative order = 1 
    Data: trimodal (200 obs.);      Kernel: gaussian
    Min CCV = 0.01985078;   Bandwidth 'h' = 0.5828336
  R> h.tcv(x = trimodal, deriv.order = 2, kernel = "gaussian")
    Call:           Trimmed Cross-Validation
    Derivative order = 2
    Data: trimodal (200 obs.);      Kernel: gaussian
    Min TCV = -295.563;     Bandwidth 'h' = 0.08908582
  R> h.ucv(x = trimodal, deriv.order = 3, kernel = "gaussian")
    Call:           Unbiased Cross-Validation
    Derivative order = 3
    Data: trimodal (200 obs.);      Kernel: gaussian
    Min UCV = -63165.18;    Bandwidth 'h' = 0.1067236  
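To make one of these criteria concrete, the sketch below implements unbiased (least-squares) cross-validation for the density itself ($r = 0$) with a Gaussian kernel, minimizing $\mathrm{UCV}(h) = \int \hat{f}_{h}^{2}(x)\,dx - \frac{2}{n}\sum_{i=1}^{n} \hat{f}_{h,-i}(X_{i})$, where $\hat{f}_{h,-i}$ is the leave-one-out estimate. The first term has a closed form because the Gaussian kernel convolved with itself is the N(0, 2) density. This is a hypothetical illustration (ucv_gauss() is not a package function) and not necessarily how h.ucv computes the criterion; h.ucv also handles derivative orders $r > 0$.

```r
# UCV criterion for a Gaussian-kernel density estimate (r = 0 only)
ucv_gauss <- function(h, x) {
  n <- length(x)
  d <- outer(x, x, "-") / h  # all pairwise scaled differences (X_i - X_j) / h
  # integral of fhat^2: (1 / (n^2 h)) * sum over all i, j of phi_{sd = sqrt(2)}(d_ij)
  int_f2 <- sum(dnorm(d, sd = sqrt(2))) / (n^2 * h)
  # leave-one-out term: (2 / (n (n - 1) h)) * sum over i != j of phi(d_ij)
  off <- dnorm(d)
  diag(off) <- 0
  loo <- 2 * sum(off) / (n * (n - 1) * h)
  int_f2 - loo
}

# Usage: minimize UCV over a bandwidth interval for simulated data
set.seed(1)
x <- rnorm(200)
opt <- optimize(ucv_gauss, interval = c(0.05, 2), x = x)
opt$minimum    # selected bandwidth h
opt$objective  # minimal UCV value
```

This direct version evaluates all n^2 pairs for every candidate h, which is fine for samples of a few hundred observations such as trimodal.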

For an overview of this package, see vignette("kedd").


Requirements: R version >= 2.15.0


This package and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.


