h.ucv {kedd} | R Documentation |
Unbiased (Least-Squares) Cross-Validation for Bandwidth Selection
Description
The (S3) generic function h.ucv computes the unbiased (least-squares)
cross-validation bandwidth selector for the r'th derivative of a
one-dimensional kernel density estimator.
Usage
h.ucv(x, ...)
## Default S3 method:
h.ucv(x, deriv.order = 0, lower = 0.1 * hos, upper = 2 * hos,
tol = 0.1 * lower, kernel = c("gaussian", "epanechnikov", "uniform",
"triangular", "triweight", "tricube", "biweight", "cosine"), ...)
Arguments
x |
vector of data values. |
deriv.order |
derivative order (scalar). |
lower, upper |
range over which to minimize. The default is
almost always satisfactory. |
tol |
the convergence tolerance for |
kernel |
a character string giving the smoothing kernel to be used, with default
"gaussian". |
... |
further arguments for (non-default) methods. |
Details
h.ucv implements unbiased (least-squares) cross-validation for choosing the bandwidth h
of the r'th derivative of a kernel density estimator.
Rudemo (1982) and Bowman (1984) proposed unbiased (least-squares) cross-validation
(UCV) for kernel density estimation. Wolfgang et al. (1990) adapted it to
bandwidth choice for the r'th derivative of a kernel density estimator.
The essential idea of this method, for estimating f^{(r)}(x)
(where r is the derivative order), is to use the bandwidth h
that minimizes the function:
UCV(h;r) = \int \left(\hat{f}_{h}^{(r)}(x)\right)^{2} - 2n^{-1}(-1)^{r}\sum_{i=1}^{n} \hat{f}_{h,i}^{(2r)}(X_{i})
The bandwidth minimizing this function is:
\hat{h}^{(r)}_{ucv} = \arg \min_{h^{(r)}} UCV(h;r)
for r = 0, 1, 2, \dots
where
\int \left(\hat{f}_{h}^{(r)}(x)\right)^{2} = \frac{R\left(K^{(r)}\right)}{nh^{2r+1}} + \frac{(-1)^{r}}{n (n-1) h^{2r+1}} \sum_{i=1}^{n}\sum_{j=1;j \neq i}^{n} K^{(r)} \ast K^{(r)} \left(\frac{X_{j}-X_{i}}{h}\right)
and K^{(r)} \ast K^{(r)} (x)
is the convolution of the r'th derivative kernel function K^{(r)}(x)
with itself (see kernel.conv and kernel.fun).
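As a small illustration of the convolution term, the sketch below checks numerically, for r = 0 and the Gaussian kernel only, that K \ast K is the N(0, 2) density. It uses base R only; the helper name conv_num is hypothetical and this is not how kernel.conv is implemented.

```r
# Numerical convolution (K * K)(u) = integral of K(t) K(u - t) dt for the
# standard normal kernel. conv_num is an illustrative helper, not part of kedd.
conv_num <- function(u) {
  integrate(function(t) dnorm(t) * dnorm(u - t), -Inf, Inf)$value
}

u      <- c(0, 0.5, 1.5)
num    <- sapply(u, conv_num)        # numerical convolution at each u
closed <- dnorm(u, sd = sqrt(2))     # closed form: N(0, 2) density

cbind(u = u, numeric = num, closed_form = closed)
```

At u = 0 the convolution equals R(K), the integral of the squared kernel, which is the quantity appearing in the first term of the formula above.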
The leave-one-out estimator \hat{f}_{h,i}^{(2r)}(x),
computed on the subset \{X_{j}\}_{j \neq i},
can be written:
\hat{f}_{h,i}^{(2r)}(X_{i}) = \frac{1}{(n-1) h^{2r+1}} \sum_{j \neq i} K^{(2r)} \left(\frac{X_{j}-X_{i}}{h}\right)
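One way to see where the leave-one-out term comes from is to expand the integrated squared error (ISE). The following is a standard sketch, assuming that f and the kernel are smooth enough for the boundary terms to vanish when integrating by parts r times:

```latex
ISE(h) = \int \left(\hat{f}_{h}^{(r)}(x) - f^{(r)}(x)\right)^{2} dx
       = \int \left(\hat{f}_{h}^{(r)}(x)\right)^{2} dx
         - 2 \int \hat{f}_{h}^{(r)}(x) f^{(r)}(x) dx
         + \int \left(f^{(r)}(x)\right)^{2} dx

\int \hat{f}_{h}^{(r)}(x) f^{(r)}(x) dx
       = (-1)^{r} \int \hat{f}_{h}^{(2r)}(x) f(x) dx
       \approx (-1)^{r} n^{-1} \sum_{i=1}^{n} \hat{f}_{h,i}^{(2r)}(X_{i})
```

The last term of ISE(h) does not depend on h, so dropping it and substituting the leave-one-out average for the cross term gives the criterion UCV(h;r).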
The function UCV(h;r)
is an unbiased cross-validation criterion in the sense that E[UCV] = MISE[\hat{f}_{h}^{(r)}(x)] - R(f^{(r)}(x))
(see Scott and George, 1987). It can be simplified to the computationally convenient form:
UCV(h;r) = \frac{R\left(K^{(r)}\right)}{nh^{2r+1}} + \frac{(-1)^{r}}{n (n-1) h^{2r+1}} \sum_{i=1}^{n}\sum_{j=1 ;j \neq i}^{n} \left(K^{(r)} \ast K^{(r)} -2K^{(2r)}\right) \left(\frac{X_{j}-X_{i}}{h}\right)
where R\left(K^{(r)}\right) = \int_{R} K^{(r)}(x)^{2} dx.
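The computational form can be sketched in base R for the simplest case, r = 0 with the Gaussian kernel, where K \ast K is the N(0, 2) density and R(K) = 1 / (2 \sqrt{\pi}). This is a minimal illustration under those assumptions and not the kedd source code: the function name ucv_gauss0 and the simulated sample are hypothetical, and hos is a stand-in computed with the normal-scale constant 1.144 (the constant stats::bw.ucv uses for its upper limit), whereas kedd computes hos internally.

```r
# UCV(h; r = 0) for the Gaussian kernel, following the double-sum formula.
# ucv_gauss0 is an illustrative name, not a kedd function.
ucv_gauss0 <- function(h, x) {
  n   <- length(x)
  u   <- outer(x, x, "-") / h          # scaled pairwise differences
  off <- u[row(u) != col(u)]           # keep only the terms with j != i
  RK  <- 1 / (2 * sqrt(pi))            # R(K) for the standard normal kernel
  # (K * K - 2 K) evaluated at every off-diagonal difference
  pair_sum <- sum(dnorm(off, sd = sqrt(2)) - 2 * dnorm(off))
  RK / (n * h) + pair_sum / (n * (n - 1) * h)
}

set.seed(1)
x <- rnorm(200)                        # hypothetical sample

# Assumed stand-in for the oversmoothing bandwidth (r = 0, Gaussian kernel)
hos <- 1.144 * sd(x) * length(x)^(-1/5)

# Minimize over the default range [0.1 * hos, 2 * hos]
fit <- optimize(ucv_gauss0, lower = 0.1 * hos, upper = 2 * hos, x = x)
fit$minimum    # bandwidth minimizing the criterion
fit$objective  # corresponding UCV value
```

The double sum costs O(n^2) operations per evaluation, which is fine for a sketch but may be slow for large samples. Because the criterion can have several local minima, the result of a one-dimensional search depends on the chosen range.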
The range over which to minimize is defined relative to hos, the
oversmoothing bandwidth (by default from 0.1 * hos to 2 * hos); the default is almost always
satisfactory. See George and Scott (1985), George (1990), Scott (1992, p. 165), Wand and Jones (1995, p. 61).
Value
x |
data points - same as input. |
data.name |
the deparsed name of the |
n |
the sample size after elimination of missing values. |
kernel |
name of the kernel used. |
deriv.order |
the derivative order to use. |
h |
value of bandwidth parameter. |
min.ucv |
the minimal UCV value. |
Author(s)
Arsalane Chouaib Guidoum acguidoum@usthb.dz
References
Bowman, A. (1984). An alternative method of cross-validation for the smoothing of kernel density estimates. Biometrika, 71, 353–360.
Jones, M. C. and Kappenman, R. F. (1991). On a class of kernel density estimate bandwidth selectors. Scandinavian Journal of Statistics, 19, 337–349.
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401–407.
Peter, H. and Marron, J. S. (1987). Estimation of integrated squared density derivatives. Statistics and Probability Letters, 6, 109–115.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78.
Scott, D. W. and George, R. T. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Sheather, S. J. (2004). Density estimation. Statistical Science, 19, 588–597.
Tarn, D. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7), 1–16.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Wolfgang, H. (1991). Smoothing Techniques, With Implementation in S. Springer-Verlag, New York.
Wolfgang, H., Marron, J. S. and Wand, M. P. (1990). Bandwidth choice for density derivatives. Journal of the Royal Statistical Society, Series B, 223–232.
See Also
plot.h.ucv; see bw.ucv in package "stats" and ucv in package MASS for Gaussian kernel only if deriv.order = 0;
hlscv in package ks for Gaussian kernel only if 0 <= deriv.order <= 5;
kdeb in package locfit if deriv.order = 0.
Examples
## Derivative order = 0
h.ucv(kurtotic, deriv.order = 0)
## Derivative order = 1
h.ucv(kurtotic, deriv.order = 1)