cv.select {SCBmeanfd}    R Documentation

Cross-Validation Bandwidth Selection for Local Polynomial Estimation

Description

Select the cross-validation bandwidth described in Rice and Silverman (1991) for the local polynomial estimation of a mean function based on functional data.

Usage

 cv.select(x, y, degree = 1, interval = NULL, gridsize = length(x), ...) 

Arguments

x

observation points. Missing values are not accepted.

y

matrix or data frame with functional observations (= curves) stored in rows. The number of columns of y must match the length of x. Missing values are not accepted.

degree

degree of the local polynomial fit.

interval

lower and upper bounds of the search interval (numeric vector of length 2).

gridsize

size of evaluation grid for the smoothed data.

...

additional arguments to pass to the optimization function optimize.

Details

The cross-validation score is obtained by leaving out each curve in turn and computing the prediction error of the local polynomial smoother based on all the other curves. For a bandwidth value h, this score is

CV(h) = \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_{ij} - \hat{\mu}^{-(i)}(x_j;h) \right)^2,

where Y_{ij} is the measurement of the i-th curve at location x_j for i=1,\ldots,n and j=1,\ldots,p, and \hat{\mu}^{-(i)}(x_j;h) is the local polynomial estimator with bandwidth h based on all curves except the i-th.
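As a rough illustration of the formula above, the following base R sketch computes a leave-one-curve-out score for simulated curves. It is not the package's implementation: the helper names loclin and cv_sketch, the Gaussian kernel, and the simulated data are assumptions made for this example, and the fit is restricted to the local linear case (degree = 1). It relies on the fact that, for curves observed at common locations, the weighted least squares fit to the pooled curves equals the fit to their pointwise average.

```r
## Local linear smoother with a Gaussian kernel (hypothetical helper).
## Returns the fitted value at each point of x0.
loclin <- function(x, ybar, h, x0 = x) {
  sapply(x0, function(u) {
    d  <- x - u
    w  <- dnorm(d / h)                      # kernel weights centred at u
    s0 <- sum(w); s1 <- sum(w * d); s2 <- sum(w * d^2)
    ## closed-form intercept of the weighted least squares line at u
    (s2 * sum(w * ybar) - s1 * sum(w * d * ybar)) / (s0 * s2 - s1^2)
  })
}

## Leave-one-curve-out score CV(h) (hypothetical helper, curves in rows of y)
cv_sketch <- function(x, y, h) {
  y <- as.matrix(y)
  n <- nrow(y)
  err <- sapply(seq_len(n), function(i) {
    ## estimator based on all curves except the i-th
    mu_minus_i <- loclin(x, colMeans(y[-i, , drop = FALSE]), h)
    mean((y[i, ] - mu_minus_i)^2)           # average over the p locations
  })
  mean(err)                                 # average over the n curves
}

## Simulated curves (assumed data, for illustration only)
set.seed(1)
x <- seq(0, 1, length.out = 20)
y <- t(replicate(10, sin(2 * pi * x) + rnorm(20, sd = 0.3)))
sapply(c(0.05, 0.1, 0.2), function(h) cv_sketch(x, y, h))
```

Averaging the squared errors first over locations and then over curves gives the same value as the double sum divided by np, because every curve is observed at the same p locations.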

If the x values are not equally spaced, the data are first smoothed and evaluated on a grid of length gridsize spanning the range of x. The smoothed data are then interpolated back to x.
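The two steps described above (smoothing on an equally spaced grid, then interpolating back to x) might look roughly as follows in base R. This is a sketch under assumptions, not the package's code: loclin and smooth_via_grid are hypothetical helpers, the smoother is a local linear fit with a Gaussian kernel, and linear interpolation via approx is assumed for the second step.

```r
## Local linear smoother with a Gaussian kernel (hypothetical helper)
loclin <- function(x, ybar, h, x0 = x) {
  sapply(x0, function(u) {
    d  <- x - u
    w  <- dnorm(d / h)
    s0 <- sum(w); s1 <- sum(w * d); s2 <- sum(w * d^2)
    (s2 * sum(w * ybar) - s1 * sum(w * d * ybar)) / (s0 * s2 - s1^2)
  })
}

## Smooth on an equally spaced grid spanning range(x), then interpolate
## the smoothed values back to the original observation points x
smooth_via_grid <- function(x, ybar, h, gridsize = length(x)) {
  grid    <- seq(min(x), max(x), length.out = gridsize)
  mu_grid <- loclin(x, ybar, h, x0 = grid)
  ## rule = 2 guards against rounding at the two endpoints of the grid
  approx(grid, mu_grid, xout = x, rule = 2)$y
}

## Unequally spaced observation points (assumed data, for illustration only)
x    <- c(0, 0.05, 0.1, 0.2, 0.35, 0.5, 0.6, 0.8, 0.9, 1)
ybar <- sin(2 * pi * x)
smooth_via_grid(x, ybar, h = 0.1, gridsize = 50)
```

A larger gridsize reduces the interpolation error at the cost of more evaluations of the smoother.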

cv.select uses the standard R function optimize to minimize cv.score. If the argument interval is not specified, the lower bound of the search interval defaults to (x_2-x_1)/2 if degree < 2 and to x_2-x_1 if degree >= 2. The default upper bound is (\max(x)-\min(x))/2. In most cases these values guarantee that the local polynomial estimator is well defined. Because optimize may return a local rather than a global minimum, it is often useful to plot the function to be optimized over a range of argument values (grid search) before applying a numerical optimizer. The search interval can then be narrowed down, making the optimizer more likely to find a global solution.
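The default interval and the grid-search strategy can be sketched in base R as follows. The helper default_interval is hypothetical and assumes x is sorted in increasing order; the objective f is a toy function with two local minima that stands in for the cross-validation score, so its values carry no statistical meaning.

```r
## Default search interval as described above (hypothetical helper;
## assumes x is sorted in increasing order)
default_interval <- function(x, degree = 1) {
  lower <- if (degree < 2) (x[2] - x[1]) / 2 else x[2] - x[1]
  upper <- (max(x) - min(x)) / 2
  c(lower, upper)
}

x   <- 8:21
int <- default_interval(x, degree = 1)

## Toy objective with two local minima (stand-in for the CV score)
f <- function(h) (h - 1)^2 * (h - 4)^2 + 0.3 * h

## Grid search over the default interval
hs   <- seq(int[1], int[2], length.out = 61)
vals <- sapply(hs, f)
# plot(hs, vals, type = "l")               # visual check of the objective
h0   <- hs[which.min(vals)]                # best grid point
step <- hs[2] - hs[1]

## Refine with optimize on a narrow interval around the best grid point
opt <- optimize(f, interval = c(max(int[1], h0 - step), min(int[2], h0 + step)))
opt$minimum
```

With the package loaded, the toy function would be replaced by the cross-validation score evaluated at each candidate bandwidth.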

Value

a bandwidth that minimizes the cross-validation score.

References

Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B (Methodological), 53, 233–243.

See Also

cv.score, plugin.select

Examples

## Not run: 
## Plasma citrate data
## Compare cross-validation scores and bandwidths  
## for local linear and local quadratic smoothing

data(plasma)
time <- 8:21
## Local linear smoothing
cv.select(time, plasma, 1)  # local solution h = 3.76, CV(h) = 463.08
cv.select(time, plasma, 1, interval = c(.5, 1))  # global solution h = .75, CV(h) = 439.54

## Local quadratic smoothing
cv.select(time, plasma, 2)  # global solution h = 1.15, CV(h) = 432.75
cv.select(time, plasma, 2, interval = c(1, 1.5))  # same

## End(Not run)

[Package SCBmeanfd version 1.2.2 Index]