
np.gcv {PLRModels}

R Documentation

Generalized cross-validation bandwidth selection in nonparametric regression models

Description

From a sample {(Y_i, t_i): i=1,...,n}, this routine computes an optimal bandwidth for estimating m in the regression model

Y_i = m(t_i) + \epsilon_i.

The regression function m is smooth but unknown. The bandwidth is selected by generalized cross-validation (GCV), and m is estimated by kernel smoothing.

Usage

np.gcv(data = data, h.seq = NULL, num.h = 50, estimator = "NW",
       kernel = "quadratic")

Arguments

`data`	`data[, 1]` contains the values of the response variable, `Y`; `data[, 2]` contains the values of the explanatory variable, `t`.
`h.seq`	sequence of bandwidths at which the GCV function is evaluated. If `NULL` (the default), `num.h` equidistant values between zero and a quarter of the range of `t_i` are used.
`num.h`	number of values used to build the sequence of bandwidths. If `h.seq` is not `NULL`, `num.h` is set to `length(h.seq)`. Otherwise, the default is 50.
`estimator`	estimator to use: `"NW"` (Nadaraya-Watson) or `"LLP"` (Local Linear Polynomial). The default is `"NW"`.
`kernel`	kernel to use: `"gaussian"`, `"quadratic"` (Epanechnikov kernel), `"triweight"` or `"uniform"`. The default is `"quadratic"`.
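
As a brief illustration of how these arguments combine, the sketch below passes a user-supplied bandwidth grid together with a non-default estimator and kernel. It assumes the PLRModels package is installed (the call is skipped otherwise); the object names `dat`, `h.grid` and `fit`, and the grid limits, are hypothetical choices made for this example.

```r
# Runs only if PLRModels is available; otherwise the block is a no-op.
if (requireNamespace("PLRModels", quietly = TRUE)) {
  set.seed(1234)
  n <- 100
  t <- ((1:n) - 0.5) / n
  y <- 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)

  # Column 1: response Y; column 2: explanatory variable t.
  dat <- matrix(c(y, t), nrow = n)

  # Hypothetical bandwidth grid supplied through h.seq instead of the default.
  h.grid <- seq(0.05, 0.25, length.out = 20)

  fit <- PLRModels::np.gcv(data = dat, h.seq = h.grid,
                           estimator = "LLP", kernel = "gaussian")
  fit$h.opt
}
```

Because `h.seq` is given here, `num.h` does not need to be set.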

Details

See Craven and Wahba (1979) and Rice (1984).
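
For orientation, the GCV criterion of Craven and Wahba (1979) for a linear smoother with smoothing matrix S_h can be written as GCV(h) = (RSS(h)/n) / (1 - tr(S_h)/n)^2, where RSS(h) is the residual sum of squares of the fit. The following base-R sketch evaluates this criterion for a Nadaraya-Watson smoother with the Epanechnikov kernel. It is only a minimal illustration under these assumptions, not the package's source code; the helper names `epan`, `nw_smoother` and `gcv_score`, and the bandwidth grid, are hypothetical.

```r
# Epanechnikov ("quadratic") kernel, supported on [-1, 1].
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Nadaraya-Watson smoothing matrix S_h: fitted values are S_h %*% y.
nw_smoother <- function(t, h, kernel = epan) {
  W <- kernel(outer(t, t, "-") / h)  # n x n matrix of kernel weights
  W / rowSums(W)                     # normalise each row to sum to 1
}

# GCV(h) = (RSS(h) / n) / (1 - tr(S_h) / n)^2
gcv_score <- function(y, t, h) {
  S <- nw_smoother(t, h)
  n <- length(y)
  rss <- sum((y - S %*% y)^2)
  (rss / n) / (1 - sum(diag(S)) / n)^2
}

# Simulated data in the style of Example 2a.
set.seed(1)
n <- 100
t <- ((1:n) - 0.5) / n
y <- 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)

# Hypothetical grid; the lower limit is kept above zero so that every
# observation has neighbours inside the kernel window.
h_grid <- seq(0.02, 0.25, length.out = 50)
gcv <- sapply(h_grid, function(h) gcv_score(y, t, h))
h_best <- h_grid[which.min(gcv)]
```

The denominator penalises small bandwidths: as h shrinks, tr(S_h)/n approaches 1 and the criterion is inflated, which offsets the decreasing residual sum of squares.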

Value

`h.opt`	selected value for the bandwidth.
`GCV.opt`	minimum value of the GCV function.
`GCV`	vector containing the values of the GCV function for each considered bandwidth.
`h.seq`	sequence of bandwidths at which the GCV function was evaluated.
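
The components above are presumably related in the usual way: `GCV.opt` is the smallest entry of `GCV`, and `h.opt` is the entry of `h.seq` at which it is attained. The sketch below shows that relationship on a hypothetical list, `res`, that only mimics the documented structure; the numbers are made up for illustration and are not output of `np.gcv`.

```r
# Hypothetical object mimicking the documented return structure;
# the values are invented purely for illustration.
res <- list(
  h.seq = c(0.05, 0.10, 0.15, 0.20),
  GCV   = c(1.30e-4, 1.05e-4, 1.10e-4, 1.40e-4)
)

# Presumed relationship between the returned components.
res$GCV.opt <- min(res$GCV)
res$h.opt   <- res$h.seq[which.min(res$GCV)]

res$h.opt
res$GCV.opt
```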

Author(s)

German Aneiros Perez ganeiros@udc.es

Ana Lopez Cheda ana.lopez.cheda@udc.es

References

Craven, P. and Wahba, G. (1979) Smoothing noisy data with spline functions. Numer. Math. 31, 377-403.

Rice, J. (1984) Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215-1230.

Examples

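# EXAMPLE 1: REAL DATA
data <- matrix(10,120,2)         # placeholder; both columns are overwritten below
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[,1] <- barnacles1[,1]       # response: first column of barnacles1
data <- diff(data, 12)           # lag-12 differences (drops the first 12 rows)
data[,2] <- 1:nrow(data)         # explanatory variable: time index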

aux <- np.gcv(data)
aux$h.opt
plot(aux$h.seq, aux$GCV, xlab="h", ylab="GCV", type="l")



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

epsilon <- rnorm(n, 0, 0.01)
y <- f + epsilon
# Column 1: response; column 2: explanatory variable
data_ind <- matrix(c(y,t),nrow=n)

# We apply the function
a <- np.gcv(data_ind)
a$GCV.opt

GCV <- a$GCV
h <- a$h.seq
plot(h, GCV, type="l")

[Package PLRModels version 1.4 Index]