np.cv {PLRModels}    R Documentation
Cross-validation bandwidth selection in nonparametric regression models
Description
From a sample {(Y_i, t_i): i = 1, ..., n}, this routine computes, for each l_n considered, an optimal bandwidth for estimating m in the regression model
Y_i = m(t_i) + \epsilon_i.
The regression function, m, is a smooth but unknown function, and the random errors, {\epsilon_i}, may form a time series (that is, they may be serially dependent). The optimal bandwidth is selected by means of the leave-(2l_n + 1)-out cross-validation procedure: when estimating m(t_i), observation i and its l_n neighbours on each side are left out. Kernel smoothing is used.
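The leave-(2l_n + 1)-out procedure can be illustrated with a small base-R sketch. It is not the package's implementation: it assumes the Nadaraya-Watson estimator with the quadratic (Epanechnikov) kernel, an indicator weight on a user-supplied interval w, and a plain average of squared residuals as the CV score. The helper names cv_nw and cv_select are hypothetical.

```r
# Epanechnikov ("quadratic") kernel, supported on [-1, 1]
quad_kernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Leave-(2*ln + 1)-out CV score of a Nadaraya-Watson fit with bandwidth h.
# When predicting at t[i], observations j with |i - j| <= ln are dropped.
# Only points with t[i] in [w[1], w[2]] enter the score (indicator weight).
cv_nw <- function(y, t, h, ln, w = range(t)) {
  n <- length(y)
  idx <- seq_len(n)
  res <- rep(NA_real_, n)
  for (i in idx) {
    keep <- abs(idx - i) > ln                  # remove 2*ln + 1 observations
    k <- quad_kernel((t[i] - t[keep]) / h)
    if (sum(k) > 0) res[i] <- y[i] - sum(k * y[keep]) / sum(k)
  }
  inside <- t >= w[1] & t <= w[2]
  r2 <- res[inside]^2
  if (anyNA(r2)) return(Inf)                   # h too small to estimate m(t_i)
  mean(r2)
}

# Grid search over candidate bandwidths for a single value of l_n
cv_select <- function(y, t, h.seq, ln, w = range(t)) {
  cv <- vapply(h.seq, function(h) cv_nw(y, t, h, ln, w), numeric(1))
  list(h.opt = h.seq[which.min(cv)], CV = cv)
}

# Usage on data simulated as in Example 2a (grid and w chosen for illustration)
set.seed(1234)
n <- 100
t <- ((1:n) - 0.5) / n
y <- 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)
h.seq <- seq(0.02, 0.25, length.out = 50)
fit <- cv_select(y, t, h.seq, ln = 0, w = c(0.1, 0.9))
fit$h.opt
```

Returning Inf when some weighted point cannot be estimated keeps bandwidths that are too small for the chosen l_n from being selected.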
Usage
np.cv(data = data, h.seq = NULL, num.h = 50, w = NULL, num.ln = 1,
ln.0 = 0, step.ln = 2, estimator = "NW", kernel = "quadratic")
Arguments
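data: data[, 1] contains the values of the response variable, Y; data[, 2] contains the values of the explanatory variable, t.
h.seq: sequence of considered bandwidths in the CV function. If NULL (the default), a sequence of num.h bandwidths is built internally.
num.h: number of values used to build the sequence of considered bandwidths when h.seq is NULL. The default is 50.
w: support interval of the weight function in the CV function. If NULL (the default), the interval is determined internally.
num.ln: number of values for l_n; for each value, 2l_n + 1 observations are left out when estimating m(t_i) in the CV function. The default is 1.
ln.0: minimum value for l_n. The default is 0.
step.ln: distance between two consecutive values of l_n. The default is 2.
estimator: allows the choice between “NW” (Nadaraya-Watson) and “LLP” (Local Linear Polynomial). The default is “NW”.
kernel: allows the choice between the “gaussian”, “quadratic” (Epanechnikov), “triweight” and “uniform” kernels. The default is “quadratic”.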
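The way ln.0, step.ln and num.ln define the values of l_n, together with textbook forms of the four kernels, can be sketched as follows. The helper ln_values is hypothetical; the arithmetic sequence is inferred from the argument descriptions and agrees with Example 1 (ln.0 = 1, step.ln = 1, num.ln = 2 give l_n = 1, 2). The kernel constants are the standard normalised forms and are assumed, not taken from the package source.

```r
# Hypothetical helper: l_n values implied by ln.0, step.ln and num.ln
ln_values <- function(ln.0 = 0, step.ln = 2, num.ln = 1) {
  ln.0 + step.ln * (seq_len(num.ln) - 1)
}

ln_values()                                   # 0 (the defaults)
ln_values(ln.0 = 1, step.ln = 1, num.ln = 2)  # 1 2 (as in Example 1)

# Standard normalised kernels (assumed forms);
# the last three vanish outside [-1, 1]
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  triweight = function(u) ifelse(abs(u) <= 1, 35 / 32 * (1 - u^2)^3, 0),
  uniform   = function(u) ifelse(abs(u) <= 1, 0.5, 0)
)

# Each kernel integrates to (approximately) 1
c(gaussian = integrate(kernels$gaussian, -Inf, Inf)$value,
  sapply(kernels[-1], function(K) integrate(K, -1, 1)$value))
```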
Details
A weight function (specifically, the indicator function 1_{[w[1], w[2]]}) is introduced in the CV function so that only points t_i in [w[1], w[2]] contribute to it. This eliminates, or at least substantially reduces, boundary effects in the estimate of m(t_i).
For more details, see Chu and Marron (1991).
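The CV function itself is not written out on this page. Under the usual leave-(2l_n + 1)-out definition of Chu and Marron (1991), and assuming a 1/n normalisation (a constant factor does not change the minimiser), it can be sketched as follows, where \hat{m}_h^{(-i, l_n)} denotes the kernel estimate (NW or LLP) with bandwidth h computed without the observations j such that |j - i| <= l_n:

```latex
CV_{l_n}(h) = \frac{1}{n} \sum_{i=1}^{n}
  \left( Y_i - \hat{m}_{h}^{(-i,\,l_n)}(t_i) \right)^{2} \,
  1_{[w[1],\, w[2]]}(t_i),
\qquad
\hat{h}_{l_n} = \arg\min_{h \in \texttt{h.seq}} CV_{l_n}(h).
```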
Value
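h.opt: dataframe containing, for each l_n considered, the selected bandwidth.
CV.opt: minimum value of the CV function for each l_n considered.
CV: matrix containing the values of the CV function for each bandwidth (rows, in the order of h.seq) and each l_n (columns).
w: support interval of the weight function in the CV function.
h.seq: sequence of considered bandwidths in the CV function.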
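The layout of the returned CV matrix (one row per bandwidth in h.seq, one column per l_n, as used in Example 1) can be illustrated with a toy matrix. The numbers below are made up, and reading h.opt and CV.opt as the column-wise minimiser and minimum of CV is an assumption based on their names and the Description, not a statement about the package code.

```r
# Toy CV matrix: rows follow h.seq, columns correspond to l_n = 1 and l_n = 2.
# The values are invented for illustration only.
h.seq <- c(0.05, 0.10, 0.15, 0.20)
CV <- cbind(c(4.1, 3.2, 3.5, 3.9),   # l_n = 1
            c(4.4, 3.8, 3.3, 3.6))   # l_n = 2

# Column-wise minimiser (bandwidth) and minimum (CV value)
h.opt  <- h.seq[apply(CV, 2, which.min)]
CV.opt <- apply(CV, 2, min)
h.opt    # 0.10 0.15
CV.opt   # 3.2 3.3
```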
Author(s)
German Aneiros Perez ganeiros@udc.es
Ana Lopez Cheda ana.lopez.cheda@udc.es
References
Chu, C-K and Marron, J.S. (1991) Comparison of two bandwidth selectors with dependent errors. The Annals of Statistics 19, 1906-1918.
See Also
Other related functions are: np.est, np.gcv, plrm.est, plrm.gcv and plrm.cv.
Examples
# EXAMPLE 1: REAL DATA
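data <- matrix(10, 120, 2)
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[, 1] <- barnacles1[, 1]              # response variable
data <- diff(data, 12)                    # lag-12 differences
data[, 2] <- 1:nrow(data)                 # explanatory variable: time index
aux <- np.cv(data, ln.0 = 1, step.ln = 1, num.ln = 2)   # l_n = 1 and 2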
aux$h.opt
plot.ts(aux$CV)
par(mfrow=c(2,1))
plot(aux$h.seq,aux$CV[,1], xlab="h", ylab="CV", type="l", main="ln=1")
plot(aux$h.seq,aux$CV[,2], xlab="h", ylab="CV", type="l", main="ln=2")
# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data
set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m <- function(t) {0.25*t*(1-t)}
f <- m(t)
epsilon <- rnorm(n, 0, 0.01)
y <- f + epsilon
data_ind <- matrix(c(y, t), nrow = n)
# We apply the function
a <- np.cv(data_ind)
a$CV.opt
CV <- a$CV
h <- a$h.seq
plot(h,CV,type="l")