plrm.cv {PLRModels} | R Documentation |
Cross-validation bandwidth selection in PLR models
Description
From a sample $\{(Y_i, X_{i1}, \ldots, X_{ip}, t_i): i = 1, \ldots, n\}$, this routine computes, for each $l_n$ considered, an optimal pair of bandwidths for estimating the regression function of the model
$$Y_i = X_{i1}\beta_1 + \cdots + X_{ip}\beta_p + m(t_i) + \epsilon_i,$$
where $\beta = (\beta_1, \ldots, \beta_p)$ is an unknown vector parameter and $m(\cdot)$ is a smooth but unknown function.
The random errors, $\epsilon_i$, are allowed to be time series. The optimal pair of bandwidths, (b.opt, h.opt), is selected by means of the leave-($2l_n + 1$)-out cross-validation procedure. The bandwidth b.opt is used in the estimate of $\beta$, while the pair of bandwidths (b.opt, h.opt) is considered in the estimate of $m$. Kernel smoothing, combined with ordinary least squares estimation, is used.
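To make the phrase "kernel smoothing, combined with ordinary least squares estimation" concrete, here is a minimal base-R sketch of one common construction (a Speckman-type estimator with a Nadaraya-Watson smoother). It is not taken from the package source: the helper names `nw_weights` and `plr_fit`, the simulated data, and the bandwidth values are all assumptions made for illustration.

```r
# Nadaraya-Watson smoother matrix (n x n): row i holds the kernel weights
# used to smooth at t[i]. Epanechnikov kernel; h is the bandwidth.
nw_weights <- function(t, h) {
  u <- outer(t, t, "-") / h
  K <- 0.75 * (1 - u^2) * (abs(u) <= 1)
  K / rowSums(K)
}

# Partial linear fit (hypothetical helper): smooth y and X on t with
# bandwidth b, obtain beta by OLS on the residuals, then smooth
# y - X %*% beta with bandwidth h to estimate m at the design points.
plr_fit <- function(y, X, t, b, h) {
  Wb <- nw_weights(t, b)
  y_res <- y - Wb %*% y
  X_res <- X - Wb %*% X
  beta <- solve(crossprod(X_res), crossprod(X_res, y_res))
  m_hat <- nw_weights(t, h) %*% (y - X %*% beta)
  list(beta = drop(beta), m = drop(m_hat))
}

# Simulated data, for illustration only
set.seed(1)
n <- 200
t <- ((1:n) - 0.5) / n
X <- matrix(rnorm(2 * n), nrow = n)
beta_true <- c(0.5, -0.3)
y <- drop(X %*% beta_true) + sin(2 * pi * t) + rnorm(n, sd = 0.1)

fit <- plr_fit(y, X, t, b = 0.1, h = 0.1)
fit$beta
```

The sketch separates the two roles of the bandwidths described above: `b` enters only the estimate of $\beta$, while the estimate of $m$ depends on both `b` (through $\hat\beta$) and `h`.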
Usage
plrm.cv(data = data, b.equal.h = TRUE, b.seq=NULL, h.seq=NULL,
num.b = NULL, num.h = NULL, w = NULL, num.ln = 1, ln.0 = 0,
step.ln = 2, estimator = "NW", kernel = "quadratic")
Arguments
data |
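matrix whose first column, data[, 1], contains the values of the response variable, Y; columns data[, 2:(p+1)] contain the values of the "linear" explanatory variables, X_1, ..., X_p; and the last column, data[, p+2], contains the values of the "nonparametric" explanatory variable, t. |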
b.equal.h |
if TRUE (the default), the same bandwidth is used for estimating both $\beta$ and $m$. |
b.seq |
sequence of considered bandwidths, b, in the CV function for estimating $\beta$. |
h.seq |
sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the CV function for estimating $m$. |
num.b |
number of values used to build the sequence of considered bandwidths for estimating $\beta$. |
num.h |
pairs of bandwidths (b, h) are used for estimating $m$, num.h being the number of values considered for h. |
w |
support interval of the weight function in the CV function. If |
num.ln |
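number of values for $l_n$: after estimating $\beta$, $2l_n + 1$ observations around each point $t_i$ are eliminated to estimate $m(t_i)$ in the CV function. |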
ln.0 |
minimum value for $l_n$. |
step.ln |
distance between two consecutive values of $l_n$. |
estimator |
selects the estimator: "NW" (Nadaraya-Watson) or "LLP" (Local Linear Polynomial). The default is "NW". |
kernel |
selects the kernel: "gaussian", "quadratic" (Epanechnikov kernel), "triweight" or "uniform". The default is "quadratic". |
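For reference, the four kernel names above correspond to well-known kernel shapes. The sketch below writes them in their standard textbook form, as densities on the scaled argument u; the package may scale or normalize its kernels differently, so this is an illustration rather than a description of its internals. The list name `kernels` and the integration check are assumptions made for the example.

```r
# Standard textbook kernels, written as functions of the scaled argument u.
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1),        # Epanechnikov
  triweight = function(u) (35 / 32) * (1 - u^2)^3 * (abs(u) <= 1),
  uniform   = function(u) 0.5 * (abs(u) <= 1)
)

# Integration limits: the Gaussian has unbounded support, the others live on [-1, 1].
lims <- list(gaussian = c(-Inf, Inf), quadratic = c(-1, 1),
             triweight = c(-1, 1), uniform = c(-1, 1))

# Sanity check: each kernel should integrate to 1.
areas <- mapply(function(K, l) integrate(K, l[1], l[2])$value, kernels, lims)
round(areas, 4)
```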
Details
A weight function (specifically, the indicator function $1_{[w[1],\, w[2]]}$) is introduced in the CV function to allow elimination (or at least significant reduction) of boundary effects from the estimate of $m(t_i)$.
As noted in the definition of num.ln, the estimate of $\beta$ in the CV function is obtained from all data while, once $\beta$ is estimated, $2l_n + 1$ observations around each $t_i$ are eliminated to estimate $m(t_i)$ in the CV function. The estimate of $\beta$ to be used at time $i$ in the CV function could also be computed after eliminating those $2l_n + 1$ observations; that possibility was not implemented because of both its computational cost and the known fact that the estimate of $\beta$ is quite insensitive to the bandwidth selection.
The implemented procedure generalizes the one in expression (8) of Aneiros-Perez and Quintela-del-Rio (2001) by including a weight function (see above) and allowing two smoothing parameters instead of only one (see Aneiros-Perez et al., 2004).
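The leave-out scheme and the indicator weight described in this section can be sketched in base R as follows. This is an illustrative reading of the text, not the package's implementation: the function `cv_score`, its normalization by n, the default interval `w` (chosen arbitrarily here as the 0.1 and 0.9 quantiles of t), and the simulated residuals are all assumptions. The residuals `r` stand for the response minus the linear part, with the coefficient vector already estimated from all data.

```r
# Epanechnikov kernel on the scaled argument u
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Leave-(2*ln + 1)-out CV score for a single bandwidth h (hypothetical helper).
# r: residuals y_i - x_i' beta_hat; t: design points;
# w = c(lower, upper): support of the indicator weight function.
cv_score <- function(r, t, h, ln = 0, w = quantile(t, c(0.1, 0.9))) {
  n <- length(r)
  inside <- which(t >= w[1] & t <= w[2])      # points where the weight equals 1
  sq_err <- sapply(inside, function(i) {
    keep <- abs(seq_len(n) - i) > ln          # drop i and ln neighbours on each side
    k <- epan((t[keep] - t[i]) / h)
    if (sum(k) == 0) return(NA_real_)         # bandwidth too small: no usable neighbours
    (r[i] - sum(k * r[keep]) / sum(k))^2      # Nadaraya-Watson prediction error
  })
  sum(sq_err) / n
}

# Simulated residuals, for illustration only
set.seed(1)
n <- 200
t <- ((1:n) - 0.5) / n
r <- sin(2 * pi * t) + rnorm(n, sd = 0.2)

h_seq <- seq(0.02, 0.25, length.out = 24)
cv <- sapply(h_seq, function(h) cv_score(r, t, h, ln = 1))
h_opt <- h_seq[which.min(cv)]
h_opt
```

With `ln = 0` the sketch reduces to ordinary leave-one-out cross-validation; larger values remove more neighbours, which is the usual way to keep correlated errors from favouring undersmoothed fits.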
Value
bh.opt |
data frame containing, for each $l_n$ considered, the selected value for (b, h). |
CV.opt |
CV.opt[k] is the minimum value of the CV function when the k-th value of $l_n$ is considered. |
CV |
an array containing the values of the CV function for each pair of bandwidths and $l_n$ considered. |
b.seq |
sequence of considered bandwidths, b, in the CV function for estimating $\beta$. |
h.seq |
sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the CV function for estimating $m$. |
w |
support interval of the weight function in the CV function. |
Author(s)
German Aneiros Perez ganeiros@udc.es
Ana Lopez Cheda ana.lopez.cheda@udc.es
References
Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.
Aneiros-Perez, G. and Quintela-del-Rio, A. (2001) Modified cross-validation in semiparametric regression models with dependent errors. Comm. Statist. Theory Methods 30, 289-307.
Chu, C-K and Marron, J.S. (1991) Comparison of two bandwidth selectors with dependent errors. The Annals of Statistics 19, 1906-1918.
See Also
Other related functions are: plrm.beta, plrm.est, plrm.gcv, np.est, np.gcv and np.cv.
Examples
# EXAMPLE 1: REAL DATA
data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))
aux <- plrm.cv(data, step.ln=1, num.ln=2)
aux$bh.opt
plot.ts(aux$CV[,-2,])
par(mfrow=c(2,1))
plot(aux$b.seq,aux$CV[,-2,1], xlab="h", ylab="CV", type="l", main="ln=0")
plot(aux$b.seq,aux$CV[,-2,2], xlab="h", ylab="CV", type="l", main="ln=1")
# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data
set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n                      # equally spaced design points in (0, 1)
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)
x <- matrix(rnorm(2*n,0,1), nrow=n)     # two "linear" covariates
sum <- x%*%beta
epsilon <- rnorm(n, 0, 0.01)            # independent errors
y <- sum + f + epsilon
data_ind <- matrix(c(y,x,t),nrow=n)     # columns: y, x1, x2, t
# We apply the function
a <-plrm.cv(data_ind)
a$CV.opt
CV <- a$CV
h <- a$h.seq
plot(h, CV,type="l")
## Example 2b: dependent data and ln.0 > 0
set.seed(1234)
# We generate the data
x <- matrix(rnorm(2*n,0,1), nrow=n)
sum <- x%*%beta
epsilon <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)   # AR(1) errors
y <- sum + f + epsilon
data_dep <- matrix(c(y,x,t),nrow=n)     # columns: y, x1, x2, t
# We apply the function
a <-plrm.cv(data_dep, ln.0=2)
a$CV.opt
CV <- a$CV
h <- a$h.seq
plot(h, CV,type="l")