R: Goodness-of-Fit tests in nonparametric regression models

np.gof {PLRModels}

R Documentation

Goodness-of-Fit tests in nonparametric regression models

Description

This routine tests the equality of a nonparametric regression curve, m, and a given function, m_0, from a sample {(Y_i, t_i): i=1,...,n}, where:

Y_i= m(t_i) + \epsilon_i.

The unknown function m is assumed to be smooth, a fixed, equally spaced design is considered, and the random errors, {\epsilon_i}, are allowed to form a time series. The test statistic for the null hypothesis, H0: m = m_0, is based on a Cramer-von Mises-type functional distance between a nonparametric estimator of m and m_0.

Usage

np.gof(data = data, m0 = NULL, h.seq = NULL, w = NULL, 
estimator = "NW", kernel = "quadratic", time.series = FALSE, 
Tau.eps = NULL, h0 = NULL, lag.max = 50, p.max = 3, 
q.max = 3, ic = "BIC", num.lb = 10, alpha = 0.05)

Arguments

`data`	`data[, 1]` contains the values of the response variable, `Y`; `data[, 2]` contains the values of the explanatory variable, `t`.
`m0`	the considered function in the null hypothesis. If `NULL` (the default), the zero function is considered.
`h.seq`	the test is performed using each bandwidth in the vector `h.seq`. If `NULL` (the default), 10 equidistant values between zero and a quarter of the range of `{t_i}` are considered.
`w`	support interval of the weight function in the test statistic. If `NULL` (the default), `(q_{0.1}, q_{0.9})` is considered, where `q_p` denotes the quantile of order `p` of `{t_i}`.
`estimator`	selects the nonparametric estimator: `"NW"` (Nadaraya-Watson) or `"LLP"` (Local Linear Polynomial). The default is `"NW"`.
`kernel`	selects the kernel: `"gaussian"`, `"quadratic"` (Epanechnikov kernel), `"triweight"` or `"uniform"`. The default is `"quadratic"`.
`time.series`	indicates whether the data are independent (`FALSE`) or form a time series (`TRUE`). The default is `FALSE`.
`Tau.eps`	the sum of the autocovariances of the random errors of the regression model. If `NULL` (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information criterion) to the residuals from the fitted nonparametric regression model and then obtains the sum of the autocovariances of that ARMA model.
`h0`	if `Tau.eps=NULL`, `h0` contains the pilot bandwidth used for obtaining the residuals to construct the default for `Tau.eps`. If `NULL` (the default), a quarter of the range of `{t_i}` is considered.
`lag.max`	if `Tau.eps=NULL`, `lag.max` contains the maximum delay used to construct the default for `Tau.eps`. The default is 50.
`p.max`	if `Tau.eps=NULL`, the ARMA model is selected among the ARMA(p,q) models with 0<=p<=`p.max` and 0<=q<=`q.max`. The default is 3.
`q.max`	if `Tau.eps=NULL`, the ARMA model is selected among the ARMA(p,q) models with 0<=p<=`p.max` and 0<=q<=`q.max`. The default is 3.
`ic`	if `Tau.eps=NULL`, `ic` contains the information criterion used to suggest the ARMA model. The options are `"AIC"`, `"AICC"` or `"BIC"` (the default).
`num.lb`	if `Tau.eps=NULL`, the suitability of the selected ARMA model is checked with the Ljung-Box test and the t-test. Up to `num.lb` delays are used in the Ljung-Box test. The default is 10.
`alpha`	if `Tau.eps=NULL`, `alpha` contains the significance level at which the ARMA model is checked. The default is 0.05.
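
The arguments `p.max`, `q.max`, `ic`, `num.lb` and `alpha` describe a model-selection step that can be pictured with base R tools. The following is only a rough sketch under stated assumptions, not the code used inside `np.gof`: the series `res` is a simulated stand-in for the nonparametric residuals, the models are fitted with `stats::arima` without a mean term, BIC is computed as AIC with penalty `log(n)`, and only the Ljung-Box check (not the t-test) is shown. The names `res`, `best`, `lb` and `adequate` are hypothetical.

```r
# Illustrative sketch only; the selection inside np.gof may differ.
set.seed(1234)
n <- 200
# Stand-in for the residuals of the nonparametric fit (assumption)
res <- arima.sim(list(order = c(1, 0, 0), ar = 0.7), n = n, sd = 0.01)

p.max <- 3
q.max <- 3
num.lb <- 10
alpha <- 0.05

# Search all ARMA(p, q) with 0 <= p <= p.max, 0 <= q <= q.max by BIC
best <- NULL
for (p in 0:p.max) {
  for (q in 0:q.max) {
    fit <- tryCatch(
      suppressWarnings(arima(res, order = c(p, 0, q), include.mean = FALSE)),
      error = function(e) NULL  # skip orders for which the fit fails
    )
    if (is.null(fit)) next
    bic <- AIC(fit, k = log(n))  # BIC is AIC with penalty log(n)
    if (is.null(best) || bic < best$bic) {
      best <- list(fit = fit, p = p, q = q, bic = bic)
    }
  }
}

# Ljung-Box check of the selected model, using num.lb lags
lb <- Box.test(residuals(best$fit), lag = num.lb, type = "Ljung-Box",
               fitdf = best$p + best$q)
adequate <- lb$p.value > alpha  # TRUE if no evidence against the model
```

Here `best$p` and `best$q` play the role of the orders reported in `ar.ma`, and `lb$p.value` corresponds in spirit to `pv.Box.test`.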

Details

A weight function (specifically, the indicator function 1_{[w[1], w[2]]}) is introduced in the test statistic to eliminate, or at least significantly reduce, boundary effects in the estimate of m(t_i).
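
To make the role of the weight function concrete, the sketch below computes a discretised Cramer-von Mises-type distance: the average of (m_hat(t_i) - m_0(t_i))^2 over the design points inside [w[1], w[2]]. It is a minimal illustration under assumptions, not the statistic implemented in `np.gof`, whose exact form and normalising constants are not reproduced here. The helpers `nw_fit` and `cvm_distance` are hypothetical names, and the Nadaraya-Watson fit uses the Epanechnikov kernel.

```r
# Illustrative sketch only; not the PLRModels implementation.

# Nadaraya-Watson estimate at the design points, Epanechnikov kernel
nw_fit <- function(t, y, h) {
  K <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)
  sapply(t, function(t0) {
    wts <- K((t - t0) / h)
    sum(wts * y) / sum(wts)
  })
}

# Average squared distance between the fit and m0, restricted to [w[1], w[2]]
cvm_distance <- function(t, y, m0, h, w = quantile(t, c(0.1, 0.9))) {
  inside <- t >= w[1] & t <= w[2]  # indicator weight function
  m_hat <- nw_fit(t, y, h)
  mean(((m_hat - m0(t))^2)[inside])
}

set.seed(1)
n <- 100
t <- ((1:n) - 0.5) / n
m <- function(u) 0.25 * u * (1 - u)
y <- m(t) + rnorm(n, sd = 0.01)

d_true <- cvm_distance(t, y, m0 = m, h = 0.1)                   # H0 holds
d_zero <- cvm_distance(t, y, m0 = function(u) 0 * u, h = 0.1)   # H0 fails
```

With these simulated data the distance to the true curve is much smaller than the distance to the zero function. Restricting the average to the central 80% of the design points keeps the poorly estimated boundary region out of the comparison.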

If Tau.eps=NULL and the routine is not able to suggest an approximation for Tau.eps, it warns the user that the model might not be appropriate and then shows the results. To construct Tau.eps, the procedures suggested in Muller and Stadtmuller (1988) and Herrmann et al. (1992) can be followed.
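
When the error model is known, the sum of autocovariances can be computed directly. The sketch below assumes that Tau.eps denotes the sum of the autocovariances over all lags (negative, zero and positive); this reading is an assumption and should be checked against the references. For an AR(1) process with coefficient phi and innovation standard deviation sigma, that sum equals sigma^2 / (1 - phi)^2, and the code verifies this numerically with `stats::ARMAacf`. The variable names are hypothetical.

```r
# Illustrative sketch, assuming Tau.eps is the sum of autocovariances
# over all lags of the error process.
phi <- 0.7     # AR(1) coefficient
sigma <- 0.01  # innovation standard deviation

# Closed form for an AR(1) process
tau_closed <- sigma^2 / (1 - phi)^2

# Numerical check: gamma(0) * (1 + 2 * sum of autocorrelations at lags >= 1)
gamma0 <- sigma^2 / (1 - phi^2)          # variance of the AR(1) process
rho <- ARMAacf(ar = phi, lag.max = 200)  # autocorrelations at lags 0..200
tau_numeric <- gamma0 * (1 + 2 * sum(rho[-1]))
```

Under that assumption, such a value could be supplied through the `Tau.eps` argument, for example `np.gof(data, m0 = f.function, time.series = TRUE, Tau.eps = tau_closed)`, so that the ARMA-fitting step is skipped.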

The implemented test statistic is the special case of the one in Gonzalez Manteiga and Vilar Fernandez (1995) in which the class considered under the null hypothesis has a single element.

Value

A list with a data frame containing:

`h.seq`	sequence of bandwidths used in the test statistic.
`Q.m`	values of the test statistic (one for each bandwidth in `h.seq`).
`Q.m.normalised`	normalised values of `Q.m` (one for each bandwidth in `h.seq`).
`p.value`	p-values of the corresponding tests (one for each bandwidth in `h.seq`).

Moreover, if the data are a time series and `Tau.eps` is not specified:

`pv.Box.test`	p-values of the Ljung-Box test for the model fitted to the residuals.
`pv.t.test`	p-values of the t-test for the model fitted to the residuals.
`ar.ma`	ARMA orders for the model fitted to the residuals.

Author(s)

German Aneiros Perez ganeiros@udc.es

Ana Lopez Cheda ana.lopez.cheda@udc.es

References

Biedermann, S. and Dette, H. (2000) Testing linearity of regression models with dependent errors by kernel based methods. Test 9, 417-438.

Gonzalez-Manteiga, W. and Aneiros-Perez, G. (2003) Testing in partial linear regression models with dependent errors. J. Nonparametr. Statist. 15, 93-111.

Gonzalez-Manteiga, W. and Cao, R. (1993) Testing the hypothesis of a general linear model using nonparametric regression estimation. Test 2, 161-188.

Gonzalez Manteiga, W. and Vilar Fernandez, J. M. (1995) Testing linear regression models using non-parametric regression estimators when errors are non-independent. Comput. Statist. Data Anal. 20, 521-541.

Herrmann, E., Gasser, T. and Kneip, A. (1992) Choice of bandwidth for kernel regression when residuals are correlated. Biometrika 79, 783-795.

Muller, H.G. and Stadtmuller, U. (1988) Detecting dependencies in smooth regression models. Biometrika 75, 639-650.

Examples

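# EXAMPLE 1: REAL DATA
# Placeholder matrix: column 1 will hold the response, column 2 the design points
data <- matrix(10,120,2)
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[,1] <- barnacles1[,1]
# Differences at lag 12
data <- diff(data, 12)
# Equally spaced design: 1, 2, ..., number of remaining rows
data[,2] <- 1:nrow(data)

# Test H0: m = 0 (default m0) with the default time.series = FALSE
np.gof(data)
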

# EXAMPLE 2: SIMULATED DATA
## Example 2a: dependent data

set.seed(1234)
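# Generate the data on a fixed, equally spaced design
n <- 100
t <- ((1:n)-0.5)/n
m <- function(t) {0.25*t*(1-t)}   # true regression function
f <- m(t)
f.function <- function(u) {0.25*u*(1-u)}   # same function as m, passed as m0 below

# AR(1) errors with coefficient 0.7
epsilon <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y <-  f + epsilon
data <- cbind(y,t)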

## Example 2a.1: true null hypothesis
np.gof(data, m0=f.function, time.series=TRUE)

## Example 2a.2: false null hypothesis (the default m0 is the zero function)
np.gof(data, time.series=TRUE)

[Package PLRModels version 1.4 Index]