df.statistics {cvmgof} | R Documentation |
Global test statistic for the conditional distribution function
Description
This function computes the global test statistic for the conditional distribution function.
Usage
df.statistics(data.X, data.Y, cdf.H0, bandwidth,
kernel.function = kernel.function.epan, integration.step = 0.01)
Arguments
data.X |
a numeric vector of covariate (X) observations used to obtain the nonparametric estimator of the conditional distribution function. |
data.Y |
a numeric vector of response (Y) observations, of the same length as data.X, used to obtain the nonparametric estimator of the conditional distribution function. |
cdf.H0 |
the conditional distribution function under the null hypothesis. |
bandwidth |
bandwidth used to obtain the nonparametric estimator of the conditional distribution function. |
kernel.function |
kernel function used to obtain the nonparametric estimator of the conditional distribution function. Default option is "kernel.function.epan". |
integration.step |
a numeric value specifying the integration step. Default is 0.01. |
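Judging from the Examples section, cdf.H0 is called with two numeric vectors and returns a matrix with one row per element of x and one column per element of y. The sketch below builds such a function with outer() for a Gaussian regression model. The link function and the standard deviation 0.3 are taken from the Examples; the matrix layout is an assumption inferred from that example, not a documented contract.

```r
# Null model taken from the Examples: Y = f(X) + Gaussian noise, sd = 0.3
linkfunction.H0 <- function(x) 0.2 * x^2 - x + 2

# Assumed layout: entry (i, j) is P(Y <= y[j] | X = x[i]) under H0
cond_cdf.H0 <- function(x, y) {
  outer(x, y, function(xi, yj) {
    pnorm(yj - linkfunction.H0(xi), mean = 0, sd = 0.3)
  })
}

F0 <- cond_cdf.H0(x = c(0, 1, 2), y = c(1, 2))
print(dim(F0))
```

Because pnorm() is vectorised, outer() fills the whole matrix in one call and no explicit loop over x is needed.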
Details
An inappropriate bandwidth choice can produce "NaN" values in cumulative distribution function estimates.
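One way such NaN values can arise is sketched below with a simplified Nadaraya-Watson-type estimator and an Epanechnikov kernel. This is an illustration under stated assumptions, not the estimator implemented in cvmgof; the names epan and nw_cdf are hypothetical. When the bandwidth is so small that no observation falls inside the kernel window around the evaluation point, all weights are zero and the estimate becomes 0/0.

```r
# Epanechnikov kernel: zero outside [-1, 1] (compact support)
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Simplified Nadaraya-Watson-type estimate of P(Y <= y | X = x0).
# Illustration only; hypothetical helper, not part of cvmgof.
nw_cdf <- function(x0, y, X, Y, h) {
  w <- epan((X - x0) / h)        # kernel weight of each observation
  sum(w * (Y <= y)) / sum(w)     # 0 / 0 = NaN when every weight is zero
}

X <- c(0.5, 1.0, 4.0, 4.5)
Y <- c(1.6, 1.2, 1.3, 1.5)

# No observation lies within 0.5 of x0 = 2.5, so the estimate is NaN
small_h <- nw_cdf(x0 = 2.5, y = 1.4, X = X, Y = Y, h = 0.5)

# A wider window covers all four observations and gives a finite value
large_h <- nw_cdf(x0 = 2.5, y = 1.4, X = X, Y = Y, h = 2.5)

print(is.nan(small_h))
print(large_h)
```

In this toy setting, increasing the bandwidth until every evaluation point has at least one observation in its window removes the NaN values.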
Author(s)
Romain Azais, Sandie Ferrigno and Marie-Jose Martinez
References
G. R. Ducharme and S. Ferrigno. An omnibus test of goodness-of-fit for conditional distributions with applications to regression models. Journal of Statistical Planning and Inference, 142, 2748-2761, 2012.
R. Azais, S. Ferrigno and M-J Martinez. cvmgof: An R package for Cramer-von Mises goodness-of-fit tests in regression models. Submitted, January 2021. hal-03101612
Examples
# Uncomment the following code block
#
# set.seed(1)
#
# # Data simulation
# n = 25 # Dataset size
# data.X = runif(n,min=0,max=5) # X
# data.Y = 0.2*data.X^2-data.X+2+rnorm(n,mean=0,sd=0.3) # Y
#
# ########################################################################
#
# # Bandwidth selection under H0
#
# # We want to test if the link function is f(x)=0.2*x^2-x+2
# # The answer is yes (see the definition of data.Y above)
# # We generate a dataset under H0 to estimate the optimal bandwidth under H0
#
# linkfunction.H0 = function(x){0.2*x^2-x+2}
#
# data.X.H0 = runif(n,min=0,max=5)
# data.Y.H0 = linkfunction.H0(data.X.H0)+rnorm(n,mean=0,sd=0.3)
#
# h.opt.df = df.bandwidth.selection.linkfunction(data.X.H0, data.Y.H0,linkfunction.H0)
#
# ########################################################################
#
# # Test statistics under H0
#
# cond_cdf.H0 = function(x,y)
# {
# # Row i holds P(Y <= y | X = x[i]) under H0
# out=matrix(0,nrow=length(x),ncol=length(y))
# for (i in seq_along(x)){
# x0=x[i]
# out[i,]=pnorm(y-linkfunction.H0(x0),0,0.3)
# }
# out
# }
# # cond_cdf.H0 is the conditional CDF associated with linkfunction.H0
# # with additive Gaussian noise (standard deviation=0.3)
#
# df.statistics(data.X,data.Y,cond_cdf.H0,h.opt.df)