kde.local.test {ks}    R Documentation

Kernel density based local two-sample comparison test

Description

Kernel density based local two-sample comparison test for 1- to 6-dimensional data.

Usage

kde.local.test(x1, x2, H1, H2, h1, h2, fhat1, fhat2, gridsize, binned, 
   bgridsize, verbose=FALSE, supp=3.7, mean.adj=FALSE, signif.level=0.05,
   min.ESS, xmin, xmax)

Arguments

x1, x2

vector/matrix of data values

H1, H2, h1, h2

bandwidth matrices/scalar bandwidths. If these are missing, Hpi or hpi is called by default.

fhat1, fhat2

objects of class kde

binned

flag for binned estimation

gridsize

vector of grid sizes

bgridsize

vector of binning grid sizes

verbose

flag to print out progress information. Default is FALSE.

supp

effective support for normal kernel

mean.adj

flag to compute second order correction for mean value of critical sampling distribution. Default is FALSE. Currently implemented for d<=2 only.

signif.level

significance level. Default is 0.05.

min.ESS

minimum effective sample size. See below for details.

xmin, xmax

vector of minimum/maximum values for grid

Details

The null hypothesis is H_0(\bold{x}): f_1(\bold{x}) = f_2(\bold{x}) where f_1, f_2 are the respective density functions. The measure of discrepancy is U(\bold{x}) = [f_1(\bold{x}) - f_2(\bold{x})]^2. Duong (2013) shows that the test statistic, obtained by substituting the KDEs for the true densities, has a null distribution which is asymptotically chi-squared with 1 d.f.
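As a rough illustration of the pointwise test described above, the following base R sketch computes a local chi-squared statistic and p-value on a grid for univariate data. It is not the ks implementation. It assumes simulated data, uses bw.SJ as a stand-in bandwidth selector, standardises the squared KDE difference with the first-order approximation Var[fhat(x)] ~ R(K) f(x) / (n h), where R(K) = 1/(2 sqrt(pi)) for the normal kernel, and omits the mean adjustment and any multiple-testing correction. All variable names are illustrative.

```r
## Sketch only: a pointwise two-sample comparison in 1-d using base R.
set.seed(1)
x1 <- rnorm(300)              # simulated sample 1 (assumption)
x2 <- rnorm(300, mean = 0.7)  # simulated sample 2 (assumption)
n1 <- length(x1)
n2 <- length(x2)

## Stand-in bandwidths (Sheather-Jones); ks would use hpi by default
h1 <- bw.SJ(x1)
h2 <- bw.SJ(x2)

## Evaluate both KDEs on one common grid so they can be subtracted
lo <- min(x1, x2) - 3 * max(h1, h2)
hi <- max(x1, x2) + 3 * max(h1, h2)
ngrid <- 256
grid <- seq(lo, hi, length.out = ngrid)
f1 <- density(x1, bw = h1, kernel = "gaussian", from = lo, to = hi, n = ngrid)$y
f2 <- density(x2, bw = h2, kernel = "gaussian", from = lo, to = hi, n = ngrid)$y

## First-order variance of the KDE difference (assumed approximation)
RK <- 1 / (2 * sqrt(pi))
var_diff <- RK * (f1 / (n1 * h1) + f2 / (n2 * h2))

## Standardised squared difference, referred to chi-squared with 1 d.f.
chisq <- (f1 - f2)^2 / var_diff
pvalue <- pchisq(chisq, df = 1, lower.tail = FALSE)

## Grid points where the unadjusted local p-value falls below 0.05
signif_pos <- grid[pvalue < 0.05 & f1 > f2]
signif_neg <- grid[pvalue < 0.05 & f1 < f2]
```

The two index sets play the role that fhat.diff.pos and fhat.diff.neg play in the returned object, but only under the simplifications stated above.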

The required input is either x1,x2 and H1,H2, or fhat1,fhat2, i.e. the data values and bandwidths or objects of class kde. In the former case, the kde objects are created. If H1,H2 are missing then the defaults are the plug-in selectors Hpi. Likewise, missing h1,h2 default to hpi.
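The input styles described above can be sketched as follows for univariate data. The data are simulated for illustration only, and building the two kde objects on a shared grid via xmin and xmax is a precaution assumed here rather than a documented requirement.

```r
library(ks)

set.seed(1)
x1 <- rnorm(200)              # simulated sample 1 (assumption)
x2 <- rnorm(200, mean = 0.5)  # simulated sample 2 (assumption)

## Data only: bandwidths default to the plug-in selector hpi
loct_a <- kde.local.test(x1 = x1, x2 = x2)

## Data plus explicit scalar bandwidths
loct_b <- kde.local.test(x1 = x1, x2 = x2, h1 = hpi(x1), h2 = hpi(x2))

## Precomputed kde objects, evaluated over the same range
rng <- range(c(x1, x2)) + c(-1, 1)
fhat1 <- kde(x = x1, h = hpi(x1), xmin = rng[1], xmax = rng[2])
fhat2 <- kde(x = x2, h = hpi(x2), xmin = rng[1], xmax = rng[2])
loct_c <- kde.local.test(fhat1 = fhat1, fhat2 = fhat2)
```

For multivariate data the same pattern would apply with bandwidth matrices H1, H2 and the selector Hpi in place of h1, h2 and hpi.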

The mean.adj flag determines whether the second order correction to the mean value of the test statistic should be computed. min.ESS is borrowed from Godtliebsen et al. (2002) to reduce spurious significant results in the tails, though it is usually not required for small to moderate sample sizes.
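To give a feel for what an effective sample size threshold does, here is a base R sketch for univariate data. It assumes the kernel-weighted definition ESS(x) = sum_i K_h(x - X_i) / K_h(0) that is common in the scale-space literature. Whether ks computes it in exactly this way is not stated here, and the threshold of 5 and all variable names are illustrative.

```r
## Sketch only: effective sample size at each grid point, normal kernel.
set.seed(1)
x <- rnorm(100)   # simulated data (assumption)
n <- length(x)
h <- bw.SJ(x)     # stand-in bandwidth (assumption)
grid <- seq(-6, 6, length.out = 121)

## Each observation contributes a weight in [0, 1], so 0 <= ESS <= n
ess <- sapply(grid, function(g) {
  sum(dnorm(g - x, sd = h)) / dnorm(0, sd = h)
})

## Flag grid points supported by too few effective observations
min_ess <- 5      # illustrative threshold (assumption)
keep <- ess >= min_ess
```

Grid points with keep equal to FALSE lie in sparse regions such as the tails, where a local test result would be treated as unreliable and discarded.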

Value

A kernel two-sample local significance test is an object of class kde.loctest, which is a list with fields:

fhat1, fhat2

kernel density estimates, objects of class kde

chisq

chi squared test statistic

pvalue

matrix of local p-values at each grid point

fhat.diff

difference of KDEs

mean.fhat.diff

mean of the test statistic

var.fhat.diff

variance of the test statistic

fhat.diff.pos

binary matrix to indicate locally significant fhat1 > fhat2

fhat.diff.neg

binary matrix to indicate locally significant fhat1 < fhat2

n1, n2

sample sizes

H1, H2, h1, h2

bandwidth matrices/scalar bandwidths

References

Duong, T. (2013) Local significant differences from non-parametric two-sample tests. Journal of Nonparametric Statistics, 25, 635-645.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

See Also

kde.test, plot.kde.loctest

Examples

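library(ks)
data(crabs, package="MASS")
## frontal lobe size (FL, column 4) for the blue (B) and orange (O) forms
x1 <- crabs[crabs$sp=="B", 4]
x2 <- crabs[crabs$sp=="O", 4]
## bandwidths are not supplied, so hpi is used for each sample
loct <- kde.local.test(x1=x1, x2=x2)
## plot the local test result, then overlay the two KDEs
plot(loct, ylim=c(-0.08,0.12))
cols <- hcl.colors(2, palette="Dark2")
plot(loct$fhat1, add=TRUE, col=cols[1])
plot(loct$fhat2, add=TRUE, col=cols[2])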

## see examples in ? plot.kde.loctest

[Package ks version 1.14.2 Index]