npunitest {np} | R Documentation |
Kernel Consistent Univariate Density Equality Test with Mixed Data Types
Description
npunitest implements the consistent metric entropy test of
Maasoumi and Racine (2002) for two arbitrary, stationary
univariate nonparametric densities on common support.
Usage
npunitest(data.x = NULL,
data.y = NULL,
method = c("integration","summation"),
bootstrap = TRUE,
boot.num = 399,
bw.x = NULL,
bw.y = NULL,
random.seed = 42,
...)
Arguments
data.x , data.y |
common support univariate vectors containing the variables. |
method |
a character string used to specify whether to compute
the integral version or the summation version of the statistic. Can
be set as "integration" or "summation". Defaults to "integration". |
bootstrap |
a logical value which specifies whether to conduct the bootstrap
test or not. If set to FALSE, only the statistic is computed.
Defaults to TRUE. |
boot.num |
an integer value specifying the number of bootstrap
replications to use. Defaults to 399. |
bw.x , bw.y |
numeric (scalar) bandwidths. Defaults to plug-in (see details below). |
random.seed |
an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42. |
... |
additional arguments supplied to specify the bandwidth
type, kernel types, and so on. This is used since we specify bw as
a numeric scalar and not a |
Details
npunitest computes the nonparametric metric entropy (normalized
Hellinger of Granger, Maasoumi and Racine (2004)) for testing
equality of two univariate density/probability functions,
D[f(x), f(y)]. See Maasoumi and Racine (2002) for details. Default
bandwidths are of the plug-in variety (bw.SJ for continuous variables
and direct plug-in for discrete variables). The bootstrap is conducted
via simple resampling with replacement from the pooled data.x and
data.y (data.x only for summation).
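The integration version and the pooled bootstrap described above can be sketched in base R. This is an illustration under stated assumptions, not the code npunitest runs: hellinger.sketch and boot.sketch are hypothetical helpers, the statistic is taken to be 1/2 times the integral of (sqrt(f) - sqrt(g))^2, the integral is approximated with a trapezoidal rule on a shared grid, both bootstrap samples are drawn from the pooled data, and the bandwidths are held fixed at their original-sample values.

```r
## Hypothetical helper, not part of np: normalized Hellinger distance
## S = 1/2 * integral of (sqrt(f) - sqrt(g))^2 between two Gaussian
## kernel density estimates evaluated on a shared grid.
hellinger.sketch <- function(x, y, bw.x = bw.SJ(x), bw.y = bw.SJ(y),
                             grid.n = 512) {
  ## Shared grid covering both samples plus three bandwidths of tail
  pad <- 3 * max(bw.x, bw.y)
  lo <- min(x, y) - pad
  hi <- max(x, y) + pad
  fx <- density(x, bw = bw.x, from = lo, to = hi, n = grid.n)
  fy <- density(y, bw = bw.y, from = lo, to = hi, n = grid.n)
  ## Trapezoidal rule on the equally spaced grid
  h <- (sqrt(fx$y) - sqrt(fy$y))^2
  dx <- fx$x[2] - fx$x[1]
  0.5 * dx * (sum(h) - 0.5 * (h[1] + h[grid.n]))
}

## Hypothetical helper: resample both samples with replacement from the
## pooled data (assumption) and recompute the statistic each time.
boot.sketch <- function(x, y, boot.num = 99) {
  bw.x <- bw.SJ(x)
  bw.y <- bw.SJ(y)
  stat <- hellinger.sketch(x, y, bw.x, bw.y)
  pooled <- c(x, y)
  stat.boot <- replicate(boot.num, {
    xb <- sample(pooled, length(x), replace = TRUE)
    yb <- sample(pooled, length(y), replace = TRUE)
    hellinger.sketch(xb, yb, bw.x, bw.y)
  })
  ## P-value: share of bootstrap statistics exceeding the observed one
  list(Srho = stat, Srho.bootstrap = stat.boot, P = mean(stat.boot > stat))
}

set.seed(42)
x <- rnorm(200)
y <- rnorm(200, mean = 3)
out <- boot.sketch(x, y, boot.num = 49)
out$Srho
out$P
```

Fixing the bandwidths across bootstrap draws keeps the sketch short and fast; recomputing them for each resample is an equally plausible choice.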
The summation version of this statistic can be numerically unstable
when data.x and data.y lack common support or when the overlap is
sparse. The summation version involves division of densities while the
integration version involves differences, and the statistic in such
cases can be reported as exactly 0.5 or 0. Warning messages are
produced when this occurs (‘integration recommended’) and should be
heeded.
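To see why a ratio of densities is fragile, consider a sample-average form of the same distance. Since the integral of (sqrt(f) - sqrt(g))^2 equals the expectation under f of (1 - sqrt(g/f))^2, one plausible summation form averages that quantity over the data.x points. The sketch below assumes this form; kde.at and summation.sketch are hypothetical helpers and may differ from what npunitest computes internally.

```r
## Gaussian kernel density estimate of `data` evaluated at points `at`
## (hypothetical helper, not part of np)
kde.at <- function(at, data, bw) {
  sapply(at, function(a) mean(dnorm((a - data) / bw)) / bw)
}

## Sample-average form: 1/2 * mean over x_i of (1 - sqrt(g(x_i)/f(x_i)))^2
summation.sketch <- function(x, y, bw.x = bw.SJ(x), bw.y = bw.SJ(y)) {
  f.x <- kde.at(x, x, bw.x)   # density of x at the x sample points
  g.x <- kde.at(x, y, bw.y)   # density of y at the x sample points
  0.5 * mean((1 - sqrt(g.x / f.x))^2)
}

set.seed(42)
x <- rnorm(200)
s.same <- summation.sketch(x, rnorm(200))              # overlapping samples
s.apart <- summation.sketch(x, rnorm(200, mean = 50))  # no common support
s.same
s.apart  # 0.5: every ratio g/f underflows to zero
```

When the samples share no support, the estimated density of y is numerically zero at every x point, so each term equals one and the average collapses to exactly 0.5 regardless of how far apart the samples are.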
Numerical integration can occasionally fail when the data.x and data.y
distributions lack common support and/or lie extremely far apart (the
statistic in such cases will be reported as exactly 0.5 or 0).
However, in these extreme cases, simple tests will reveal the obvious
differences in the distributions and entropy-based tests for equality
will be clearly unnecessary.
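As one illustration of such a simple test, the sketch below checks whether the sample ranges overlap and runs a two-sample Kolmogorov-Smirnov test from base R. The helper support.check is hypothetical and is only one of many possible pre-checks.

```r
## Hypothetical pre-check, not part of np: do the sample ranges overlap,
## and does a two-sample Kolmogorov-Smirnov test already reject equality?
support.check <- function(x, y) {
  overlap <- max(min(x), min(y)) <= min(max(x), max(y))
  list(overlap = overlap, ks.p.value = ks.test(x, y)$p.value)
}

set.seed(42)
far <- support.check(rnorm(100), rnorm(100, mean = 50))
far$overlap      # FALSE: the sample ranges are disjoint
far$ks.p.value   # essentially zero
```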
Value
npunitest returns an object of type unitest with the
following components:
Srho |
the statistic |
Srho.bootstrap |
contains the bootstrap replications of Srho |
P |
the P-value of the statistic |
boot.num |
number of bootstrap replications |
bw.x , bw.y |
scalar bandwidths for data.x, data.y |
summary supports objects of type unitest.
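A short sketch of reading the components listed above from a fitted object. It assumes the np package is installed and that the components are accessed with `$` in the usual way for list-like R objects; the sample sizes and boot.num are arbitrary choices for illustration.

```r
## Runs only if the np package is installed
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(1234)
  x <- rnorm(250)
  y <- rnorm(250)
  out <- npunitest(x, y, boot.num = 99)
  ## Statistic and bootstrap P-value
  print(c(Srho = out$Srho, P = out$P))
  ## Bandwidths used for data.x and data.y
  print(c(bw.x = out$bw.x, bw.y = out$bw.y))
  ## Bootstrap replications of the statistic
  print(head(out$Srho.bootstrap))
  summary(out)
}
```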
Usage Issues
See the example below for proper usage.
Author(s)
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
References
Granger, C.W., E. Maasoumi and J.S. Racine (2004), “A dependence metric for possibly nonlinear processes,” Journal of Time Series Analysis, 25, 649-669.
Maasoumi, E. and J.S. Racine (2002), “Entropy and predictability of stock market returns,” Journal of Econometrics, 107(2), 291-312.
See Also
npdeneqtest, npdeptest, npsdeptest, npsymtest
Examples
## Not run:
set.seed(1234)
n <- 1000
## Compute the statistic only for data drawn from same distribution
x <- rnorm(n)
y <- rnorm(n)
npunitest(x,y,bootstrap=FALSE)
Sys.sleep(5)
## Conduct the test for this data
npunitest(x,y,boot.num=99)
Sys.sleep(5)
## Conduct the test for data drawn from different distributions having
## the same mean and variance
x <- rchisq(n,df=5)
y <- rnorm(n,mean=5,sd=sqrt(10))
mean(x)
mean(y)
sd(x)
sd(y)
npunitest(x,y,boot.num=99)
Sys.sleep(5)
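## Two sample t-test for equality of means
t.test(x,y)
## F test for equality of variances and asymptotic
## critical values (the statistic is named F.stat so that it does
## not mask F, R's shorthand for FALSE, and is printed for comparison)
F.stat <- var(x)/var(y)
F.stat
qf(c(0.025,0.975),df1=n-1,df2=n-1)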
## Plot the nonparametric density estimates on the same axes
fx <- density(x)
fy <- density(y)
xlim <- c(min(fx$x,fy$x),max(fx$x,fy$x))
ylim <- c(min(fx$y,fy$y),max(fx$y,fy$y))
plot(fx,xlim=xlim,ylim=ylim,xlab="Data",main="f(x), f(y)")
lines(fy$x,fy$y,col="red")
Sys.sleep(5)
## Test for equality of log(wage) distributions
data(wage1)
attach(wage1)
lwage.male <- lwage[female=="Male"]
lwage.female <- lwage[female=="Female"]
npunitest(lwage.male,lwage.female,boot.num=99)
Sys.sleep(5)
## Plot the nonparametric density estimates on the same axes
f.m <- density(lwage.male)
f.f <- density(lwage.female)
xlim <- c(min(f.m$x,f.f$x),max(f.m$x,f.f$x))
ylim <- c(min(f.m$y,f.f$y),max(f.m$y,f.f$y))
plot(f.m,xlim=xlim,ylim=ylim,
xlab="log(wage)",
main="Male/Female log(wage) Distributions")
lines(f.f$x,f.f$y,col="red",lty=2)
rug(lwage.male)
legend(-1,1.2,c("Male","Female"),lty=c(1,2),col=c("black","red"))
detach(wage1)
Sys.sleep(5)
## Conduct the test for data drawn from different discrete probability
## distributions
x <- factor(rbinom(n,2,.5))
y <- factor(rbinom(n,2,.1))
npunitest(x,y,boot.num=99)
## End(Not run)