
indeptest {robusTest}

R Documentation

Robust independence test of Kolmogorov-Smirnov type for two continuous variables

Description

Test the independence between two continuous variables based on the maximum distance between the joint empirical cumulative distribution function and the product of the marginal empirical cumulative distribution functions.

Usage

indeptest(
  x,
  y,
  N = 50000,
  simu = FALSE,
  ties.break = "none",
  nb_tiebreak = 100
)

Arguments

`x`, `y`	the two continuous variables. Must be of the same length.
`N`	the number of Monte-Carlo replications used when `simu=TRUE`.
`simu`	if TRUE, a Monte-Carlo simulation with `N` replications is used to determine the distribution of the test statistic under the null hypothesis. If FALSE, precomputed tables are used (see Details for more information).
`ties.break`	the method used to break ties in case there are ties in the x or y vectors. Can be `"none"`, `"random"` or `"rep_random"`.
`nb_tiebreak`	the number of repetitions for breaking the ties when `ties.break="rep_random"`.

Details

For two continuous variables X and Y, indeptest tests H0: X and Y are independent against H1: X and Y are not independent.

For observations (x1,y1), ..., (xn,yn), the bivariate e.c.d.f. (empirical cumulative distribution function) Fn is defined as:

Fn(t1,t2) = sum_{i=1}^n Indicator(xi<=t1,yi<=t2)/n.

Let Fn(t1) and Fn(t2) be the marginal e.c.d.f.s of x and y, respectively. The test statistic is defined as:

n^(1/2) sup_{t1,t2} |Fn(t1,t2)-Fn(t1)*Fn(t2)|.
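
As an illustration, the statistic above can be evaluated directly in base R. The three e.c.d.f.s are step functions that only jump at observed values, so the supremum is attained on the grid of observed (xi, yj) pairs. The function name `ks_indep_stat` is hypothetical and is not part of robusTest; this is a minimal sketch of the formula that stores n x n matrices, not the Rcpp implementation used by indeptest.

```r
# Hypothetical helper (not part of robusTest): evaluates
# n^(1/2) * sup |Fn(t1,t2) - Fn(t1)*Fn(t2)| on the grid of observed values.
ks_indep_stat <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  # Ix[k, i] = 1 if x[k] <= x[i]; Iy[k, j] = 1 if y[k] <= y[j]
  Ix <- 1 * outer(x, x, "<=")
  Iy <- 1 * outer(y, y, "<=")
  # joint[i, j] = Fn(x[i], y[j]), the bivariate e.c.d.f. at each grid point
  joint <- crossprod(Ix, Iy) / n
  # marginal e.c.d.f.s evaluated at the observed values
  Fx <- colMeans(Ix)
  Fy <- colMeans(Iy)
  sqrt(n) * max(abs(joint - outer(Fx, Fy)))
}

x <- c(0.2, 0.3, 0.1, 0.4)
y <- c(0.5, 0.4, 0.05, 0.2)
ks_indep_stat(x, y)
```

Since only the ordering of the observations enters the indicators, the value is unchanged by any strictly increasing transformation of x or y.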

Under H0 the test statistic is distribution free: it has the same distribution as the statistic computed from two independent continuous uniform variables on [0,1], with the supremum taken over t1,t2 in [0,1]. The null distribution is therefore obtained by Monte-Carlo simulation. The user can either set simu=TRUE to run the simulation (with N replications) or set simu=FALSE to use the precomputed tables.

The tables contain the estimated distribution for each sample size n=1,...,150. For larger samples, the table of a nearby sample size is used: n=150 for 151<=n<=175, n=200 for 176<=n<=250, n=300 for 251<=n<=400, n=500 for 401<=n<=750, and n=1000 for n>=751. These tables were computed using 2e5 Monte-Carlo replications.
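
The two ways of obtaining the null distribution can be sketched in base R as follows. All function names (`ks_indep_stat`, `mc_pvalue`, `table_size`) are hypothetical and not part of robusTest, and the p-value rule (proportion of simulated statistics at least as large as the observed one) is an assumption about how the simulation is used, not a description of the package internals.

```r
# Hypothetical helpers, not part of robusTest.
# Test statistic from the formula in Details, evaluated on the observed grid.
ks_indep_stat <- function(x, y) {
  n <- length(x)
  Ix <- 1 * outer(x, x, "<=")
  Iy <- 1 * outer(y, y, "<=")
  sqrt(n) * max(abs(crossprod(Ix, Iy) / n - outer(colMeans(Ix), colMeans(Iy))))
}

# Monte-Carlo p-value (the idea behind simu=TRUE): simulate the statistic
# under H0 from independent uniforms and compare with the observed value.
mc_pvalue <- function(x, y, N = 1000) {
  n <- length(x)
  observed <- ks_indep_stat(x, y)
  null_stats <- replicate(N, ks_indep_stat(runif(n), runif(n)))
  mean(null_stats >= observed)
}

# Sample size of the table used when simu=FALSE, following the ranges above.
table_size <- function(n) {
  if (n <= 150) return(n)
  breaks <- c(175, 250, 400, 750, Inf)
  sizes  <- c(150, 200, 300, 500, 1000)
  sizes[which(n <= breaks)[1]]
}

set.seed(1)
x <- rnorm(40)
y <- x^2 + 0.3 * rnorm(40)
mc_pvalue(x, y, N = 500)
table_size(180)  # uses the n=200 table
```

A small N keeps the sketch fast; the precision of a simulated p-value improves as N grows, which is why the default N in indeptest is much larger.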

Value

Returns the result of the test with its corresponding p-value and the value of the test statistic.

Note

Only a two-sided alternative is possible with this test. Missing values are removed pairwise: if a value of x (resp. y) is missing, the corresponding values of both x and y are removed, and the test is carried out on the remaining observations. If ties.break="none", the ties are left as they are, which puts mass (number of ties)/n at tied observations in the computation of the empirical cumulative distribution functions. If ties.break="random", the ties are randomly broken once. If ties.break="rep_random", they are randomly broken nb_tiebreak times, where nb_tiebreak is a parameter of the function. In that case, the test statistic and the p-value are computed as averages over all replications.
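
The handling of missing values and ties described in this note can be sketched in base R. How indeptest breaks ties internally is not documented here, so the sketch assumes one possible mechanism: replacing each variable by its ranks with ties resolved at random, which is legitimate because the statistic depends only on the ordering of the observations. The names `ks_indep_stat` and `break_ties` are hypothetical, and only the averaging of the statistic (not of the p-value) is shown.

```r
# Hypothetical helpers, not part of robusTest.
ks_indep_stat <- function(x, y) {
  n <- length(x)
  Ix <- 1 * outer(x, x, "<=")
  Iy <- 1 * outer(y, y, "<=")
  sqrt(n) * max(abs(crossprod(Ix, Iy) / n - outer(colMeans(Ix), colMeans(Iy))))
}

# Assumed tie-breaking mechanism: ranks with ties ordered at random.
break_ties <- function(v) rank(v, ties.method = "random")

x <- c(1, 2, 2, NA, 3, 3, 5, 4)
y <- c(2, 1, 1, 4, NA, 3, 3, 5)

# Pairwise removal: drop every pair in which x or y is missing.
keep <- complete.cases(x, y)
x <- x[keep]
y <- y[keep]

# Analogue of ties.break="rep_random": break the ties repeatedly
# and average the resulting test statistics.
set.seed(42)
nb_tiebreak <- 100
stats <- replicate(nb_tiebreak, ks_indep_stat(break_ties(x), break_ties(y)))
mean(stats)
```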

This function is implemented using the Rcpp package.

Author(s)

See Distribution Free Tests of Independence Based on the Sample Distribution Function. J. R. Blum, J. Kiefer and M. Rosenblatt, 1961.

Examples

#Simulated data 1
x<-c(0.2, 0.3, 0.1, 0.4)
y<-c(0.5, 0.4, 0.05, 0.2)
indeptest(x,y)

#Simulated data 2
n<-40 #sample size
x<-rnorm(n)
y<-x^2+0.3*rnorm(n)
plot(x,y)
indeptest(x,y)

#Application on the Evans dataset
#Description of this dataset is available in the lbreg package
data(Evans)
with(Evans,plot(CHL[CDH==1],DBP[CDH==1]))
with(Evans,cor.test(CHL[CDH==1],DBP[CDH==1])) #the standard Pearson test
with(Evans,cortest(CHL[CDH==1],DBP[CDH==1])) #the robust Pearson test
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1])) #the robust independence test
#The robust tests give very different p-values from the standard Pearson test!

#Breaking the ties
#The ties are broken once
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1],ties.break="random"))
#The ties are broken repeatedly and the test statistic and p-value
#are averaged over the replications
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1],ties.break="rep_random",nb_tiebreak=100))

[Package robusTest version 1.1.0 Index]