
USP.test {USP}

R Documentation

Independence test for discrete data

Description

Carry out a permutation independence test on a two-way contingency table. The test statistic is T_n, as described in Sections 3.1 and 7.1 of Berrett et al. (2021); the same statistic appears as U_n in Berrett and Samworth (2021). The critical value is obtained by sampling B null contingency tables with the same row and column totals as the input, via Patefield's algorithm, and recomputing the test statistic on each of them.
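The calibration step can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's code: `perm_pvalue_sketch` and `stat_sq` are hypothetical names, and `stat_sq` (the unnormalised sum of squared deviations of observed from expected counts) is only a stand-in for T_n, whose exact form is given in the references. Null tables are drawn with `stats::r2dtable`, which uses Patefield's algorithm, and the p-value is assumed to take the usual (1 + number of null statistics at least as large as the observed one) / (B + 1) form.

```r
# Stand-in statistic (NOT T_n): sum of squared deviations from expected counts
stat_sq <- function(tab) {
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  sum((tab - expected)^2)
}

# Hypothetical helper: permutation p-value with row and column totals held fixed
perm_pvalue_sketch <- function(freq, B = 999, stat = stat_sq) {
  observed <- stat(freq)
  # B null tables with the same margins as freq (Patefield's algorithm)
  null_tables <- r2dtable(B, rowSums(freq), colSums(freq))
  null_stats <- vapply(null_tables, stat, numeric(1))
  # The observed statistic is counted as one of B + 1 values
  (1 + sum(null_stats >= observed)) / (B + 1)
}

set.seed(1)
perm_pvalue_sketch(diag(rep(10, 5)), B = 199)
```

For a strongly diagonal table such as `diag(rep(10, 5))`, essentially no null table reaches the observed value, so the sketch returns the smallest attainable p-value, 1/(B + 1).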

Usage

USP.test(freq, B = 999, ties.method = "standard", nullstats = FALSE)

Arguments

`freq`	Two-way contingency table (a matrix of counts) whose row and column variables are to be tested for independence.
`B`	The number of resampled null tables used to calibrate the test.
`ties.method`	If "standard", the p-value is calculated as in (5) of Berrett et al. (2021), which is slightly conservative. If "random", ties are broken randomly; this also preserves Type I error control.
`nullstats`	If TRUE, the returned list also includes the vector of null statistic values.
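The difference between the two `ties.method` options can be illustrated with a small helper. This is a sketch under assumptions: `pvalue_sketch` is a hypothetical name, the "standard" branch assumes the usual (1 + number of null statistics >= observed) / (B + 1) form, and the "random" branch shows one common way of breaking ties at random. Neither branch is taken from the package source.

```r
# Hypothetical helper: p-value from an observed statistic and B null statistics.
# "standard": null values tied with the observed one count against H0
# "random":   the observed value's rank among its ties is drawn uniformly
pvalue_sketch <- function(observed, null_stats,
                          ties.method = c("standard", "random")) {
  ties.method <- match.arg(ties.method)
  B <- length(null_stats)
  n_greater <- sum(null_stats > observed)
  n_tied <- sum(null_stats == observed)
  if (ties.method == "standard") {
    return((1 + n_greater + n_tied) / (B + 1))
  }
  # K is uniform on 0, 1, ..., n_tied
  K <- sample.int(n_tied + 1L, 1L) - 1L
  (1 + n_greater + K) / (B + 1)
}

null_stats <- c(1, 2, 2, 2, 5)
pvalue_sketch(2, null_stats)            # standard: (1 + 1 + 3) / 6
pvalue_sketch(2, null_stats, "random")  # somewhere between 2/6 and 5/6
```

Under these assumed forms, the random p-value never exceeds the standard one, and the two coincide when no null statistic equals the observed one. With B resampled tables the smallest attainable p-value is 1/(B + 1), so B = 999 allows p-values down to 0.001.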

Value

Returns a list. The first two elements are the p-value of the test and the value of the test statistic T_n, as defined in Berrett et al. (2021). The third element is the table of expected counts, and the fourth element is the table of contributions to T_n. If nullstats = TRUE, the list also contains the vector of null statistics.
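The table of expected counts is presumably the usual one under independence, e_ij = r_i c_j / n, where r_i and c_j are the row and column totals and n is the grand total; this is an assumption about the package's definition. A base-R sketch, using the hypothetical helper `expected_counts`:

```r
# Expected counts under independence: (row total i) * (column total j) / n
expected_counts <- function(freq) {
  outer(rowSums(freq), colSums(freq)) / sum(freq)
}

freq <- matrix(c(3, 1, 2, 4), nrow = 2)  # rows (3, 2) and (1, 4)
e <- expected_counts(freq)
e  # each row is (2, 3)
```

The expected table has the same row and column totals as the observed one, which is why it does not change across null tables resampled with fixed margins.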

References

Berrett TB, Kontoyiannis I, Samworth RJ (2021). “Optimal rates for independence testing via U-statistic permutation tests.” Annals of Statistics, to appear.

Berrett TB, Samworth RJ (2021). “USP: an independence test that improves on Pearson’s chi-squared and the G-test.” Submitted, available at arXiv:2101.10880.

Examples

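library(USP)

# Table drawn under independence, plus extra mass on the diagonal (dependence)
freq <- r2dtable(1, rep(10, 5), rep(10, 5))[[1]] + 4 * diag(rep(1, 5))
USP.test(freq, 999)

# Diagonal table: the row and column variables are perfectly dependent
freq <- diag(1:5)
USP.test(freq, 999)

# Table drawn under independence; keep the null statistics for plotting
freq <- r2dtable(1, rep(10, 5), rep(10, 5))[[1]]
test <- USP.test(freq, 999, nullstats = TRUE)
upper <- max(max(test$NullStats), test$TestStat)
plot(density(test$NullStats, from = 0, to = upper),
     xlim = c(min(test$NullStats), upper),
     main = "Test Statistics")
abline(v = test$TestStat, col = 2)  # observed statistic (red)
# Dashed line: empirical 95% quantile of the observed and null statistics
TestStats <- c(test$TestStat, test$NullStats)
abline(v = quantile(TestStats, probs = 0.95), lty = 2)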

[Package USP version 0.1.2 Index]