ARItest {HDLSSkST}

R Documentation

k-Sample ARI Test of Equal Distributions

Description

Performs the distribution-free exact k-sample test for equality of multivariate distributions in the HDLSS (high dimension, low sample size) regime. This is an aggregate of the two-sample versions of the RI test over all `\frac{k(k-1)}{2}` pairwise comparisons, and the test statistic is the minimum of these two-sample RI test statistics. Holm's step-down procedure (1979) or the Benjamini-Hochberg procedure (1995) is applied for multiple testing, as selected by `multTest`.
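The pairwise structure and the two multiple-testing corrections can be sketched in base R. This is only an illustration: the p-values are made up, and `p.adjust` is the standard `stats` function; whether `ARItest` calls it internally is not stated here.

```r
# k = 4 populations give choose(4, 2) = 6 two-sample comparisons.
k <- 4
pairs <- t(combn(k, 2))   # one row per pair (i, j) with i < j
n_pairs <- nrow(pairs)    # equals k * (k - 1) / 2

# Made-up p-values, one per pair, for illustration only.
p_raw <- c(0.001, 0.012, 0.030, 0.040, 0.200, 0.650)

# Holm's step-down adjustment (controls the family-wise error rate)
p_holm <- p.adjust(p_raw, method = "holm")
# Benjamini-Hochberg adjustment (controls the false discovery rate)
p_bh <- p.adjust(p_raw, method = "BH")

adjusted <- data.frame(Population.1 = pairs[, 1],
                       Population.2 = pairs[, 2],
                       holm = p_holm,
                       BH = p_bh)
print(adjusted)
```

Holm's adjustment is never smaller than the Benjamini-Hochberg adjustment for the same p-values, so `"Holm"` is the more conservative choice.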

Usage

ARItest(M, sizes, randomization = TRUE, clust_alg = "knwClustNo", kmax = 4, 
multTest = "Holm", s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

`M`	`n\times d` observation matrix of the pooled sample; the rows should be grouped by their respective classes, in the same order as `sizes`
`sizes`	vector of sample sizes
`randomization`	logical; if `TRUE` (default), the randomized test is performed; if `FALSE`, the non-randomized test
`clust_alg`	`"knwClustNo"` (default) or `"estclustNo"`; modified K-means algorithm used for clustering
`kmax`	maximum number of clusters considered when estimating the number of clusters in a two-sample comparison, default: `4`
`multTest`	`"Holm"` (default) or `"BenHoch"`; multiple testing procedure applied to the pairwise comparisons
`s_psi`	function required for clustering, 1 for `t^2`, 2 for `1-\exp(-t)`, 3 for `1-\exp(-t^2)`, 4 for `\log(1+t)`, 5 for `t`
`s_h`	function required for clustering, 1 for `\sqrt t`, 2 for `t`
`lb`	each observation is partitioned into smaller vectors of equal length `lb`, default: `1`
`n_sts`	number of simulations of the test statistic, default: `1000`
`alpha`	numeric, significance level `\alpha`, default: `0.05`
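As a rough sketch of what the integer codes for `s_psi` and `s_h` stand for, and of what partitioning an observation into blocks of length `lb` might look like, consider the following. The lists `psi_choices` and `h_choices` are hypothetical helpers written for illustration, not objects exported by HDLSSkST, and the contiguous block layout is an assumption.

```r
# Functions selected by s_psi = 1, ..., 5 (as listed in the arguments table)
psi_choices <- list(
  function(t) t^2,            # s_psi = 1
  function(t) 1 - exp(-t),    # s_psi = 2
  function(t) 1 - exp(-t^2),  # s_psi = 3
  function(t) log(1 + t),     # s_psi = 4
  function(t) t               # s_psi = 5
)

# Functions selected by s_h = 1, 2
h_choices <- list(
  function(t) sqrt(t),        # s_h = 1
  function(t) t               # s_h = 2
)

# Assumed layout: split one observation of length d = 6 into
# contiguous blocks of length lb = 2.
x <- c(0.3, -1.2, 0.8, 2.1, -0.5, 1.7)
lb <- 2
blocks <- split(x, ceiling(seq_along(x) / lb))
print(blocks)
```

With the default `lb = 1`, each block is a single coordinate.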

Value

ARItest returns a list containing the following items:

`ARIStat`	value of the observed test statistic
`ARICutoff`	cut-off of the test
`randomGamma`	randomized coefficient of the test
`decisionARI`	`1` if the null hypothesis is rejected, `0` if the test fails to reject it
`multipleTest`	indicates which pairs of populations differ according to the multiple testing procedure
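How the statistic, the cut-off and the randomization coefficient could combine into a decision is sketched below. This follows one common convention for lower-tailed randomized tests (reject below the cut-off, reject with probability gamma on the boundary) and is an assumption, not a description of the package internals; `randomized_decision` is a hypothetical helper.

```r
# Hypothetical helper, not part of HDLSSkST.
# Assumes small values of the statistic are evidence against the null.
randomized_decision <- function(stat, cutoff, gamma) {
  if (stat < cutoff) {
    return(1L)                              # inside the rejection region
  }
  if (isTRUE(all.equal(stat, cutoff))) {
    return(as.integer(runif(1) < gamma))    # boundary: reject with probability gamma
  }
  0L                                        # above the cut-off: fail to reject
}

# Statistic below the cut-off, so the null hypothesis is rejected
randomized_decision(stat = 0, cutoff = 0.3368421, gamma = 0)
```

With `gamma = 0` the boundary case never rejects, which corresponds to a non-randomized, conservative rule under this convention.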

Author(s)

Biplab Paul, Shyamal K. De and Anil K. Ghosh

Maintainer: Biplab Paul <paul.biplab497@gmail.com>

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

William M Rand (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66(336):846-850, doi:10.1080/01621459.1971.10482356.

Sture Holm (1979). A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 65-70, doi:10.2307/4615733.

Yoav Benjamini and Yosef Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289-300, doi:10.2307/2346101.

Examples

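  # multivariate normal distribution:
  # generate data with dimension d = 500
  library(HDLSSkST)
  set.seed(151)
  n1 <- n2 <- n3 <- n4 <- 10
  d <- 500
  # four samples of size 10 whose means differ by 0.5 per coordinate
  I1 <- matrix(rnorm(n1 * d, mean = 0, sd = 1), n1, d)
  I2 <- matrix(rnorm(n2 * d, mean = 0.5, sd = 1), n2, d)
  I3 <- matrix(rnorm(n3 * d, mean = 1, sd = 1), n3, d)
  I4 <- matrix(rnorm(n4 * d, mean = 1.5, sd = 1), n4, d)
  # pooled sample, rows grouped by class in the same order as sizes
  X <- rbind(I1, I2, I3, I4)
  # ARI test:
  results <- ARItest(M = X, sizes = c(n1, n2, n3, n4))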
  
   ## outputs:
   results$ARIStat
   #[1] 0

   results$ARICutoff
   #[1] 0.3368421

   results$randomGamma
   #[1] 0

   results$decisionARI
   #[1] 1

   results$multipleTest
   #  Population.1 Population.2 rejected pvalues
   #1            1            2     TRUE       0
   #2            1            3     TRUE       0
   #3            1            4     TRUE       0
   #4            2            3     TRUE       0
   #5            2            4     TRUE       0
   #6            3            4     TRUE       0
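A variation on the example above, sketched with three populations, the non-randomized test and the Benjamini-Hochberg procedure. Only arguments listed under Usage are used; the data are simulated, no output is shown because it has not been verified here, and the call is skipped if the package is not installed.

```r
set.seed(151)
sizes <- c(10, 10, 10)
d <- 500

# Three simulated populations that differ only in their mean
X <- rbind(
  matrix(rnorm(sizes[1] * d, mean = 0), sizes[1], d),
  matrix(rnorm(sizes[2] * d, mean = 0.5), sizes[2], d),
  matrix(rnorm(sizes[3] * d, mean = 1), sizes[3], d)
)

# The pooled matrix must have one row per observation counted in sizes
stopifnot(nrow(X) == sum(sizes))

if (requireNamespace("HDLSSkST", quietly = TRUE)) {
  res_bh <- HDLSSkST::ARItest(M = X, sizes = sizes,
                              randomization = FALSE,
                              multTest = "BenHoch")
  print(res_bh$decisionARI)    # overall decision
  print(res_bh$multipleTest)   # pairwise comparisons
}
```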

[Package HDLSSkST version 2.1.0 Index]