RItest {HDLSSkST}    R Documentation

k-Sample RI Test of Equal Distributions

Description

Performs the distribution-free exact k-sample test, based on the Rand index (RI), for equality of multivariate distributions in the high-dimension, low-sample-size (HDLSS) regime.

Usage

RItest(M, labels, sizes, n_clust, randomization = TRUE, clust_alg = "knwClustNo", 
kmax = 2 * n_clust, s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

M

n \times d matrix of observations from the pooled sample; the rows (observations) should be grouped by their respective classes

labels

vector of length n giving the class membership index of each observation

sizes

vector of sample sizes

n_clust

number of populations (k)

randomization

logical; if TRUE (default), the randomized test is performed; if FALSE, the non-randomized test is performed

clust_alg

"knwClustNo" (default) or "estclustNo" (for the MRI test); modified K-means algorithm used for clustering

kmax

maximum number of clusters considered when estimating the total number of clusters in the pooled sample; default: 2*n_clust

s_psi

choice of the function \psi required for clustering: 1 for t^2, 2 for 1-\exp(-t), 3 for 1-\exp(-t^2), 4 for \log(1+t), 5 for t; default: 1

s_h

choice of the function h required for clustering: 1 for \sqrt t, 2 for t; default: 1

lb

block length; each observation is partitioned into smaller vectors of equal length lb; default: 1

n_sts

number of simulations of the test statistic; default: 1000

alpha

numeric; significance level \alpha of the test; default: 0.05
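
The codes accepted by s_psi and s_h, and the block partition controlled by lb, can be pictured with the short R sketch below. The names psi_choices, h_choices and split_blocks are hypothetical and are not part of HDLSSkST; the sketch also assumes that the dimension is a multiple of lb. It only illustrates the documented options and does not show how the package combines them internally.

```r
# Hypothetical lookup tables mirroring the documented s_psi and s_h codes.
psi_choices <- list(
  function(t) t^2,            # s_psi = 1
  function(t) 1 - exp(-t),    # s_psi = 2
  function(t) 1 - exp(-t^2),  # s_psi = 3
  function(t) log(1 + t),     # s_psi = 4
  function(t) t               # s_psi = 5
)
h_choices <- list(
  function(t) sqrt(t),        # s_h = 1
  function(t) t               # s_h = 2
)

# Hypothetical helper: split one observation into consecutive blocks of
# length lb (assumes length(x) is a multiple of lb).
split_blocks <- function(x, lb = 1) {
  stopifnot(length(x) %% lb == 0)
  split(x, rep(seq_len(length(x) / lb), each = lb))
}

psi_choices[[2]](0)        # 0
h_choices[[1]](4)          # 2
split_blocks(1:6, lb = 2)  # three blocks: (1, 2), (3, 4), (5, 6)
```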

Value

RItest returns a list containing the following items:

estClustLabel

a vector of length n giving the estimated class membership index of each observation

obsCtyTab

observed contingency table

ObservedRI

value of the observed test statistic

RICutoff

cut-off of the test

randomGamma

randomization coefficient of the test

estPvalue

estimated p-value of the test

decisionRI

1 if the test rejects the null hypothesis, 0 if it fails to reject the null hypothesis

estClustNo

estimated total number of classes
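
In the Examples, ObservedRI is 0 when the observed contingency table is perfectly diagonal. This suggests that the statistic behaves like a Rand-type proportion of pairwise disagreements between the true and estimated labels (Rand, 1971). The sketch below rests on that assumption; rand_disagreement is a hypothetical helper, not a function of HDLSSkST, and the package's exact definition may differ.

```r
# Hypothetical helper: proportion of observation pairs on which two
# labelings disagree (same group in one labeling, different in the other).
rand_disagreement <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")   # TRUE where a pair shares a label in a
  same_b <- outer(b, b, "==")   # TRUE where a pair shares a label in b
  up <- upper.tri(same_a)       # count each unordered pair once
  mean(same_a[up] != same_b[up])
}

# Perfect agreement up to relabeling gives 0.
rand_disagreement(c(0, 0, 1, 1), c(5, 5, 7, 7))   # 0

# Four of the six pairs disagree here.
rand_disagreement(c(0, 0, 1, 1), c(0, 1, 0, 1))   # 0.6666667
```

Under this reading, small values indicate that the clustering recovers the class labels well, which is consistent with the example rejecting the null hypothesis when ObservedRI falls below RICutoff.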

Author(s)

Biplab Paul, Shyamal K. De and Anil K. Ghosh

Maintainer: Biplab Paul <paul.biplab497@gmail.com>

References

Biplab Paul, Shyamal K. De and Anil K. Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

William M. Rand (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66(336):846-850, doi:10.1080/01621459.1971.10482356.

Examples

  # multivariate normal distributions:
  # generate four samples of size 10 in dimension d = 500 that differ in mean
  set.seed(151)
  n1 <- n2 <- n3 <- n4 <- 10
  k <- 4
  d <- 500
  I1 <- matrix(rnorm(n1 * d, mean = 0, sd = 1), n1, d)
  I2 <- matrix(rnorm(n2 * d, mean = 0.5, sd = 1), n2, d)
  I3 <- matrix(rnorm(n3 * d, mean = 1, sd = 1), n3, d)
  I4 <- matrix(rnorm(n4 * d, mean = 1.5, sd = 1), n4, d)
  # class membership index of each row of the pooled sample
  levels <- c(rep(0, n1), rep(1, n2), rep(2, n3), rep(3, n4))
  # pooled sample, rows grouped by class
  X <- as.matrix(rbind(I1, I2, I3, I4))
  # RI test:
  results <- RItest(M = X, labels = levels, sizes = c(n1, n2, n3, n4), n_clust = k)
  
   ## outputs:
   results$estClustLabel
   #[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

   results$obsCtyTab
   #      [,1] [,2] [,3] [,4]
   #[1,]   10    0    0    0
   #[2,]    0   10    0    0
   #[3,]    0    0   10    0
   #[4,]    0    0    0   10

   results$ObservedRI
   #[1] 0

   results$RICutoff
   #[1] 0.3307692

   results$randomGamma
   #[1] 0

   results$estPvalue
   #[1] 0

   results$decisionRI
   #[1] 1
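
The arguments M, labels and sizes must agree with one another: rows grouped by class, one label per row, one size per class. As a minimal sketch, the inputs can be assembled from a list of per-class matrices. pool_samples is a hypothetical helper that is not part of HDLSSkST, and the 0-based labels simply follow the convention of the example above.

```r
# Hypothetical helper: pool per-class matrices and derive consistent
# labels and sizes for a call such as RItest(M, labels, sizes, n_clust).
pool_samples <- function(samples) {
  d <- unique(vapply(samples, ncol, integer(1)))
  stopifnot(length(d) == 1)                     # all classes share dimension d
  sizes <- vapply(samples, nrow, integer(1))    # one sample size per class
  list(
    M       = do.call(rbind, samples),          # rows grouped by class
    labels  = rep(seq_along(samples) - 1, times = sizes),
    sizes   = sizes,
    n_clust = length(samples)
  )
}

set.seed(1)
inp <- pool_samples(list(matrix(rnorm(3 * 5), 3, 5),
                         matrix(rnorm(4 * 5), 4, 5)))
dim(inp$M)    # 7 5
inp$labels    # 0 0 0 1 1 1 1
inp$sizes     # 3 4

# With HDLSSkST loaded, the result could then be passed on as:
# RItest(M = inp$M, labels = inp$labels, sizes = inp$sizes, n_clust = inp$n_clust)
```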


[Package HDLSSkST version 2.1.0 Index]