FStest {HDLSSkST}    R Documentation

k-Sample FS Test of Equal Distributions

Description

Performs the distribution-free exact k-sample test for equality of multivariate distributions in the high dimension, low sample size (HDLSS) regime.

Usage

FStest(M, labels, sizes, n_clust, randomization = TRUE, clust_alg = "knwClustNo", 
kmax = 2 * n_clust, s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

M

n x d matrix of observations from the pooled sample; the rows (observations) should be grouped by their respective classes

labels

vector of length n giving the class membership index of each observation

sizes

vector of sample sizes

n_clust

number of populations

randomization

logical; if TRUE (default), the randomized test is performed; if FALSE, the non-randomized test

clust_alg

"knwClustNo" (default) or "estclustNo" (for the MFS test); modified K-means algorithm used for clustering

kmax

maximum number of clusters considered when estimating the total number of clusters in the pooled observations; default: 2*n_clust

s_psi

function required for clustering: 1 for t^2, 2 for 1 - exp(-t), 3 for 1 - exp(-t^2), 4 for log(1 + t), 5 for t

s_h

function required for clustering: 1 for sqrt(t), 2 for t

lb

each observation is partitioned into smaller vectors (blocks) of common length lb; default: 1

n_sts

number of simulations of the test statistic; default: 1000

alpha

numeric; significance level alpha of the test; default: 0.05
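
The s_psi, s_h and lb options can be pictured with a small base R sketch. The names psi_funs, h_funs and split_blocks are hypothetical and not part of HDLSSkST, and the sketch does not show how the package combines these pieces internally. It also assumes that lb divides the length of the observation, which this page does not state.

```r
# Candidate functions, indexed as in the s_psi and s_h descriptions above.
psi_funs <- list(
  function(t) t^2,            # s_psi = 1
  function(t) 1 - exp(-t),    # s_psi = 2
  function(t) 1 - exp(-t^2),  # s_psi = 3
  function(t) log(1 + t),     # s_psi = 4
  function(t) t               # s_psi = 5
)
h_funs <- list(
  function(t) sqrt(t),        # s_h = 1
  function(t) t               # s_h = 2
)

# Hypothetical helper: split one observation into consecutive blocks of
# length lb (assumes lb divides length(x)).
split_blocks <- function(x, lb = 1) {
  stopifnot(length(x) %% lb == 0)
  split(x, ceiling(seq_along(x) / lb))
}

x <- c(1, 2, 3, 4, 5, 6)
blocks <- split_blocks(x, lb = 2)  # three blocks of length 2
psi <- psi_funs[[1]]               # the default choice s_psi = 1
h <- h_funs[[1]]                   # the default choice s_h = 1
```

With lb = 1 (the default) each coordinate forms its own block.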

Value

FStest returns a list containing the following items:

estClustLabel

vector of length n giving the estimated class membership index of each observation

obsCtyTab

observed contingency table

ObservedProb

value of the observed test statistic

FCutoff

cut-off of the test

randomGamma

randomization coefficient of the test

estPvalue

estimated p-value of the test

decisionF

1 if the null hypothesis is rejected, 0 if the test fails to reject the null hypothesis

estClustNo

estimated total number of classes
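
One way to read ObservedProb, FCutoff, randomGamma and decisionF together is as a standard randomized decision rule. The sketch below is an assumption rather than the package's code: randomized_decision is a hypothetical helper, and it assumes that small values of the statistic lead to rejection and that the test rejects with probability randomGamma when the statistic falls exactly on the cut-off.

```r
# Hypothetical helper, not part of HDLSSkST.
# stat:   observed test statistic (assumed: smaller means stronger evidence)
# cutoff: cut-off of the test
# gamma:  probability of rejecting when stat equals the cut-off
# u:      uniform(0, 1) draw used for the randomization
randomized_decision <- function(stat, cutoff, gamma, u = runif(1)) {
  if (stat < cutoff) return(1L)                # inside the rejection region
  if (stat == cutoff && u < gamma) return(1L)  # boundary: reject with prob. gamma
  0L                                           # otherwise fail to reject
}

decision <- randomized_decision(2.125236e-22, 1.115958e-07, gamma = 0)
```

Setting gamma to 0 reduces this rule to a non-randomized decision based on the cut-off alone.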

Author(s)

Biplab Paul, Shyamal K. De and Anil K. Ghosh

Maintainer: Biplab Paul <paul.biplab497@gmail.com>

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

Cyrus R Mehta and Nitin R Patel (1983). A network algorithm for performing Fisher's exact test in r x c contingency tables, Journal of the American Statistical Association, 78(382):427-434, doi:10.2307/2288652.

Examples

   # multivariate normal distribution:
   # generate data with dimension d = 500
   set.seed(151)
   n1=n2=n3=n4=10
   k = 4
   d = 500
   I1 <- matrix(rnorm(n1*d,mean=0,sd=1),n1,d)
   I2 <- matrix(rnorm(n2*d,mean=0.5,sd=1),n2,d) 
   I3 <- matrix(rnorm(n3*d,mean=1,sd=1),n3,d) 
   I4 <- matrix(rnorm(n4*d,mean=1.5,sd=1),n4,d) 
   levels <- c(rep(0,n1), rep(1,n2), rep(2,n3), rep(3,n4)) 
   X <- as.matrix(rbind(I1,I2,I3,I4)) 
   #FS test:
   results <- FStest(M=X, labels=levels, sizes = c(n1,n2,n3,n4), n_clust = k)
  
   ## outputs:
   results$estClustLabel
   #[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

   results$obsCtyTab
   #      [,1] [,2] [,3] [,4]
   #[1,]   10    0    0    0
   #[2,]    0   10    0    0
   #[3,]    0    0   10    0
   #[4,]    0    0    0   10

   results$ObservedProb
   #[1] 2.125236e-22

   results$FCutoff
   #[1] 1.115958e-07

   results$randomGamma
   #[1] 0

   results$estPvalue
   #[1] 0

   results$decisionF
   #[1] 1
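
The ObservedProb value in this example can be cross-checked in base R. The sketch assumes that the statistic is the generalized hypergeometric probability of the observed contingency table with fixed margins, as used in Fisher's exact test for r x c tables. This interpretation is inferred from the Mehta and Patel reference and is not stated on this page. table_prob is a hypothetical helper, not a package function.

```r
# Hypothetical helper: probability of an r x c table given its margins,
#   prod(row sums!) * prod(column sums!) / (n! * prod(cells!)),
# computed on the log scale to avoid overflow.
table_prob <- function(tab) {
  log_p <- sum(lfactorial(rowSums(tab))) + sum(lfactorial(colSums(tab))) -
    lfactorial(sum(tab)) - sum(lfactorial(tab))
  exp(log_p)
}

tab <- diag(10, 4)   # same layout as results$obsCtyTab above
p_obs <- table_prob(tab)
p_obs
```

For this table the result should agree with the printed ObservedProb of 2.125236e-22 up to rounding, which is consistent with the assumed interpretation.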


[Package HDLSSkST version 2.1.0 Index]