R: k-Sample FS Test of Equal Distributions

FStest {HDLSSkST}

R Documentation

k-Sample FS Test of Equal Distributions

Description

Performs the distribution-free exact k-sample FS test for equality of multivariate distributions in the high dimension, low sample size (HDLSS) regime.

Usage

FStest(M, labels, sizes, n_clust, randomization = TRUE, clust_alg = "knwClustNo", 
kmax = 2 * n_clust, s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

`M`	`n\times d` matrix of observations from the pooled sample; the observations should be grouped by class
`labels`	vector of length `n` giving the class membership index of each observation
`sizes`	vector of sample sizes
`n_clust`	number of populations
`randomization`	logical; if `TRUE` (default), the randomization test is performed; if `FALSE`, the non-randomization test
`clust_alg`	`"knwClustNo"` (default) or `"estclustNo"` (for the MFS test); modified K-means algorithm used for clustering
`kmax`	maximum number of clusters considered when estimating the total number of clusters in the pooled observations, default: `2*n_clust`
`s_psi`	function required for clustering, 1 for `t^2`, 2 for `1-\exp(-t)`, 3 for `1-\exp(-t^2)`, 4 for `\log(1+t)`, 5 for `t`
`s_h`	function required for clustering, 1 for `\sqrt t`, 2 for `t`
`lb`	each observation is partitioned into smaller vectors of equal length `lb`, default: `1`
`n_sts`	number of simulations of the test statistic, default: `1000`
`alpha`	numeric, significance level `\alpha`, default: `0.05`
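
How `M`, `labels`, and `sizes` relate to one another can be illustrated with a minimal base-R sketch. The 0-based labels follow the example on this page; whether other codings are accepted is not stated here, and the toy sizes, dimension, and means below are placeholders chosen only for illustration.

```r
# Hypothetical toy sizes; these are illustration values, not package defaults.
sizes <- c(10, 10, 10, 10)
d <- 5                      # illustrative dimension only

# One label per observation, grouped by class (0, 1, 2, ... as in the Examples).
labels <- rep(seq_along(sizes) - 1, times = sizes)

# Pooled n x d matrix whose rows are stacked class by class, matching `labels`.
set.seed(1)
M <- do.call(rbind, lapply(seq_along(sizes), function(j) {
  matrix(rnorm(sizes[j] * d, mean = 0.5 * (j - 1)), nrow = sizes[j], ncol = d)
}))

# Basic consistency checks before calling the test.
stopifnot(nrow(M) == sum(sizes), length(labels) == nrow(M))
stopifnot(all(as.vector(table(labels)) == sizes))
```

Building `labels` from `sizes` with `rep()` keeps the two arguments consistent by construction, which avoids a mismatch between the row order of `M` and the class indices.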

Value

FStest returns a list containing the following items:

`estClustLabel`	a vector of length `n` of estimated class membership index of all observations
`obsCtyTab`	observed contingency table
`ObservedProb`	value of the observed test statistic
`FCutoff`	cut-off of the test
`randomGamma`	randomized coefficient of the test
`estPvalue`	estimated p-value of the test
`decisionF`	`1` if the null hypothesis is rejected, `0` if the test fails to reject it
`estClustNo`	estimated total number of classes
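
How `FCutoff`, `randomGamma`, and `decisionF` fit together can be sketched with the textbook randomized decision rule. This is an assumption about the general form of such a test, not the package's code: the helper `randomized_decision` and its argument `u` are hypothetical, and the rule is assumed to reject for small values of the statistic, which is consistent with the example output on this page.

```r
# Textbook randomized test that rejects for small values of the statistic
# (an assumed form, not taken from the package source).
# `u` is a Uniform(0, 1) draw; it is exposed as an argument so the rule
# can be checked deterministically.
randomized_decision <- function(stat, cutoff, gamma, u = runif(1)) {
  if (stat < cutoff) {
    1L                        # statistic below the cut-off: reject
  } else if (stat == cutoff) {
    as.integer(u < gamma)     # on the boundary: reject with probability gamma
  } else {
    0L                        # statistic above the cut-off: fail to reject
  }
}

# Values taken from the Examples section of this page.
randomized_decision(stat = 2.125236e-22, cutoff = 1.115958e-07, gamma = 0)
```

With the example values the statistic lies well below the cut-off, so the sketch rejects regardless of the random draw.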

Author(s)

Biplab Paul, Shyamal K. De and Anil K. Ghosh

Maintainer: Biplab Paul <paul.biplab497@gmail.com>

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

Cyrus R Mehta and Nitin R Patel (1983). A network algorithm for performing Fisher's exact test in r x c contingency tables, Journal of the American Statistical Association, 78(382):427-434, doi:10.2307/2288652.

Examples

   # multivariate normal distribution:
   # generate data with dimension d = 500
   set.seed(151)
   n1=n2=n3=n4=10
   k = 4
   d = 500
   I1 <- matrix(rnorm(n1*d,mean=0,sd=1),n1,d)
   I2 <- matrix(rnorm(n2*d,mean=0.5,sd=1),n2,d) 
   I3 <- matrix(rnorm(n3*d,mean=1,sd=1),n3,d) 
   I4 <- matrix(rnorm(n4*d,mean=1.5,sd=1),n4,d) 
   levels <- c(rep(0,n1), rep(1,n2), rep(2,n3), rep(3,n4)) 
   X <- as.matrix(rbind(I1,I2,I3,I4)) 
   #FS test:
   results <- FStest(M=X, labels=levels, sizes = c(n1,n2,n3,n4), n_clust = k)
  
   ## outputs:
   results$estClustLabel
   #[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

   results$obsCtyTab
   #      [,1] [,2] [,3] [,4]
   #[1,]   10    0    0    0
   #[2,]    0   10    0    0
   #[3,]    0    0   10    0
   #[4,]    0    0    0   10

   results$ObservedProb
   #[1] 2.125236e-22

   results$FCutoff
   #[1] 1.115958e-07

   results$randomGamma
   #[1] 0

   results$estPvalue
   #[1] 0

   results$decisionF
   #[1] 1
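
The reported `ObservedProb` can be cross-checked by hand. The sketch below assumes, in line with the Fisher's exact test reference above, that the statistic is the generalized hypergeometric probability of the observed table given its row and column totals; the helper `table_prob` is hypothetical and not part of the package.

```r
# Probability of an r x c table given its row and column totals:
#   prod(row totals!) * prod(column totals!) / (n! * prod(cell counts!)),
# computed on the log scale to avoid overflow in the factorials.
table_prob <- function(tab) {
  log_p <- sum(lfactorial(rowSums(tab))) + sum(lfactorial(colSums(tab))) -
    lfactorial(sum(tab)) - sum(lfactorial(tab))
  exp(log_p)
}

# The observed contingency table from the example: a 4 x 4 diagonal of 10s.
tab <- diag(10, 4)
table_prob(tab)
```

For this diagonal table the expression reduces to (10!)^4 / 40!, roughly 2.125e-22, which agrees with the `ObservedProb` value printed above.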

[Package HDLSSkST version 2.1.0 Index]