is_homo3 {uclust}    R Documentation
U-statistic based homogeneity test for 3 groups
Description
Homogeneity test based on the statistic bn3. The test assesses whether there exists a data partition for which three-group separation is statistically significant according to utest3. The null hypothesis is overall sample homogeneity; a sample is considered homogeneous if it cannot be divided into three groups with at least one group significantly different from the others.
Usage
is_homo3(md = NULL, data = NULL, rep = 20, test_max = TRUE, alpha = 0.05)
Arguments
md: Matrix of distances between all data points.
data: Data matrix. Each row represents an observation.
rep: Number of times to repeat the optimization procedure. Important for problems with multiple optima.
test_max: Logical indicating whether to employ the max test.
alpha: Significance level.
Details
This is the homogeneity test of Bello et al. (2021). The test is performed in two steps: an optimization procedure that finds the data partition maximizing the standardized Bn, followed by a test of the resulting maximal partition. The test is intended for high-dimension, small-sample-size settings.
Either data or md should be provided.
If data are entered directly, Bn is computed using squared Euclidean distances.
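Because a data matrix is converted to squared Euclidean distances, supplying data should be interchangeable with supplying the corresponding squared distance matrix as md. The base-R sketch below builds such a matrix in two ways, assuming rows are observations; it uses only dist(), outer() and tcrossprod() and does not call any uclust function.

```r
# Sketch: squared Euclidean distance matrix for a data matrix whose rows are
# observations (the kind of input the md argument expects).
set.seed(1)
x <- matrix(rnorm(7 * 50), nrow = 7)   # 7 observations, 50 variables

# Base R: dist() gives Euclidean distances; squaring yields squared distances.
md <- as.matrix(dist(x)^2)

# Same matrix from the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq_norms <- rowSums(x^2)
md_manual <- outer(sq_norms, sq_norms, "+") - 2 * tcrossprod(x)
diag(md_manual) <- 0   # clear floating-point residue on the diagonal

# Both constructions agree up to numerical tolerance (dimnames ignored).
stopifnot(isTRUE(all.equal(md, md_manual, check.attributes = FALSE)))
```

The resulting md could then be passed as is_homo3(md = md), mirroring the commented distance-matrix call in the Examples section.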
The variance of Bn is estimated through resampling, so p-values may vary slightly between runs.
For more details, see Bello, Debora Zava, Marcio Valk, and Gabriela Bettella Cybis. "Clustering inference in multiple groups." arXiv preprint arXiv:2106.09115 (2021).
Value
Returns a list with the following elements:
- stdBn
Test statistic. Maximum standardized Bn.
- group1
Elements in group 1 in the maximal partition. (Note: this is not necessarily the best partition for the data; see uclust3.)
- group2
Elements in group 2 in the maximal partition.
- group3
Elements in group 3 in the maximal partition.
- pvalue.Bonferroni
P-value for the homogeneity test.
- alpha_Bonferroni
Significance level after Bonferroni correction.
- bootB
Resampling variance estimate for partitions with central group sizes.
- bootB1
Resampling variance estimate for partitions with one group of size 1.
- varBn
Estimated variance of Bn for maximal standardized Bn configuration.
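Under a standard Bonferroni correction over m simultaneous tests, the adjusted p-value min(1, m p) is compared with alpha, which is equivalent to comparing the raw p-value p with alpha/m. The base-R sketch below illustrates this arithmetic. The choice of m is an assumption made only for illustration: it counts the ways to split n observations into three non-empty groups (the Stirling number of the second kind S(n, 3)), and is_homo3 may use a different count internally. The helpers n_partitions3 and bonferroni are hypothetical names, not part of uclust.

```r
# Hypothetical illustration of a Bonferroni correction; not uclust code.
# ASSUMPTION: m = number of partitions of n observations into three
# non-empty groups, S(n, 3) = (3^(n - 1) - 2^n + 1) / 2.
n_partitions3 <- function(n) {
  stopifnot(n >= 3)
  (3^(n - 1) - 2^n + 1) / 2
}

# Bonferroni-adjusted p-value and per-test significance level for m tests.
bonferroni <- function(p, m, alpha = 0.05) {
  list(
    pvalue = min(1, p * m),   # adjusted p-value, capped at 1
    alpha  = alpha / m        # corrected significance level
  )
}

m <- n_partitions3(7)                        # 7 observations, as in Examples
adj <- bonferroni(p = 1e-4, m = m, alpha = 0.05)   # p = 1e-4 is arbitrary

# Both forms of the decision rule agree.
stopifnot((1e-4 < adj$alpha) == (adj$pvalue < 0.05))
```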
Examples
set.seed(123)
x = matrix(rnorm(70000),nrow=7) #creating homogeneous Gaussian dataset
res = is_homo3(data=x)
res
#uncomment to run
# x = matrix(rnorm(18000),nrow=18)
# x[1:5,] = x[1:5,]+0.5 #Heterogeneous dataset (first 5 samples have different mean)
# x[6:9,] = x[6:9,]+1.5 #samples 6 to 9 also have a different mean
# res = is_homo3(data=x)
# res
# md = as.matrix(dist(x)^2) #squared Euclidean distances for the same data
# res = is_homo3(md) # uncomment to run
# Multidimensional scaling plot of the distance matrix
#fit <- cmdscale(md, eig = TRUE, k = 2)
#x <- fit$points[, 1]
#y <- fit$points[, 2]
#plot(x,y, main=paste("Homogeneity test: p-value =",res$pvalue.Bonferroni))