is_homo3 {uclust}    R Documentation

U-statistic based homogeneity test for 3 groups

Description

Homogeneity test based on the statistic bn3. The test assesses whether there exists a partition of the data for which three-group separation is statistically significant according to utest3. The null hypothesis is overall sample homogeneity: a sample is considered homogeneous if it cannot be divided into three groups with at least one group significantly different from the others.

Usage

is_homo3(md = NULL, data = NULL, rep = 20, test_max = TRUE, alpha = 0.05)

Arguments

md

Matrix of distances between all data points.

data

Data matrix. Each row represents an observation.

rep

Number of times to repeat the optimization procedure. This matters for problems with multiple local optima.

test_max

Logical indicating whether to employ the max test.

alpha

Significance level.

Details

This is the homogeneity test of Bello et al. (2021). The test is performed in two steps: an optimization procedure that finds the data partition maximizing the standardized Bn, followed by a test of the resulting maximal partition. It is intended for high-dimension, small-sample-size settings.
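The role of the rep argument in the optimization step can be illustrated with a generic multi-start local search. This is a sketch under stated assumptions, not the optimizer inside is_homo3: the objective toy_score is a swapped-in within-group sum of squares rather than the standardized Bn, and toy_score, local_search and multi_start are hypothetical helpers. Each restart begins from a random three-group labelling, moves single observations while the score strictly improves, and the best result over rep restarts is kept.

```r
# Toy objective (NOT the standardized Bn): negative within-group sum of
# squares, computed from a squared-distance matrix md as
# SS_g = sum_{i, j in g} md[i, j] / (2 * n_g).
toy_score <- function(md, labels) {
  groups <- split(seq_along(labels), labels)
  -sum(vapply(groups, function(idx) sum(md[idx, idx]) / (2 * length(idx)),
              numeric(1)))
}

# One local search (hypothetical helper): move single observations between
# the k groups while the score strictly improves, never emptying a group.
local_search <- function(md, labels, k = 3) {
  best <- toy_score(md, labels)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_along(labels)) {
      for (g in seq_len(k)) {
        # skip the current group and never move the last member of a group
        if (g == labels[i] || sum(labels == labels[i]) == 1) next
        cand <- labels
        cand[i] <- g
        s <- toy_score(md, cand)
        if (s > best + 1e-12) {
          labels <- cand
          best <- s
          improved <- TRUE
        }
      }
    }
  }
  list(labels = labels, score = best)
}

# Multi-start wrapper (hypothetical helper): rep random starts, each using
# all k labels, one local search from each, keep the highest score.
multi_start <- function(md, rep = 20, k = 3) {
  n <- nrow(md)
  best <- NULL
  for (r in seq_len(rep)) {
    init <- sample(c(seq_len(k), sample.int(k, n - k, replace = TRUE)))
    res <- local_search(md, init, k)
    if (is.null(best) || res$score > best$score) best <- res
  }
  best
}

set.seed(1)
x <- rbind(matrix(rnorm(40), nrow = 4),             # 4 observations, 10 variables
           matrix(rnorm(40, mean = 3), nrow = 4),
           matrix(rnorm(40, mean = -3), nrow = 4))
md <- as.matrix(dist(x)^2)
fit <- multi_start(md, rep = 20)
table(fit$labels)
```

A single local search can stop at a partition that no one-point move improves even though a better partition exists; repeating from several random starts lowers the chance of reporting such a local optimum, which is the motivation the rep argument describes.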

Either data or md must be provided. If data is supplied directly, Bn is computed using squared Euclidean distances.
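Going by the sentence above and the Examples (which build md with as.matrix(dist(x)^2) "for the same data"), passing data should presumably be equivalent to passing that md. As a sketch of what this matrix contains, the hypothetical helper sq_euclid (not part of uclust) builds it from inner products using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b and checks it against the dist() construction.

```r
# Hypothetical helper: squared Euclidean distance matrix for a data matrix
# whose rows are observations.
sq_euclid <- function(data) {
  data <- as.matrix(data)
  norms <- rowSums(data^2)                 # squared norm of each observation
  g <- tcrossprod(data)                    # Gram matrix: g[i, j] = <row i, row j>
  d2 <- outer(norms, norms, "+") - 2 * g
  d2[d2 < 0] <- 0                          # clamp tiny negatives from rounding
  diag(d2) <- 0
  d2
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 6)           # 6 observations, 10 variables
md <- as.matrix(dist(x)^2)                 # construction used in the Examples
all.equal(sq_euclid(x), md, check.attributes = FALSE)
```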

The variance of Bn is estimated by resampling, so p-values may vary slightly between runs.
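Calling set.seed() before is_homo3, as the Examples do, should make runs reproducible, assuming the function draws its resamples from R's random number generator (this page does not say so). The sketch below uses a swapped-in bootstrap variance of a sample mean, not the uclust resampling scheme for Bn, only to show why resampling estimates differ between seeds and repeat exactly under the same seed; boot_var is a hypothetical helper.

```r
# Illustration only: bootstrap variance of the sample mean.
# This is NOT the resampling procedure uclust uses for Bn.
boot_var <- function(v, B = 200) {
  stats <- replicate(B, mean(sample(v, replace = TRUE)))
  var(stats)
}

set.seed(42)
v <- rnorm(30)

set.seed(1); a  <- boot_var(v)
set.seed(2); b  <- boot_var(v)
set.seed(1); a2 <- boot_var(v)
c(a = a, b = b, a2 = a2)   # a and a2 match exactly; b differs slightly
```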

For more details, see Bello, Debora Zava, Marcio Valk, and Gabriela Bettella Cybis. "Clustering inference in multiple groups." arXiv preprint arXiv:2106.09115 (2021).

Value

Returns a list with the following elements:

stdBn

Test statistic. Maximum standardized Bn.

group1

Elements in group 1 in the maximal partition. (Note: this is not the best partition for the data; see uclust3.)

group2

Elements in group 2 in the maximal partition.

group3

Elements in group 3 in the maximal partition.

pvalue.Bonferroni

P-value for the homogeneity test.

alpha_Bonferroni

Alpha after Bonferroni correction.

bootB

Resampling variance estimate for partitions with central group sizes.

bootB1

Resampling variance estimate for partitions with one group of size 1.

varBn

Estimated variance of Bn for maximal standardized Bn configuration.
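The pvalue.Bonferroni and alpha_Bonferroni elements relate through the Bonferroni rule. This page does not state how many tests is_homo3 corrects for, nor whether pvalue.Bonferroni is already adjusted, so the sketch below uses hypothetical p-values and a hypothetical number of tests m. It shows that comparing the smallest raw p-value with alpha / m gives the same decision as comparing the smallest Bonferroni-adjusted p-value (stats::p.adjust) with alpha.

```r
# Generic Bonferroni rule; p and m are hypothetical values, not uclust output.
alpha <- 0.05
p <- c(0.030, 0.004, 0.200)   # hypothetical p-values from m separate tests
m <- length(p)

alpha_bonf <- alpha / m                                    # corrected level
reject_raw <- min(p) <= alpha_bonf                         # raw p vs alpha / m
reject_adj <- min(p.adjust(p, method = "bonferroni")) <= alpha  # m * p vs alpha
c(alpha_bonf = alpha_bonf, reject_raw = reject_raw, reject_adj = reject_adj)
```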

Examples

set.seed(123)
x = matrix(rnorm(70000),nrow=7)  #creating homogeneous Gaussian dataset
res = is_homo3(data=x)
res

#uncomment to run
# x = matrix(rnorm(18000),nrow=18)
# x[1:5,] = x[1:5,]+0.5 #Heterogeneous dataset (first 5 samples have different mean)
# x[6:9,] = x[6:9,]+1.5
# res = is_homo3(data=x)
#  res
# md = as.matrix(dist(x)^2) #squared Euclidean distances for the same data
# res = is_homo3(md = md)  # same test computed from the distance matrix

# Multidimensional scaling plot of the distance matrix
#fit <- cmdscale(md, eig = TRUE, k = 2)
#x <- fit$points[, 1]
#y <- fit$points[, 2]
#plot(x,y, main=paste("Homogeneity test: p-value =",res$pvalue.Bonferroni))


[Package uclust version 1.0.0 Index]