utest_classify {uclust}    R Documentation

Test for classification of a sample into one of two groups.

Description

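Tests whether a new observation x is well classified into the first group of data (the rows with group_id equal to 0) when compared to the second group. The null hypothesis is that x is not well classified into the first group when compared to the second group. The alternative hypothesis is that x is well classified into the first group.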
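The decision rule implied by these hypotheses can be sketched in base R. The helper `interpret_classify()` below is hypothetical and not part of uclust. It takes a p-value, such as the `p_value` component returned by `utest_classify`, and compares it with a significance level `alpha`, assumed here to default to 0.05 as in the Examples.

```r
# Hypothetical helper (not part of uclust): turns a p-value from
# utest_classify() into a verbal decision at significance level alpha.
interpret_classify <- function(p_value, alpha = 0.05) {
  stopifnot(is.numeric(p_value), length(p_value) == 1,
            p_value >= 0, p_value <= 1)
  if (p_value < alpha) {
    # Reject H0: evidence that x is well classified into the first group.
    "reject H0: x is well classified into the first group"
  } else {
    # Do not reject H0: no evidence that x belongs to the first group.
    "do not reject H0: no evidence that x is well classified into the first group"
  }
}

interpret_classify(0.01)
interpret_classify(0.40)
```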

Usage

utest_classify(x, data, group_id, bootstrap_iter = 1000)

Arguments

x

A numeric vector (the new observation) to be classified. Its length must equal the number of columns of data.

data

Data matrix. Each row represents an observation.

group_id

A vector of 0s (first group) and 1s (second group) indicating the group to which each row of data belongs. Must be in the same order as the rows of data.

bootstrap_iter

Numeric scalar. The number of bootstrap iterations. Values between 1000 and 10000 are recommended.
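A minimal sketch, in base R, of the consistency checks these arguments imply. The function `check_classify_inputs()` is hypothetical (uclust may perform its own, different checks). It only verifies that `x` has one entry per column of `data`, that `group_id` has one 0/1 entry per row of `data`, and that both groups are non-empty. The usage lines show one way to build `group_id` from character labels, using the hypothetical labels `"G1"` and `"G2"`.

```r
# Hypothetical helper (not part of uclust): stops with an error if x, data
# and group_id are inconsistent; returns TRUE invisibly otherwise.
check_classify_inputs <- function(x, data, group_id) {
  data <- as.matrix(data)
  if (!is.numeric(x) || length(x) != ncol(data))
    stop("x must be a numeric vector of length ncol(data)")
  if (length(group_id) != nrow(data))
    stop("group_id must have one entry per row of data")
  if (!all(group_id %in% c(0, 1)))
    stop("group_id must contain only 0s and 1s")
  if (!any(group_id == 0) || !any(group_id == 1))
    stop("both groups must contain at least one observation")
  invisible(TRUE)
}

# Usage: build a 0/1 group_id from character labels (G1 -> 0, G2 -> 1).
labels <- rep(c("G1", "G2"), each = 5)
group_id <- as.integer(labels == "G2")
data <- matrix(rnorm(10 * 60), nrow = 10)
x <- rnorm(60)
check_classify_inputs(x, data, group_id)
```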

Details

The test is based on the squared Euclidean distance between observations.
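To make the distance concrete, the base-R sketch below computes the squared Euclidean distance from `x` to every row of `data` and averages it within each group. It is an illustration only, with hypothetical helper names and toy data. It is not the U-statistic computed by `utest_classify`, whose exact form is given in the papers cited in this section.

```r
# Illustration only: squared Euclidean distance from x to each row of data,
# averaged within each group. This is NOT the uclust test statistic.
sq_dist_to_rows <- function(x, data) {
  # Subtract x from every row, square the differences, sum across columns.
  rowSums(sweep(as.matrix(data), 2, x)^2)
}

mean_sq_dist_by_group <- function(x, data, group_id) {
  d <- sq_dist_to_rows(x, data)
  tapply(d, group_id, mean)  # one mean per group, named "0" and "1"
}

# Toy data: two 2-dimensional groups; x lies near the first group.
data <- rbind(c(0, 0), c(1, 0), c(10, 10), c(11, 10))
group_id <- c(0, 0, 1, 1)
x <- c(0, 1)
mean_sq_dist_by_group(x, data, group_id)
# group "0": 1.5, group "1": 191.5
```

Here `x` is far closer, on average, to group 0 than to group 1, which is the kind of contrast the test formalizes.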

For more details, see Cybis, Gabriela B., Marcio Valk, and Sílvia R. C. Lopes. "Clustering and classification problems in genetics through U-statistics." Journal of Statistical Computation and Simulation 88.10 (2018); and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).

Value

A list with class "utest_classify" containing the following components:

statistic

The value of the test statistic.

p_value

The p-value for the test.

bootstrap_iter

The number of bootstrap iterations used.
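Because the p-value is estimated from `bootstrap_iter` resamples, it carries Monte Carlo error. Assuming the reported p-value is a proportion of B bootstrap replicates (an assumption about the implementation, not something stated on this page), its standard error is approximately sqrt(p(1 - p) / B) with B = `bootstrap_iter`. The sketch below, using the hypothetical helper `mc_se()`, evaluates this for p = 0.05, which helps explain the recommended range for `bootstrap_iter`.

```r
# Approximate Monte Carlo standard error of a p-value estimated as a
# proportion of B bootstrap replicates (binomial approximation).
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

B <- c(100, 1000, 10000)
round(mc_se(0.05, B), 4)
# 0.0218 0.0069 0.0022
```

With B = 100 the error (about 0.02) is as large as the gap between p = 0.05 and p = 0.03, while with B = 1000 it drops to about 0.007.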

Examples

# Example 1
library(uclust)
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow = TRUE)
# Test data comes from G1.
x <- rnorm(60, 0)
# The test correctly indicates that the test data should be classified into G1 (p < 0.05).
utest_classify(x, data, group_id = c(rep(0, times = 5), rep(1, times = 5)))

# Example 2
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow = TRUE)
# Test data comes from G2.
x <- rnorm(60, 10)
# Here the G2 rows are labelled 0, so G2 is the first group. The test correctly
# indicates that the test data should be classified into G2 (p < 0.05).
utest_classify(x, data, group_id = c(rep(1, times = 5), rep(0, times = 5)))

[Package uclust version 1.0.0 Index]