utest_classify {uclust} | R Documentation |
Test for classification of a sample in one of two groups.
Description
The null hypothesis is that the new data is not well classified into the first group when compared to the second group. The alternative hypothesis is that the data is well classified into the first group.
Usage
utest_classify(x, data, group_id, bootstrap_iter = 1000)
Arguments
x |
A numeric vector to be classified. |
data |
Data matrix. Each row represents an observation. |
group_id |
A vector of 0s (first group) and 1s (second group) indicating the group to which each observation belongs. Must be in the same order as the rows of data. |
bootstrap_iter |
Numeric scalar. The number of bootstraps. It's recommended
|
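The argument layout described above can be checked before calling the function. The following is a minimal sketch in base R. The helper check_inputs is hypothetical and not part of uclust, and the requirement that x has one value per column of data is an assumption inferred from the examples and from the use of Euclidean distance.

```r
# Hypothetical helper (not part of uclust): checks the layout the arguments describe.
check_inputs <- function(x, data, group_id) {
  stopifnot(
    is.numeric(x), is.matrix(data),
    length(x) == ncol(data),         # assumed: x has one value per column of data
    length(group_id) == nrow(data),  # one label per observation (row)
    all(group_id %in% c(0, 1))       # labels are 0 (first group) or 1 (second group)
  )
  invisible(TRUE)
}

data <- matrix(rnorm(600), ncol = 60)
group_id <- rep(c(0, 1), each = 5)  # rows 1-5 in the first group, rows 6-10 in the second
x <- rnorm(60)
check_inputs(x, data, group_id)
```

A mismatch, such as a group_id shorter than the number of rows in data, stops with an error instead of reaching the test.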
Details
The test is performed considering the squared Euclidean distance.
For more detail see Cybis, Gabriela B., Marcio Valk, and Sílvia RC Lopes. "Clustering and classification problems in genetics through U-statistics." Journal of Statistical Computation and Simulation 88.10 (2018) and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).
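To make the distance concrete, the sketch below computes squared Euclidean distances from a new vector to every observation in base R. It only illustrates the distance named above. It is not the uclust implementation, and the per-group mean is a summary chosen here for intuition, not the test statistic.

```r
# Illustrative only: squared Euclidean distances from a new vector to each observation.
set.seed(1)
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow = TRUE)
group_id <- c(rep(0, times = 5), rep(1, times = 5))
x <- rnorm(60, 0)

# sweep() subtracts x from every row of data; rowSums() adds up the squared differences.
sq_dist <- rowSums(sweep(data, 2, x)^2)

# Mean squared distance from x to each group (assumed summary, for intuition only).
mean_sq_dist <- tapply(sq_dist, group_id, mean)
mean_sq_dist
```

Because x is drawn from the same distribution as the first group, its mean squared distance to group 0 is much smaller than to group 1.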
Value
A list with class "utest_classify" containing the following components:
statistic |
The value of the test statistic. |
p_value |
The p-value for the test. |
bootstrap_iter |
The number of bootstrap iterations. |
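The components listed above can be read from the returned list with the usual $ accessor. The sketch below assumes the uclust package is installed and skips the call otherwise. The 0.05 threshold is the one used in the examples, not a value fixed by the function.

```r
# Runs only if uclust is available; otherwise the block is skipped.
if (requireNamespace("uclust", quietly = TRUE)) {
  data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow = TRUE)
  group_id <- c(rep(0, times = 5), rep(1, times = 5))
  x <- rnorm(60, 0)

  res <- uclust::utest_classify(x, data, group_id = group_id)

  res$statistic       # value of the test statistic
  res$p_value         # p-value for the test
  res$bootstrap_iter  # number of bootstrap iterations used

  # Reject the null at the 5% level: x is well classified into the first group.
  res$p_value < 0.05
}
```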
Examples
# Example 1
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow=TRUE)
# Test data comes from G1.
x <- rnorm(60, 0)
# The test correctly indicates that the test data should be classified into G1 (p < 0.05).
utest_classify(x, data, group_id = c(rep(0,times=5),rep(1,times=5)))
# Example 2
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow=TRUE)
# Test data comes from G2.
x <- rnorm(60, 10)
# Here G2 is coded as 0 (the first group), so a small p-value (p < 0.05) indicates that the test data should be classified into G2.
utest_classify(x, data, group_id = c(rep(1,times=5),rep(0,times=5)))