INCAtest {ICGE} | R Documentation |
INCA Test
Description
Assume that n units are divided into k groups C1,...,Ck. The function INCAtest performs the INCA typicality test: the null hypothesis that a new unit x0 is typical with respect to a previously fixed partition is tested against the alternative hypothesis that the unit is atypical.
Usage
INCAtest(d, pert, d_test, np = 1000, alpha = 0.05, P = 1)
Arguments
d |
a distance matrix or a |
pert |
an n-vector that indicates which group each unit belongs to. Note that the expected values of |
d_test |
an n-vector containing the distances from x0 to the other units. |
np |
sample size for the bootstrap procedure. |
alpha |
fixed level for the test. |
P |
number of times the bootstrap procedure is repeated. |
Value
A list with class "incat" containing the following components:
StatisticW0 |
value of the INCA statistic. |
ProjectionsU |
values of statistics measuring the projection from the specific object to each considered group. |
pvalues |
p-values obtained in the |
alpha |
specified value of the level of the test. |
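As a hedged illustration of how the returned list might be inspected, the sketch below simulates data in the same spirit as the Examples section and reads the components named above. It assumes the ICGE package is installed (the call is skipped otherwise). The seed, the group means, the variable names and the final comparison of the p-values with alpha are illustrative choices, not part of the package documentation.

```r
set.seed(123)  # illustrative seed, only for reproducibility

# Three groups of 20 units in dimension 5 (simulated; the means are an assumption)
mu <- list(rep(1, 5), rep(5, 5), rep(9, 5))
x <- do.call(rbind, lapply(mu, function(m)
  matrix(rnorm(20 * 5, mean = m, sd = 1), ncol = 5, byrow = TRUE)))
partition <- rep(1:3, each = 20)
d <- dist(x)

# New unit drawn from group 1, and its distances to the 60 existing units
x0 <- rnorm(5, mean = mu[[1]], sd = 1)
dx0 <- unname(as.matrix(dist(rbind(x0, x)))[1, -1])

if (requireNamespace("ICGE", quietly = TRUE)) {
  res <- ICGE::INCAtest(d, partition, dx0, np = 10)
  print(res$StatisticW0)          # value of the INCA statistic
  print(res$ProjectionsU)         # projection statistics, one per group
  print(res$pvalues)              # bootstrap p-values
  print(res$pvalues < res$alpha)  # comparison with the level of the test
}
```

A small np keeps the run short; it is used here only to make the sketch quick to execute.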
Note
Computing the distribution of the INCA statistic under the null hypothesis can take a long time. For a correct geometrical interpretation, it is advisable to verify that the distance matrix d is Euclidean.
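One possible way to check whether a distance matrix is Euclidean, using base R only, is sketched below. It relies on the classical result that the distances are Euclidean exactly when the doubly centred matrix of squared distances, multiplied by -1/2, has no negative eigenvalues. The helper name is_euclidean and the tolerance tol are assumptions made for this illustration; they are not part of ICGE.

```r
# Hypothetical helper (not part of ICGE): TRUE if the distances in d are Euclidean
is_euclidean <- function(d, tol = 1e-8) {
  d <- as.matrix(d)
  n <- nrow(d)
  J <- diag(n) - matrix(1 / n, n, n)  # centring matrix
  B <- -0.5 * J %*% (d^2) %*% J       # doubly centred squared distances
  ev <- eigen((B + t(B)) / 2, symmetric = TRUE, only.values = TRUE)$values
  # Allow tiny negative eigenvalues caused by rounding error
  min(ev) >= -tol * max(abs(ev))
}

# Euclidean distances between simulated units
set.seed(1)
x <- matrix(rnorm(60 * 5), ncol = 5)
euclid_ok <- is_euclidean(dist(x))

# A star-shaped metric: a centre at distance 1 from three leaves that are
# at distance 2 from each other; it cannot be embedded in Euclidean space
star <- matrix(c(0, 1, 1, 1,
                 1, 0, 2, 2,
                 1, 2, 0, 2,
                 1, 2, 2, 0), nrow = 4, byrow = TRUE)
star_ok <- is_euclidean(star)

print(c(euclidean = euclid_ok, star = star_ok))
```

The relative tolerance is a design choice: eigenvalues that are negative only at the level of floating-point noise are treated as zero.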
Author(s)
Itziar Irigoien itziar.irigoien@ehu.es; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV-EHU), Donostia, Spain.
Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.
References
Irigoien, I. and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units. Statistics in Medicine, 27(15), 2948–2973.
Arenas, C. and Cuadras, C.M. (2002). Some recent statistical methods based on distances. Contributions to Science, 2, 183–191.
See Also
Examples
# Load the package that provides INCAtest.
library(ICGE)
# Generate 3 clusters, each with 20 objects in dimension 5.
mu1 <- sample(1:10, 5, replace=TRUE)
x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
mu2 <- sample(1:10, 5, replace=TRUE)
x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1),ncol=5, byrow=TRUE)
mu3 <- sample(1:10, 5, replace=TRUE)
x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1),ncol=5, byrow=TRUE)
x <- rbind(x1,x2,x3)
# Euclidean distance between units in matrix x.
d <- dist(x)
# True partition: units 1-20, 21-40 and 41-60 belong to groups 1, 2 and 3.
partition <- c(rep(1,20), rep(2,20), rep(3,20))
# x0 is a new unit drawn from one of the existing groups, here group 1.
x0 <- matrix(rnorm(1*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
# distances between x0 and the other units.
dx0 <- rep(0, 60)
for (i in 1:60) {
  dif <- x0 - x[i, ]
  dx0[i] <- sqrt(sum(dif * dif))
}
INCAtest(d, partition, dx0, np=10)
# x0 is a new unit drawn from a new group (randomly chosen mean vector).
x0 <- matrix(rnorm(1*5, mean = sample(1:10, 5, replace=TRUE), sd = 1),
             ncol=5, byrow=TRUE)
# distances between x0 and the other units in matrix x.
dx0 <- rep(0, 60)
for (i in 1:60) {
  dif <- x0 - x[i, ]
  dx0[i] <- sqrt(sum(dif * dif))
}
INCAtest(d, partition, dx0, np=10)
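The loops above that fill dx0 could also be written in vectorized form. The following is a minimal sketch in base R; the helper name dist_to_rows and the simulated data are assumptions made for this illustration and are not part of ICGE.

```r
# Hypothetical helper: Euclidean distances from a single unit x0 to each row of x
dist_to_rows <- function(x0, x) {
  diffs <- sweep(x, 2, as.vector(x0))  # subtract x0 from every row of x
  sqrt(rowSums(diffs^2))
}

set.seed(1)
x <- matrix(rnorm(60 * 5), ncol = 5)  # 60 simulated units in dimension 5
x0 <- matrix(rnorm(5), ncol = 5)      # one new unit

dx0 <- dist_to_rows(x0, x)

# Explicit loop, as in the examples, for comparison
dx0_loop <- rep(0, 60)
for (i in 1:60) {
  dif <- x0 - x[i, ]
  dx0_loop[i] <- sqrt(sum(dif * dif))
}
print(all.equal(dx0, dx0_loop))
```

The resulting vector can be passed as the d_test argument in the same way as the dx0 computed by the loops.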