R: INCA Test

INCAtest {ICGE}

R Documentation

INCA Test

Description

Assume that n units are divided into k groups C1,...,Ck. Function INCAtest performs the typicality INCA test. Therein, the null hypothesis that a new unit x0 is a typical unit with respect to a previously fixed partition is tested versus the alternative hypothesis that the unit is atypical.

Usage

INCAtest(d, pert, d_test, np = 1000, alpha = 0.05, P = 1)

Arguments

`d`	a distance matrix or a `dist` object with distance information between units.
`pert`	an n-vector that indicates which group each unit belongs to. Note that the expected values of `pert` are numbers greater than or equal to 1 (for instance 1,2,3,4..., k). The default value indicates there is only one group in data.
`d_test`	an n-vector containing the distances from x0 to the other units.
`np`	sample size for the bootstrap sample for the bootstrap procedure.
`alpha`	fixed level for the test.
`P`	Number of times the bootstrap procedure is repeated.

Value

A list with class "incat" containing the following components:

`StatisticW0`	value of the INCA statistic.
`ProjectionsU`	values of statistics measuring the projection from the specific object to each considered group.
`pvalues`	p-values obtained in the `P` times repeated bootstrap procedure. Note: If `P`>1, it is printed the number of times the p-values were smaller than `alpha`.
`alpha`	specified value of the level of the test.

Note

To obtain the INCA statistic distribution, under the null hypothesis, the program can consume long time. For a correct geometrical interpretation it is convenient to verify whether the distance matrix d is Euclidean.

Author(s)

Itziar Irigoien itziar.irigoien@ehu.es; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV-EHU), Donostia, Spain.

Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.

References

Irigoien, I. and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units. Statistics in Medicine, 27(15), 2948–2973.

Arenas, C. and Cuadras, C.M. (2002). Some recent statistical methods based on distances. Contributions to Science, 2, 183–191.

Examples

#generate 3 clusters, each of them with 20 objects in dimension 5.
mu1 <- sample(1:10, 5, replace=TRUE)
x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
mu2 <- sample(1:10, 5, replace=TRUE)
x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1),ncol=5, byrow=TRUE)
mu3 <- sample(1:10, 5, replace=TRUE)
x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1),ncol=5, byrow=TRUE)
x <- rbind(x1,x2,x3)

# Euclidean distance between units in matrix x.
d <- dist(x)
# given the right partition
partition <- c(rep(1,20), rep(2,20), rep(3,20))

# x0 contains a unit from one group, as for example group 1.
x0 <-  matrix(rnorm(1*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)

# distances between x0 and the other units.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)


# x0 contains a unit from a new group.
x0 <-  matrix(rnorm(1*5, mean = sample(1:10, 5, replace=TRUE),
        sd = 1), ncol=5, byrow=TRUE)

# distances between x0 and the other units in matrix x.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)

[Package ICGE version 0.4.2 Index]