piv_sel {pivmet} R Documentation

Pivotal Selection via Co-Association Matrix

Description

Finds the pivotal units of a data partition from a co-association matrix C, according to three different criteria.

Usage

piv_sel(C, clusters)

Arguments

C

An $N \times N$ co-association matrix, i.e. a matrix whose elements are the co-occurrences of pairs of units in the same cluster among $H$ distinct partitions.

clusters

A vector of integers from $1:k$ indicating a partition of the $N$ units into, say, $k$ groups.

Details

Given a set of $N$ observations $(y_{1},y_{2},...,y_{N})$ ($y_i$ may be a $d$-dimensional vector, $d \ge 1$), consider clustering methods to obtain $H$ distinct partitions into $k$ groups. The matrix C is the co-association matrix, where $c_{i,p}=n_{i,p}/H$, with $n_{i,p}$ the number of times the pair $(y_{i},y_{p})$ is assigned to the same cluster among the $H$ partitions.
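As a rough illustration of this definition, the sketch below builds the co-association matrix from a toy set of $H = 4$ partitions of $N = 5$ units. The helper name `coassoc_sketch` and the toy matrix `parts` are assumptions made for this example and are not part of pivmet.

```r
# Hypothetical helper (not part of pivmet): co-association matrix from an
# H x N matrix whose h-th row holds the cluster labels of partition h.
coassoc_sketch <- function(parts) {
  H <- nrow(parts)
  N <- ncol(parts)
  C <- matrix(0, N, N)
  for (h in seq_len(H)) {
    # outer() marks every pair (i, p) that shares a label in partition h
    C <- C + outer(parts[h, ], parts[h, ], "==")
  }
  C / H
}

# Toy input: 4 partitions (rows) of 5 units (columns) into 2 groups
parts <- rbind(c(1, 1, 2, 2, 2),
               c(1, 1, 1, 2, 2),
               c(2, 2, 1, 1, 1),
               c(1, 1, 2, 2, 1))
C_toy <- coassoc_sketch(parts)
C_toy[1, 2]  # units 1 and 2 share a cluster in all four partitions
C_toy[1, 3]  # units 1 and 3 share a cluster in one partition out of four
```

Labels are compared only within a partition, so a relabelled partition (the third row of `parts`) contributes the same counts. In this sketch the diagonal equals 1, since every unit always shares a cluster with itself.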

Let $j$ be the group containing the units $\mathcal J_j$. The user may choose the unit $i^* \in \mathcal J_j$ that maximizes one of the quantities:

$$\sum_{p\in\mathcal J_j} c_{i^* p}$$

or

$$\sum_{p\in\mathcal J_j} c_{i^* p} - \sum_{p\notin\mathcal J_j} c_{i^* p}.$$

These methods give the unit that maximizes the global within similarity ("maxsumint") and the unit that maximizes the difference between global within and between similarities ("maxsumdiff"), respectively. Alternatively, we may choose the unit $i^{*} \in \mathcal J_j$ that minimizes:

$$\sum_{p\notin\mathcal J_j} c_{i^{*} p},$$

obtaining the member of the group that is most distant from all the other groups, i.e. the unit with the smallest global similarity to the units outside its own group ("minsumnoint"). See the vignette for further details.
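The three criteria can be sketched directly from their definitions. The function below is an illustrative assumption rather than the implementation used by `piv_sel`: the name `pivot_criteria_sketch`, the column labels and their order, the inclusion of the term for the unit itself in the within-group sum, and the tie handling (first index wins, via `which.max` and `which.min`) are all choices made for this example.

```r
# Hypothetical sketch (not pivmet's code): pick one pivot per group and
# criterion from a co-association matrix C and a label vector `clusters`.
pivot_criteria_sketch <- function(C, clusters) {
  groups <- sort(unique(clusters))
  out <- matrix(NA_integer_, length(groups), 3,
                dimnames = list(NULL, c("maxsumint", "maxsumdiff", "minsumnoint")))
  for (g in seq_along(groups)) {
    inside <- which(clusters == groups[g])
    outside <- which(clusters != groups[g])
    # Within- and between-group sums of co-associations for each member
    within <- rowSums(C[inside, inside, drop = FALSE])
    between <- rowSums(C[inside, outside, drop = FALSE])
    out[g, "maxsumint"] <- inside[which.max(within)]
    out[g, "maxsumdiff"] <- inside[which.max(within - between)]
    out[g, "minsumnoint"] <- inside[which.min(between)]
  }
  out
}

# Toy co-association matrix for 4 units split into groups {1, 2} and {3, 4}
C_toy <- matrix(c(1.0, 0.9, 0.2, 0.0,
                  0.9, 1.0, 0.4, 0.1,
                  0.2, 0.4, 1.0, 0.8,
                  0.0, 0.1, 0.8, 1.0), 4, 4, byrow = TRUE)
piv <- pivot_criteria_sketch(C_toy, clusters = c(1, 1, 2, 2))
piv
```

In this toy case the "maxsumdiff" and "minsumnoint" criteria select unit 1 for the first group and unit 4 for the second, the members with the weakest links to the other group. Because each group has only two members, the within-group sums tie by symmetry, so "maxsumint" falls back to the first member of each group.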

Value

pivots

A matrix with $k$ rows and three columns containing the indexes of the pivotal units for each method.

Author(s)

Leonardo Egidi legidi@units.it

References

Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.

Examples

# Iris data

data(iris)
# Select the four numeric variables
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)

# Perform H k-means partitions

for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}

# Build the co-association matrix

C <- matrix(NA, N, N)
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}

km <- kmeans(x, centers = 3)

# Apply three pivotal criteria to the co-association matrix

ris <- piv_sel(C, clusters = km$cluster)

graphics::plot(iris[, 1], iris[, 2], xlab = "Sepal.Length", ylab = "Sepal.Width",
               col = km$cluster)

# Add the pivots chosen by the maxsumdiff criterion

points(x[ris$pivots[, 3], 1:2], col = 1:3,
       cex = 2, pch = 8)


[Package pivmet version 0.6.0 Index]