R: Pivotal Selection via Co-Association Matrix

piv_sel {pivmet}

R Documentation

Pivotal Selection via Co-Association Matrix

Description

Finds pivotal units from a data partition and a co-association matrix C according to three different methods ("maxsumint", "minsumnoint" and "maxsumdiff").

Usage

piv_sel(C, clusters)

Arguments

`C`	An `N \times N` co-association matrix, i.e. a matrix whose elements are the proportions of co-occurrences of pairs of units in the same cluster among `H` distinct partitions.
`clusters`	A vector of integers from `1:k` indicating a partition of the `N` units into, say, `k` groups.

Details

Given a set of N observations (y_{1},y_{2},...,y_{N}) (y_i may be a d-dimensional vector, d \ge 1), consider clustering methods to obtain H distinct partitions into k groups. The matrix C is the co-association matrix, where c_{i,p}=n_{i,p}/H, with n_{i,p} the number of times the pair (y_{i},y_{p}) is assigned to the same cluster among the H partitions.
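
As a minimal illustration of this definition, the sketch below builds a co-association matrix in base R from a small hand-made set of partitions. The object names (`parts`, `coassoc`) and the toy labels are assumptions made for illustration only; they are not part of pivmet.

```r
# Toy input (hypothetical): H = 4 partitions (rows) of N = 5 units (columns)
parts <- rbind(
  c(1, 1, 2, 2, 2),
  c(1, 1, 1, 2, 2),
  c(2, 2, 1, 1, 1),
  c(1, 1, 2, 2, 1)
)
H <- nrow(parts)
N <- ncol(parts)

# c_{i,p} = (number of partitions in which units i and p share a label) / H
coassoc <- matrix(0, N, N)
for (h in seq_len(H)) {
  # outer() compares every pair of labels within partition h
  coassoc <- coassoc + outer(parts[h, ], parts[h, ], "==")
}
coassoc <- coassoc / H
```

Because only label equality within each partition is counted, the result does not depend on how the groups are numbered from one partition to the next (the third row above uses swapped labels).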

Let \mathcal J_j be the set of units belonging to group j. The user may choose the unit {i^*}\in\mathcal J_j that maximizes one of the following quantities:

\sum_{p\in\mathcal J_j} c_{{i^*}p}

\sum_{p\in\mathcal J_j} c_{{i^*}p} - \sum_{p\not\in\mathcal J_j} c_{{i^*}p}.

These methods give the unit that maximizes the global within similarity ("maxsumint") and the unit that maximizes the difference between global within and between similarities ("maxsumdiff"), respectively. Alternatively, we may choose i^{*} \in\mathcal J_j, which minimizes:

\sum_{p\not\in\mathcal J_j} c_{i^{*}p},

obtaining the unit of the group that is most distant from all the other groups, i.e. the member that minimizes the global similarity between its own group and all the others ("minsumnoint"). See the vignette for further details.
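
The three criteria can be sketched in a few lines of base R. The helper `pivots_sketch()` below is hypothetical and is not the pivmet implementation of `piv_sel()`; the column order of its output, its tie handling (first maximum or minimum wins) and the toy matrix are assumptions chosen for illustration.

```r
# Hypothetical helper, NOT the pivmet implementation of piv_sel()
pivots_sketch <- function(C, clusters) {
  groups <- sort(unique(clusters))
  out <- matrix(NA_integer_, length(groups), 3,
                dimnames = list(as.character(groups),
                                c("maxsumint", "minsumnoint", "maxsumdiff")))
  for (g in seq_along(groups)) {
    members <- which(clusters == groups[g])
    # Sum of co-associations with units inside / outside the group;
    # na.rm = TRUE tolerates a diagonal left as NA
    within  <- rowSums(C[members, members, drop = FALSE], na.rm = TRUE)
    between <- rowSums(C[members, -members, drop = FALSE], na.rm = TRUE)
    out[g, ] <- c(members[which.max(within)],
                  members[which.min(between)],
                  members[which.max(within - between)])
  }
  out
}

# Toy symmetric co-association matrix for 6 units in 2 groups (made-up values)
C <- matrix(c(
  1.0, 0.9, 0.8, 0.1, 0.2, 0.0,
  0.9, 1.0, 0.6, 0.3, 0.1, 0.1,
  0.8, 0.6, 1.0, 0.4, 0.3, 0.2,
  0.1, 0.3, 0.4, 1.0, 0.7, 0.5,
  0.2, 0.1, 0.3, 0.7, 1.0, 0.9,
  0.0, 0.1, 0.2, 0.5, 0.9, 1.0), 6, 6, byrow = TRUE)
clusters <- c(1, 1, 1, 2, 2, 2)

res <- pivots_sketch(C, clusters)
```

In this toy case all three criteria select unit 1 for the first group, whereas for the second group "maxsumint" selects unit 5 and the other two criteria select unit 6, which shows that the criteria need not agree.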

Value

`pivots`	A matrix with `k` rows and three columns containing the indexes of the pivotal units for each method.

Author(s)

Leonardo Egidi legidi@units.it

References

Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.

Examples

# Iris data
data(iris)

# Select the four numeric measurement columns
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)

# Perform H k-means partitions (one row of cluster labels per run)
for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}

# Build the co-association matrix (off-diagonal entries only)
C <- matrix(NA, N, N)
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}

# Reference partition whose groups the pivots are selected from
km <- kmeans(x, centers = 3)

# Apply three pivotal criteria to the co-association matrix
ris <- piv_sel(C, clusters = km$cluster)

# Plot the first two variables, coloured by k-means cluster
graphics::plot(iris[, 1], iris[, 2], xlab = "Sepal.Length", ylab = "Sepal.Width",
               col = km$cluster)

# Add the pivots chosen by the maxsumdiff criterion
points(x[ris$pivots[, 3], 1:2], col = 1:3,
       cex = 2, pch = 8)

[Package pivmet version 0.6.0 Index]