R: MUS algorithm

MUS {pivmet}

R Documentation

MUS algorithm

Description

Perform Maxima Units Search (MUS) algorithm on a large and sparse matrix in order to find a set of pivotal units through a sequential search in the given matrix.

Usage

MUS(C, clusters, prec_par = 10)

Arguments

`C`	`N \times N` matrix with a non-negligible number of zeros. For instance, a similarity matrix estimated from a `N \times D` data matrix whose rows are statistical units, or a co-association matrix resulting from clustering ensembles.
`clusters`	A vector of integers from `1:k` indicating the cluster to which each point is allocated (it requires `k < 5`, see Details).
`prec_par`	Optional argument. The maximum number of candidate pivots for each group. Default is 10.

Details

Consider H distinct partitions of a set of N d-dimensional statistical units into k groups determined by some clustering technique. A N \times N co-association matrix C with generic element c_{i,j}=n_{i,j}/H can be constructed, where n_{i,j} is the number of times the i-th and the j-th unit are assigned to the same cluster with respect to the clustering ensemble. Units which are very distant from each other are likely to have zero co-occurrences; as a consequence, C is a square symmetric matrix expected to contain a non-negligible number of zeros. The main task of the MUS algorithm is to detect submatrices of small rank from the co-association matrix and extract those units—pivots—such that the k \times k submatrix of C, determined by only the pivotal rows and columns indexes, is identical or nearly identical. Practically, the resulting units have the desirable property to be representative of the group they belong to.

With the argument prec_par the user may increase the powerful of the underlying MUS algorithm (see @egidi2018mus for details). Given the default value 10, the function internally computes an effective prec_par as \min( 10, \min n_j ), where n_j is the number of units belonging to the group j, \ j=1,\ldots,k.

Value

`pivots`	A vector of integers in 1:N denoting the indeces of the `k` selcted pivotal units.
`prec_par`	The effective number of alternative pivots considered for each group. See Details.

Author(s)

Leonardo Egidi legidi@units.it, Roberta Pappadà

References

Egidi, L., Pappadà, R., Pauli, F., Torelli, N. (2018). Maxima Units Search(MUS) algorithm: methodology and applications. In: Perna, C. , Pratesi, M., Ruiz-Gazen A. (eds.) Studies in Theoretical and Applied Statistics, Springer Proceedings in Mathematics and Statistics 227, pp. 71–81.

Examples


# Data generated from a mixture of three bivariate Gaussian distributions

## Not run: 
N <- 620
centers  <- 3
n1 <- 20
n2 <- 100
n3 <- 500
x  <- matrix(NA, N,2)
truegroup <- c( rep(1,n1), rep(2, n2), rep(3, n3))


 x[1:n1,]=rmvnorm(n1, c(1,5), sigma=diag(2))
 x[(n1+1):(n1+n2),]=rmvnorm(n2, c(4,0), sigma=diag(2))
 x[(n1+n2+1):(n1+n2+n3),]=rmvnorm(n3, c(6,6), sigma=diag(2))

# Build a similarity matrix from clustering ensembles

H <- 1000
a <- matrix(NA, H, N)

for (h in 1:H){
   a[h,] <- kmeans(x,centers)$cluster
}

sim_matr <- matrix(NA, N,N)
for (i in 1:(N-1)){
  for (j in (i+1):N){
     sim_matr[i,j] <- sum(a[,i]==a[,j])/H
     sim_matr[j,i] <- sim_matr[i,j]
     }
}

# Obtain a clustering solution via kmeans with multiple random seeds

cl <- KMeans(x, centers)$cluster

# Find three pivots

mus_alg <- MUS(C = sim_matr, clusters = cl)

## End(Not run)

[Package pivmet version 0.6.0 Index]