MUS {pivmet} | R Documentation |
MUS algorithm
Description
Perform Maxima Units Search (MUS) algorithm on a large and sparse matrix in order to find a set of pivotal units through a sequential search in the given matrix.
Usage
MUS(C, clusters, prec_par = 10)
Arguments
C |
|
clusters |
A vector of integers from |
prec_par |
Optional argument. The maximum number of candidate pivots for each group. Default is 10. |
Details
Consider H
distinct partitions of a set of N
d
-dimensional
statistical units into k
groups determined by some
clustering technique. A N \times N
co-association matrix
C
with generic element c_{i,j}=n_{i,j}/H
can be constructed,
where n_{i,j}
is the number of times the i
-th and the j
-th unit
are assigned to the same cluster with respect to the clustering ensemble.
Units which are very distant
from each other are likely to have zero co-occurrences; as a consequence,
C
is
a square symmetric matrix expected to contain a non-negligible number of zeros.
The main task of the MUS algorithm is to detect submatrices of small
rank from the co-association matrix
and extract those units—pivots—such
that the k \times k
submatrix of C
,
determined by only the pivotal rows
and columns indexes, is identical or nearly identical.
Practically, the resulting units
have the desirable property to be representative of
the group they belong to.
With the argument prec_par
the user may increase
the powerful of the underlying MUS algorithm (see @egidi2018mus for details).
Given the default value 10, the function internally computes an
effective prec_par
as \min( 10, \min n_j )
,
where n_j
is the number of units belonging to the group
j, \ j=1,\ldots,k
.
Value
pivots |
A vector of integers in 1:N denoting the indeces of the |
prec_par |
The effective number of alternative pivots considered for each group. See Details. |
Author(s)
Leonardo Egidi legidi@units.it, Roberta Pappadà
References
Egidi, L., Pappadà, R., Pauli, F., Torelli, N. (2018). Maxima Units Search(MUS) algorithm: methodology and applications. In: Perna, C. , Pratesi, M., Ruiz-Gazen A. (eds.) Studies in Theoretical and Applied Statistics, Springer Proceedings in Mathematics and Statistics 227, pp. 71–81.
Examples
# Data generated from a mixture of three bivariate Gaussian distributions
## Not run:
N <- 620
centers <- 3
n1 <- 20
n2 <- 100
n3 <- 500
x <- matrix(NA, N,2)
truegroup <- c( rep(1,n1), rep(2, n2), rep(3, n3))
x[1:n1,]=rmvnorm(n1, c(1,5), sigma=diag(2))
x[(n1+1):(n1+n2),]=rmvnorm(n2, c(4,0), sigma=diag(2))
x[(n1+n2+1):(n1+n2+n3),]=rmvnorm(n3, c(6,6), sigma=diag(2))
# Build a similarity matrix from clustering ensembles
H <- 1000
a <- matrix(NA, H, N)
for (h in 1:H){
a[h,] <- kmeans(x,centers)$cluster
}
sim_matr <- matrix(NA, N,N)
for (i in 1:(N-1)){
for (j in (i+1):N){
sim_matr[i,j] <- sum(a[,i]==a[,j])/H
sim_matr[j,i] <- sim_matr[i,j]
}
}
# Obtain a clustering solution via kmeans with multiple random seeds
cl <- KMeans(x, centers)$cluster
# Find three pivots
mus_alg <- MUS(C = sim_matr, clusters = cl)
## End(Not run)