R: k-EVCLUS algorithm

kevclus {evclust}

R Documentation

k-EVCLUS algorithm

Description

kevclus computes a credal partition from a dissimilarity matrix using the k-EVCLUS algorithm.

Usage

kevclus(
  x,
  k = n - 1,
  D,
  J,
  c,
  type = "simple",
  pairs = NULL,
  m0 = NULL,
  ntrials = 1,
  disp = TRUE,
  maxit = 1000,
  epsi = 1e-05,
  d0 = quantile(D, 0.9),
  tr = FALSE,
  change.order = FALSE,
  norm = 1
)

Arguments

`x`	nxp matrix of p attributes observed for n objects (optional).
`k`	Number of distances to compute for each object (default: n-1).
`D`	nxn or nxk dissimilarity matrix (used only if x is not supplied).
`J`	nxk matrix of indices. D[i,j] is the distance between objects i and J[i,j]. (Used only if D is supplied and ncol(D)<n; then k is set to ncol(D).)
`c`	Number of clusters.
`type`	Type of focal sets ("simple": empty set, singletons and Omega; "full": all 2^c subsets of Omega; "pairs": empty set, singletons, Omega, and all or selected pairs).
`pairs`	Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".
`m0`	Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by 'type' and 'pairs'.
`ntrials`	Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order=FALSE).
`disp`	If TRUE (default), intermediate results are displayed.
`maxit`	Maximum number of iterations.
`epsi`	Minimum amount of improvement required to continue iterating (stopping threshold).
`d0`	Parameter used for matrix normalization. The normalized distance corresponding to d0 is 0.95.
`tr`	If TRUE, a trace of the stress function is returned.
`change.order`	If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.
`norm`	Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
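
The number f of columns expected in m0 follows from type and pairs. The sketch below counts the focal sets for each type. It assumes that pairs is a two-column matrix with one pair per row and that the number of clusters is at least 3 when type="pairs" (so that no pair coincides with Omega). The helper n_focal_sets is hypothetical and is not part of evclust.

```r
# Hypothetical helper (not an evclust function): number f of focal sets.
# Assumption: 'pairs' is a two-column matrix, one pair of clusters per row.
n_focal_sets <- function(n_clusters, type = c("simple", "full", "pairs"),
                         pairs = NULL) {
  type <- match.arg(type)
  switch(type,
    # empty set + singletons + Omega
    simple = n_clusters + 2,
    # all subsets of Omega
    full = 2^n_clusters,
    # empty set + singletons + Omega + all or selected pairs
    pairs = {
      n_pairs <- if (is.null(pairs)) choose(n_clusters, 2) else nrow(pairs)
      n_clusters + 2 + n_pairs
    }
  )
}

f_simple <- n_focal_sets(4, "simple")  # 6
f_full   <- n_focal_sets(4, "full")    # 16
f_pairs  <- n_focal_sets(4, "pairs")   # 12
f_sel    <- n_focal_sets(4, "pairs", pairs = rbind(c(1, 2), c(3, 4)))  # 8
```

Under these assumptions, an initial credal partition m0 for n objects and c=4 with type="simple" would be an n x 6 matrix.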

Details

This version of the EVCLUS algorithm uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It also makes it possible to use only a random sample of the dissimilarities, reducing the time and space complexity from quadratic to roughly linear (Denoeux et al., 2016).

The user must supply one of the following:

1) a matrix x of size (n,p) containing the values of p attributes for n objects, or
2) a matrix D of size (n,n) of dissimilarities between n objects, or
3) a matrix D of size (n,k) of dissimilarities between the n objects and k randomly selected objects, AND a matrix J of size (n,k) of indices, such that D[i,j] is the distance between objects i and J[i,j].

In cases 1 and 2, the user may supply the number k of distances to be picked randomly for each object. In case 3, k is set to the number of columns of D.
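
To make case 3 concrete, the following sketch builds an (n,k) matrix D and the matching index matrix J from an attribute matrix in base R. It assumes Euclidean distances and uniform sampling without replacement among the other objects; sample_dissimilarities is a hypothetical helper, not an evclust function, and the package may sample differently.

```r
# Hypothetical helper (not part of evclust): for each object i, draw k other
# objects at random and store their indices in J and their distances in D.
# Assumptions: Euclidean distance, sampling without replacement.
sample_dissimilarities <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n - 1)
  D <- matrix(0, n, k)
  J <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    # Candidate objects: everyone except i.
    others <- seq_len(n)[-i]
    J[i, ] <- others[sample.int(n - 1, k)]
    # Differences between the sampled objects and object i, row by row.
    diffs <- x[J[i, ], , drop = FALSE] -
      matrix(x[i, ], k, ncol(x), byrow = TRUE)
    D[i, ] <- sqrt(rowSums(diffs^2))
  }
  list(D = D, J = J)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 20, ncol = 2)
dj <- sample_dissimilarities(x, k = 5)
dim(dj$D)  # 20 rows (objects), 5 columns (sampled distances)

# With evclust loaded, D and J could then be passed together, e.g.
# kevclus(D = dj$D, J = dj$J, c = 2)
```

Here D[i,j] is the distance between object i and object J[i,j], which is the relationship required between the two matrices.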

Value

The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:

Kmat: The matrix of degrees of conflict. Same size as D.
D: The normalized dissimilarity matrix.
trace: Trace of the algorithm (Stress function vs iterations).
J: The matrix of indices.
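
The d0 argument fixes the scale of the normalized matrix D: the dissimilarity equal to d0 is mapped to 0.95. One mapping consistent with that statement is the exponential transformation sketched below, in the spirit of Denoeux et al. (2016). Whether kevclus uses exactly this formula is an assumption here, and normalize_dissimilarities and its alpha parameter are hypothetical names.

```r
# Hypothetical helper (not part of evclust): map dissimilarities to [0, 1)
# so that the value d0 is sent to 1 - alpha (0.95 when alpha = 0.05).
# Assumption: exponential mapping 1 - exp(-gamma * D^2).
normalize_dissimilarities <- function(D, d0 = quantile(D, 0.9), alpha = 0.05) {
  gamma <- -log(alpha) / as.numeric(d0)^2
  1 - exp(-gamma * D^2)
}

# Four points on a line at 0, 1, 3 and 6; objects 1 and 3 are 3 apart.
D_raw <- as.matrix(dist(matrix(c(0, 1, 3, 6), ncol = 1)))
D_norm <- normalize_dissimilarities(D_raw, d0 = 3)
D_norm[1, 3]  # 0.95, the normalized value at d0
```

Under this mapping, dissimilarities much larger than d0 saturate close to 1, which is why the examples below can pass d0=max(protein$D) to keep the whole range of the data informative.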

Author(s)

Thierry Denoeux.

References

T. Denoeux and M.-H. Masson. EVCLUS: Evidential Clustering of Proximity Data. IEEE Transactions on Systems, Man and Cybernetics B, 34(1):95–109, 2004.

T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems, 106:179–195, 2016.

C. J. ter Braak, Y. Kourmpetis, H. A. Kiers, and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis, 53(8):3183–3193, 2009.

Examples

## Example with a non-metric dissimilarity matrix: the Protein dataset
## Not run: 
data(protein)
clus <- kevclus(D = protein$D, c = 4, type = 'simple', d0 = max(protein$D))
z <- cmdscale(protein$D, k = 2)  # Computation of 2 attributes by Multidimensional Scaling
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)
## Example with k=30 (only 30 randomly selected distances per object)
clus <- kevclus(D = protein$D, k = 30, c = 4, type = 'simple', d0 = max(protein$D))
z <- cmdscale(protein$D, k = 2)  # Computation of 2 attributes by Multidimensional Scaling
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)

## End(Not run)

[Package evclust version 2.0.3 Index]