kevclus {evclust} R Documentation

k-EVCLUS algorithm

Description

kevclus computes a credal partition from a dissimilarity matrix using the k-EVCLUS algorithm.

Usage

kevclus(
  x,
  k = n - 1,
  D,
  J,
  c,
  type = "simple",
  pairs = NULL,
  m0 = NULL,
  ntrials = 1,
  disp = TRUE,
  maxit = 1000,
  epsi = 1e-05,
  d0 = quantile(D, 0.9),
  tr = FALSE,
  change.order = FALSE,
  norm = 1
)

Arguments

x

nxp matrix of p attributes observed for n objects (optional).

k

Number of distances to compute for each object (default: n-1).

D

nxn or nxk dissimilarity matrix (used only if x is not supplied).

J

nxk matrix of indices. D[i,j] is the distance between objects i and J[i,j]. (Used only if D is supplied and ncol(D)<n; then k is set to ncol(D).)

c

Number of clusters.

type

Type of focal sets ("simple": empty set, singletons, and Omega; "full": all 2^c subsets of Omega; "pairs": empty set, singletons, Omega, and all or selected pairs).

pairs

Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".

m0

Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by 'type' and 'pairs'.

ntrials

Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order=FALSE).

disp

If TRUE (default), intermediate results are displayed.

maxit

Maximum number of iterations.

epsi

Minimum amount of improvement.

d0

Parameter used for matrix normalization. The normalized distance corresponding to d0 is 0.95.

tr

If TRUE, a trace of the stress function is returned.

change.order

If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.

norm

Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
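As a rough illustration of how the type and pairs arguments above determine the number f of focal sets, and hence the number of columns expected in m0, the following base R sketch counts the focal sets for each type. The helper n_focal_sets and its argument nclus are hypothetical and not part of evclust. The sketch assumes more than two clusters and assumes that pairs, when given, is a two-column matrix with one pair per row.

```r
# Hypothetical helper (not part of evclust): number f of focal sets for
# nclus clusters. Assumes nclus > 2 and, if supplied, that `pairs` is a
# two-column matrix with one pair of cluster indices per row.
n_focal_sets <- function(nclus, type = c("simple", "full", "pairs"),
                         pairs = NULL) {
  type <- match.arg(type)
  # Number of pairs added when type = "pairs": all of them if pairs is NULL
  n_pairs <- if (is.null(pairs)) choose(nclus, 2) else nrow(pairs)
  switch(type,
    simple = nclus + 2,           # empty set + singletons + Omega
    full   = 2^nclus,             # every subset of Omega
    pairs  = nclus + 2 + n_pairs  # "simple" focal sets plus the pairs
  )
}

n_focal_sets(4, "simple")
n_focal_sets(4, "full")
n_focal_sets(4, "pairs")
n_focal_sets(4, "pairs", pairs = rbind(c(1, 2), c(3, 4)))
```

Under these assumptions, an initial credal partition m0 for c = 4 and type = "simple" would need n rows and 6 columns.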

Details

This version of the EVCLUS algorithm uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It also makes it possible to use only a random sample of the dissimilarities, reducing the time and space complexity from quadratic to roughly linear (Denoeux et al., 2016). The user must supply: 1) a matrix x of size (n,p) containing the values of p attributes for n objects, or 2) a matrix D of size (n,n) of dissimilarities between n objects, or 3) a matrix D of size (n,k) of dissimilarities between the n objects and k randomly selected objects, AND a matrix J of size (n,k) of indices, such that D[i,j] is the distance between objects i and J[i,j]. In cases 1 and 2, the user may supply the number k of distances to be picked randomly for each object. In case 3, k is set to the number of columns of D.
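The relationship between D and J in case 3 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own sampling code: the toy data x, the seed, and the choice of Euclidean distance are all made up for the example. For each object i, k other objects are drawn at random, and only those k distances are computed, so the n x n matrix is never needed except for the final check.

```r
set.seed(1)                       # reproducible toy example
n <- 50; p <- 2; k <- 5
x <- matrix(rnorm(n * p), n, p)   # hypothetical data, not from evclust

J <- matrix(0L, n, k)             # indices of the sampled objects
D <- matrix(0, n, k)              # distances to the sampled objects
for (i in seq_len(n)) {
  # k distinct objects other than i, drawn at random
  J[i, ] <- sample(setdiff(seq_len(n), i), k)
  # Euclidean distances from object i to the sampled objects only
  diffs <- sweep(x[J[i, ], , drop = FALSE], 2, x[i, ])
  D[i, ] <- sqrt(rowSums(diffs^2))
}

# Check the defining property D[i,j] = distance between i and J[i,j]
# against the full matrix (only affordable here because n is small)
Dfull <- as.matrix(dist(x))
Dcheck <- matrix(Dfull[cbind(rep(seq_len(n), k), as.vector(J))], n, k)
stopifnot(isTRUE(all.equal(D, Dcheck)))
```

A pair (D, J) built this way has the shape described in case 3, with k equal to ncol(D).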

Value

The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:

Kmat

The matrix of degrees of conflict. Same size as D.

D

The normalized dissimilarity matrix.

trace

Trace of the algorithm (Stress function vs iterations).

J

The matrix of indices.
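To give an idea of what the Kmat and D attributes contain, here is a minimal base R sketch on toy values. Two points are assumptions rather than statements from this page: the normalization is taken to be 1 - exp(-gamma * d^2) with gamma chosen so that d0 maps to 0.95, as in the EVCLUS literature, and the degree of conflict between two objects is taken to be the total mass product over pairs of focal sets with empty intersection. The names Fmat, Cmat, m, d, and Dn are hypothetical.

```r
# Toy setting: 3 clusters, focal sets of type "simple"
nclus <- 3
# Rows are focal sets as indicator vectors: empty set, singletons, Omega
Fmat <- rbind(rep(0, nclus), diag(nclus), rep(1, nclus))

# Cmat[A, B] = 1 when focal sets A and B have an empty intersection
Cmat <- (Fmat %*% t(Fmat) == 0) * 1

# Hypothetical mass functions for 3 objects (each row sums to 1)
m <- rbind(c(0, 0.9, 0.0, 0.0, 0.1),   # mostly cluster 1
           c(0, 0.8, 0.1, 0.0, 0.1),   # mostly cluster 1
           c(0, 0.0, 0.0, 1.0, 0.0))   # certainly cluster 3

# Degree of conflict between every pair of objects
Kmat <- m %*% Cmat %*% t(m)

# Hypothetical raw dissimilarities and the assumed normalization
d  <- matrix(c(0, 1, 4,
               1, 0, 4,
               4, 4, 0), 3, 3, byrow = TRUE)
d0 <- max(d)
gamma <- -log(0.05) / d0^2
Dn <- 1 - exp(-gamma * d^2)    # equals 0.95 wherever d equals d0

# Squared mismatch between conflicts and normalized dissimilarities
# (illustrative only; the package's stress function may be scaled differently)
sum((Kmat - Dn)^2)
```

In this toy example, objects 1 and 2 have low conflict and a small normalized dissimilarity, while object 3 is in high conflict with both, which is the kind of agreement the algorithm tries to achieve.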

Author(s)

Thierry Denoeux.

References

T. Denoeux and M.-H. Masson. EVCLUS: Evidential Clustering of Proximity Data. IEEE Transactions on Systems, Man and Cybernetics B, 34(1):95–109, 2004.

T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems, 106:179–195, 2016.

C. J. ter Braak, Y. Kourmpetis, H. A. Kiers, and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis, 53(8):3183–3193, 2009.

See Also

createD, makeF, extractMass

Examples

## Example with a non-metric dissimilarity matrix: the Protein dataset
## Not run: 
library(evclust)
data(protein)
## Two attributes computed by multidimensional scaling (used only for plotting)
z <- cmdscale(protein$D, k = 2)
## All n-1 dissimilarities are used for each object
clus <- kevclus(D = protein$D, c = 4, type = "simple", d0 = max(protein$D))
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)
## Example with k=30: only 30 randomly selected dissimilarities per object
clus <- kevclus(D = protein$D, k = 30, c = 4, type = "simple", d0 = max(protein$D))
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)

## End(Not run)


[Package evclust version 2.0.3 Index]