kcevclus {evclust}    R Documentation

k-CEVCLUS algorithm

Description

kcevclus computes a credal partition from a dissimilarity matrix and pairwise (must-link and cannot-link) constraints using the k-CEVCLUS algorithm.

Usage

kcevclus(
  x,
  k = n - 1,
  D,
  J,
  c,
  ML,
  CL,
  xi = 0.5,
  type = "simple",
  pairs = NULL,
  m0 = NULL,
  ntrials = 1,
  disp = TRUE,
  maxit = 1000,
  epsi = 1e-05,
  d0 = quantile(D, 0.9),
  tr = FALSE,
  change.order = FALSE,
  norm = 1
)

Arguments

x

nxp matrix of p attributes observed for n objects (optional).

k

Number of distances to compute for each object (default: n-1).

D

nxn or nxk dissimilarity matrix (used only if x is not supplied).

J

nxk matrix of indices. D[i,j] is the distance between objects i and J[i,j]. (Used only if D is supplied and ncol(D)<n; then k is set to ncol(D).)

c

Number of clusters.

ML

Matrix nbML x 2 of must-link constraints. Each row of ML contains the indices of objects that belong to the same class.

CL

Matrix nbCL x 2 of cannot-link constraints. Each row of CL contains the indices of objects that belong to different classes.

xi

Penalization coefficient.

type

Type of focal sets ("simple": empty set, singletons and Omega; "full": all 2^c subsets of Omega; "pairs": empty set, singletons, Omega, and all or selected pairs).

pairs

Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".

m0

Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by 'type' and 'pairs'.

ntrials

Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order=FALSE).

disp

If TRUE (default), intermediate results are displayed.

maxit

Maximum number of iterations.

epsi

Minimum amount of improvement.

d0

Parameter used for matrix normalization. The normalized distance corresponding to d0 is 0.95.

tr

If TRUE, a trace of the stress function is returned.

change.order

If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.

norm

Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
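
The D and J arguments describe a reduced nxk representation in which D[i,j] is the distance between object i and object J[i,j]. The base R sketch below shows one way such a pair of matrices could be built from x by sampling k other objects per row. The helper name sample_distances is hypothetical and is not part of evclust; the package lists createD under See Also for this purpose, and the sketch makes no claim about how that function is implemented.

```r
# Hypothetical helper (not part of evclust): for each object i, pick k other
# objects at random and store their indices in J and the Euclidean distances in D.
sample_distances <- function(x, k) {
  n <- nrow(x)
  stopifnot(k <= n - 1)
  D <- matrix(0, n, k)
  J <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    others <- seq_len(n)[-i]                  # every index except i
    J[i, ] <- others[sample.int(n - 1, k)]    # k distinct neighbours of i
    diffs <- x[J[i, ], , drop = FALSE] -
      matrix(x[i, ], k, ncol(x), byrow = TRUE)
    D[i, ] <- sqrt(rowSums(diffs^2))          # distance from i to J[i, j]
  }
  list(D = D, J = J)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 20, ncol = 2)   # toy data: 20 objects, 2 attributes
dj <- sample_distances(x, k = 5)
dim(dj$D)                                     # n x k
```

With this layout, storage grows as n*k rather than n^2, which is the source of the complexity reduction mentioned in Details.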

Details

k-CEVCLUS is a version of EVCLUS that lets the user specify pairwise constraints to guide the clustering process. Pairwise constraints are of two kinds: must-link constraints are pairs of objects known to belong to the same class, and cannot-link constraints are pairs of objects known to belong to different classes. Like kevclus, kcevclus uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It can also use only a random sample of the dissimilarities, reducing the time and space complexity from quadratic to roughly linear (Denoeux et al., 2016).
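
To make the shape of the ML and CL arguments concrete, the following base R sketch derives both matrices from a toy label vector. The labels, the number of sampled pairs and all variable names are assumptions made for illustration; in practice the package's create_MLCL (see See Also) can be used instead, and this sketch does not reproduce its implementation.

```r
set.seed(2)
y <- c(1, 1, 2, 2, 1, 2)                  # toy class labels, assumed known
n_pairs <- 4                              # number of constraints to draw

all_pairs <- t(combn(length(y), 2))       # every unordered pair (i, j) with i < j
picked <- all_pairs[sample.int(nrow(all_pairs), n_pairs), , drop = FALSE]

same <- y[picked[, 1]] == y[picked[, 2]]  # TRUE when both objects share a class
ML <- picked[same, , drop = FALSE]        # nbML x 2 must-link constraints
CL <- picked[!same, , drop = FALSE]       # nbCL x 2 cannot-link constraints
```

Each row of ML or CL holds two object indices, matching the nbML x 2 and nbCL x 2 layouts described under Arguments.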

Value

The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:

Kmat

The matrix of degrees of conflict. Same size as D.

D

The normalized dissimilarity matrix.

trace

Trace of the algorithm (stress function vs. iterations).

J

The matrix of indices.
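
The D attribute holds normalized dissimilarities, and the d0 argument states that the normalized distance corresponding to d0 is 0.95. One transformation consistent with that statement is 1 - exp(-gamma * d^2) with gamma = -log(0.05) / d0^2. This form is an assumption made for illustration; this page does not confirm that kcevclus uses exactly this formula.

```r
set.seed(3)
x <- matrix(rnorm(60), nrow = 30, ncol = 2)   # toy data
D <- as.matrix(dist(x))

d0 <- as.numeric(quantile(D, 0.9))            # default d0 from the Usage section
gam <- -log(0.05) / d0^2                      # assumed scale, so that d0 maps to 0.95
Dn <- 1 - exp(-gam * D^2)                     # assumed normalization, values in [0, 1)

1 - exp(-gam * d0^2)                          # normalized value at d0
```

Under this assumed form, dissimilarities larger than d0 are compressed into the interval between 0.95 and 1, so a smaller d0 reduces the influence of the largest dissimilarities.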

Author(s)

Feng Li and Thierry Denoeux.

References

F. Li, S. Li and T. Denoeux. k-CEVCLUS: Constrained evidential clustering of large dissimilarity data. Knowledge-Based Systems 142:29-44, 2018.

T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems 106:179-195, 2016.

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux. CEVCLUS: Evidential clustering with instance-level constraints for relational data. Soft Computing 18(7):1321-1335, 2014.

C. J. ter Braak, Y. Kourmpetis, H. A. Kiers and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis 53(8):3183-3193, 2009.

See Also

kevclus, createD, makeF, extractMass, create_MLCL, bananas, nnevclus

Examples

## Not run: 
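# Generate the bananas dataset with 2000 objects and compute all distances
data <- bananas(2000)
D <- as.matrix(dist(data$x))
# Build must-link / cannot-link constraints from the class labels
link <- create_MLCL(data$y, 2000)
# Unconstrained solution, used to initialize the first constrained run
clus0 <- kevclus(D = D, k = 200, c = 2)
# Constrained runs with increasing penalization xi, each started from the previous solution
clus1 <- kcevclus(D = D, k = 200, c = 2, ML = link$ML, CL = link$CL, xi = 0.1, m0 = clus0$mass)
clus2 <- kcevclus(D = D, k = 200, c = 2, ML = link$ML, CL = link$CL, xi = 0.5, m0 = clus1$mass)
plot(clus2, X = data$x, ytrue = data$y, Outliers = FALSE, Approx = 1)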

## End(Not run)


[Package evclust version 2.0.3 Index]