R: k-CEVCLUS algorithm

kcevclus {evclust}

R Documentation

k-CEVCLUS algorithm

Description

kcevclus computes a credal partition from a dissimilarity matrix and pairwise (must-link and cannot-link) constraints using the k-CEVCLUS algorithm.

Usage

kcevclus(
  x,
  k = n - 1,
  D,
  J,
  c,
  ML,
  CL,
  xi = 0.5,
  type = "simple",
  pairs = NULL,
  m0 = NULL,
  ntrials = 1,
  disp = TRUE,
  maxit = 1000,
  epsi = 1e-05,
  d0 = quantile(D, 0.9),
  tr = FALSE,
  change.order = FALSE,
  norm = 1
)

Arguments

`x`	nxp matrix of p attributes observed for n objects (optional).
`k`	Number of distances to compute for each object (default: n-1).
`D`	nxn or nxk dissimilarity matrix (used only if x is not supplied).
`J`	nxk matrix of indices. D[i,j] is the distance between objects i and J[i,j]. (Used only if D is supplied and ncol(D)<n; then k is set to ncol(D).)
`c`	Number of clusters.
`ML`	Matrix nbML x 2 of must-link constraints. Each row of ML contains the indices of objects that belong to the same class.
`CL`	Matrix nbCL x 2 of cannot-link constraints. Each row of CL contains the indices of objects that belong to different classes.
`xi`	Penalization coefficient.
`type`	Type of focal sets ("simple": empty set, singletons and Omega; "full": all 2^c subsets of Omega; "pairs": empty set, singletons, Omega, and all or selected pairs).
`pairs`	Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".
`m0`	Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by 'type' and 'pairs'.
`ntrials`	Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order=FALSE).
`disp`	If TRUE (default), intermediate results are displayed.
`maxit`	Maximum number of iterations.
`epsi`	Minimum amount of improvement.
`d0`	Parameter used for matrix normalization. The normalized distance corresponding to d0 is 0.95.
`tr`	If TRUE, a trace of the stress function is returned.
`change.order`	If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.
`norm`	Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
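
The relationship between D and J in the nxk case can be illustrated with a small base R sketch. It is not taken from the package: the toy data and the names Dfull and Dk are made up for illustration, and uniform random sampling of k other objects per row is an assumption about how such a pair of matrices might be built by hand.

```r
# Sketch: derive an n x k distance matrix and its index matrix J
# from a full n x n dissimilarity matrix (toy data, illustrative names).
set.seed(1)
n <- 6
k <- 3
x <- matrix(rnorm(n * 2), n, 2)
Dfull <- as.matrix(dist(x))          # full n x n dissimilarities

J  <- matrix(0L, n, k)               # J[i, j]: index of the j-th object paired with i
Dk <- matrix(0, n, k)                # Dk[i, j]: distance between objects i and J[i, j]
for (i in seq_len(n)) {
  others  <- setdiff(seq_len(n), i)  # an object is never paired with itself
  J[i, ]  <- sample(others, k)       # k distinct objects, sampled without replacement
  Dk[i, ] <- Dfull[i, J[i, ]]
}
```

Under the argument descriptions above, such a pair would be passed as D = Dk and J = J, in which case k is taken from ncol(D).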

Details

k-CEVCLUS is a version of EVCLUS allowing the user to specify pairwise constraints to guide the clustering process. Pairwise constraints are of two kinds: must-link constraints are pairs of objects that are known to belong to the same class, and cannot-link constraints are pairs of objects that are known to belong to different classes. Like kevclus, kcevclus uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It also makes it possible to use only a random sample of the dissimilarities, reducing the time and space complexity from quadratic to roughly linear (Denoeux et al., 2016).
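
As an illustration of the two kinds of constraints, the following base R sketch builds ML and CL matrices in the two-column layout described under Arguments. The labels y and the pairs in known_pairs are toy values invented for this example; in practice, the pairs would come from whatever side information is available.

```r
# Toy class labels for 6 objects (hypothetical data).
y <- c(1, 1, 2, 2, 1, 2)

# Pairs of object indices whose relationship is assumed to be known.
known_pairs <- rbind(c(1, 2), c(1, 3), c(4, 6), c(2, 4), c(1, 5))

# A pair is a must-link if both objects share a label, a cannot-link otherwise.
same <- y[known_pairs[, 1]] == y[known_pairs[, 2]]
ML <- known_pairs[same, , drop = FALSE]    # nbML x 2 matrix of must-link pairs
CL <- known_pairs[!same, , drop = FALSE]   # nbCL x 2 matrix of cannot-link pairs
```

The Examples section obtains the same two matrices from create_MLCL instead of building them by hand.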

Value

The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:

Kmat: The matrix of degrees of conflict. Same size as D.
D: The normalized dissimilarity matrix.
trace: Trace of the algorithm (Stress function vs iterations).
J: The matrix of indices.
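
The effect of d0 on the normalized dissimilarities can be sketched as follows. This page only states that the normalized distance corresponding to d0 is 0.95. The exponential form used here, 1 - exp(-g * d^2), is an assumption chosen to be consistent with that statement, and may not match the package's code exactly.

```r
# Sketch of a normalization under which the distance d0 maps to 0.95.
# The functional form is an assumption, not copied from the package.
set.seed(1)
x <- matrix(rnorm(40), 20, 2)              # toy data: 20 objects, 2 attributes
D <- as.matrix(dist(x))

d0 <- unname(quantile(D, 0.9))             # default d0 from the Usage section
g  <- -log(0.05) / d0^2                    # chosen so that 1 - exp(-g * d0^2) = 0.95
Dnorm <- 1 - exp(-g * D^2)                 # normalized dissimilarities in [0, 1)

at_d0 <- 1 - exp(-g * d0^2)                # normalized value at d0
```

With this form, distances well above d0 are squashed toward 1, so a smaller d0 makes more pairs of objects count as maximally dissimilar.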

Author(s)

Feng Li and Thierry Denoeux.

References

F. Li, S. Li and T. Denoeux. k-CEVCLUS: Constrained evidential clustering of large dissimilarity data. Knowledge-Based Systems 142:29-44, 2018.

T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems 106:179-195, 2016.

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux. CEVCLUS: Evidential clustering with instance-level constraints for relational data. Soft Computing 18(7):1321-1335, 2014.

C. J. ter Braak, Y. Kourmpetis, H. A. Kiers, and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis 53(8):3183–3193, 2009.

Examples

## Not run: 
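data<-bananas(2000)
D<-as.matrix(dist(data$x))
# Generate pairwise constraints from the true labels
link<-create_MLCL(data$y,2000)
# Unconstrained run, used to initialize the constrained runs
clus0<-kevclus(D=D,k=200,c=2)
# Constrained runs with increasing xi, each initialized from the previous result
clus1<-kcevclus(D=D,k=200,c=2,ML=link$ML,CL=link$CL,xi=0.1,m0=clus0$mass)
clus2<-kcevclus(D=D,k=200,c=2,ML=link$ML,CL=link$CL,xi=0.5,m0=clus1$mass)
plot(clus2,X=data$x,ytrue=data$y,Outliers=FALSE,Approx=1)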

## End(Not run)

[Package evclust version 2.0.3 Index]