cecm {evclust}R Documentation

Constrained Evidential c-means algorithm

Description

cecm computes a credal partition from a matrix of attribute data and pairwise constraints using the Constrained Evidential c-means (CECM) algorithm.

Usage

cecm(
  x,
  c,
  type = "full",
  pairs = NULL,
  ntrials = 1,
  ML,
  CL,
  g0 = NULL,
  alpha = 1,
  delta = 10,
  xi = 0.5,
  distance = 0,
  epsi = 0.001,
  disp = TRUE
)

Arguments

x

input matrix of size n x d, where n is the number of objects and d the number of attributes.

c

Number of clusters.

type

Type of focal sets ("simple": empty set, singletons and Omega; "full": all 2^c subsets of \Omega; "pairs": \emptyset, singletons, \Omega, and all or selected pairs).

pairs

Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".

ntrials

Number of runs of the optimization algorithm (set to 1 if g0 is supplied).

ML

Matrix nbML x 2 of must-link constraints. Each row of ML contains the indices of objects that belong to the same class.

CL

Matrix nbCL x 2 of cannot-link constraints. Each row of CL contains the indices of objects that belong to different classes.

g0

Initial prototypes, matrix of size c x d. If not supplied, the prototypes are initialized randomly.

alpha

Exponent of the cardinality in the cost function.

delta

Distance to the empty set.

xi

Tradeoff between the objective function Jecm and the constraints: Jcecm=(1-xi)Jecm + xi Jconst.

distance

Type of distance use: 0=Euclidean, 1=Mahalanobis.

epsi

Minimum amount of improvement.

disp

If TRUE (default), intermediate results are displayed.

Details

CECM is a version of ECM allowing the user to specify pairwise constraints to guide the clustering process. Pairwise constraints are of two kinds: must-link contraints are pairs of objects that are known to belong to the same class, and cannot-link constraints are pairs of objects that are known to belong to different classes. CECM can also learn a metric for each cluster, like the Gustafson-Kessel algorithm in fuzzy clustering. At each iteration, the algorithm solves a quadratic programming problem using an interior ellipsoidal trust region and barrier function algorithm with dual solution updating technique in the standard QP form (Ye, 1992).

If initial prototypes g0 are provided, the number of trials is automatically set to 1.

Remark: Due to the use of the Matrix package, messages may be generated by R's (S4) method dispatch mechanism. They are not error messages, and they can be ignored.

Value

The credal partition (an object of class "credpart").

Author(s)

Thierry Denoeux (from a MATLAB code written by Violaine Antoine).

References

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux. CECM: Constrained Evidential C-Means algorithm. Computational Statistics and Data Analysis, Vol. 56, Issue 4, pages 894–914, 2012.

Y. Ye. On affine-scaling algorithm for nonconvex quadratic programming. Math. Programming 56 (1992) 285–300.

See Also

create_MLCL, makeF, extractMass, ecm, recm

Examples

## Generation of a two-class dataset
## Not run: 
n<-30
x<-cbind(0.2*rnorm(n),rnorm(n))
y<-c(rep(1,n/2),rep(2,n/2))
x[(n/2+1):n,1]<-x[(n/2+1):n,1]+1
plot(x[,1],x[,2],asp=1,pch=y,col=y)
## Generation of 10 constraints
const<-create_MLCL(y,nbConst=10)
## Call of cecm
clus<-cecm(x=x,c=2,ML=const$M,CL=const$CL,delta=10)
plot(x[,1],x[,2],asp=1,pch=clus$y.pl,col=y)

## End(Not run)

[Package evclust version 2.0.3 Index]