kevclus {evclust}    R Documentation
k-EVCLUS algorithm
Description
kevclus computes a credal partition from a dissimilarity matrix using the k-EVCLUS algorithm.
Usage
kevclus(
x,
k = n - 1,
D,
J,
c,
type = "simple",
pairs = NULL,
m0 = NULL,
ntrials = 1,
disp = TRUE,
maxit = 1000,
epsi = 1e-05,
d0 = quantile(D, 0.9),
tr = FALSE,
change.order = FALSE,
norm = 1
)
Arguments
x
n x p matrix of p attributes observed for n objects (optional).
k
Number of distances to compute for each object (default: n-1).
D
n x n or n x k dissimilarity matrix (used only if x is not supplied).
J
n x k matrix of indices: D[i,j] is the distance between objects i and J[i,j]. Used only if D is supplied and ncol(D) < n; k is then set to ncol(D).
c
Number of clusters.
type
Type of focal sets ("simple": empty set, singletons and Omega; "full": all 2^c subsets of Omega; "pairs": empty set, singletons, Omega, and all or selected pairs).
pairs
Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type = "pairs".
m0
Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by type and pairs.
ntrials
Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order = FALSE).
disp
If TRUE (default), intermediate results are displayed.
maxit
Maximum number of iterations.
epsi
Minimum amount of improvement (stopping criterion).
d0
Parameter used for matrix normalization: the normalized distance corresponding to d0 is 0.95.
tr
If TRUE, a trace of the stress function is returned.
change.order
If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.
norm
Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
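The d0 entry specifies only that a distance equal to d0 is mapped to 0.95. As a sketch, assume the normalization has the form delta = 1 - exp(-gamma * d^2) with gamma = -log(0.05) / d0^2: this maps 0 to 0, increases towards 1, and sends d0 to exactly 0.95. The function normalize_dissim below is a hypothetical helper, not part of evclust, and the package's internal transformation may differ.

```r
# Hypothetical helper (not part of evclust). Assumed normalization:
# delta = 1 - exp(-gamma * d^2), with gamma chosen so that d0 maps to 0.95.
normalize_dissim <- function(D, d0 = quantile(D, 0.9)) {
  gamma <- -log(0.05) / d0^2  # 1 - exp(-gamma * d0^2) = 1 - 0.05 = 0.95
  1 - exp(-gamma * D^2)       # elementwise; values lie in [0, 1]
}

# Usage on a small toy dissimilarity matrix
set.seed(1)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)
D <- as.matrix(dist(x))
Dn <- normalize_dissim(D, d0 = 1)
print(round(Dn[1:3, 1:3], 3))
```

Larger d0 values make the transformation more gradual, so more of the original distance range remains distinguishable after normalization.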
Details
This version of the EVCLUS algorithm uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It also makes it possible to use only a random sample of the dissimilarities, reducing the time and space complexity from quadratic to roughly linear in n (Denoeux et al., 2016). The user must supply either: 1) a matrix x of size (n,p) containing the values of p attributes for n objects; 2) a matrix D of size (n,n) of dissimilarities between n objects; or 3) a matrix D of size (n,k) of dissimilarities between the n objects and k randomly selected objects, AND a matrix J of size (n,k) of indices, such that D[i,j] is the distance between objects i and J[i,j]. In cases 1 and 2, the user may supply the number k of distances to be picked randomly for each object. In case 3, k is set to the number of columns of D.
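Case 3 requires the caller to supply matching D and J matrices. The base-R sketch below builds them from an attribute matrix, assuming Euclidean distances and, for each object, k partners drawn uniformly at random without replacement from the other n - 1 objects. sample_dissim is a hypothetical helper, not part of evclust, and the package may draw its own sample differently.

```r
# Hypothetical helper (not part of evclust): for each object i, draw k distinct
# partners among the other n - 1 objects and store their indices and distances.
sample_dissim <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n - 1)
  J <- matrix(0L, nrow = n, ncol = k)
  D <- matrix(0, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    J[i, ] <- others[sample.int(length(others), k)]  # uniform, without replacement
    # Euclidean distances between object i and each of its k partners
    diffs <- x[J[i, ], , drop = FALSE] -
      matrix(x[i, ], nrow = k, ncol = ncol(x), byrow = TRUE)
    D[i, ] <- sqrt(rowSums(diffs^2))
  }
  list(D = D, J = J)  # D[i, j] is the distance between objects i and J[i, j]
}

# Usage on toy data: 100 objects, 2 attributes, 30 sampled distances per object
set.seed(42)
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
s <- sample_dissim(x, k = 30)
str(s)
```

Following the Arguments section, the pair could then be passed as kevclus(D = s$D, J = s$J, c = 3), where the number of clusters is chosen arbitrarily for this toy data and k is taken from ncol(s$D).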
Value
The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:
- Kmat
The matrix of degrees of conflict. Same size as D.
- D
The normalized dissimilarity matrix.
- trace
Trace of the algorithm (Stress function vs iterations).
- J
The matrix of indices.
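The Kmat attribute holds degrees of conflict between the objects' mass functions, which EVCLUS fits to the normalized dissimilarities. Under the standard Dempster-Shafer definition, the conflict between m_i and m_j is K_ij = sum of m_i(A) * m_j(B) over all pairs of focal sets A, B with empty intersection. The sketch below computes this for a mass matrix whose columns follow the rows of a 0/1 focal-set matrix. focal_sets and conflict_matrix are hypothetical helpers covering only the "simple" and "full" types, and the column order evclust uses for its focal sets may differ.

```r
# Hypothetical helper (not part of evclust): returns an f x nclus 0/1 matrix
# whose row a encodes focal set A (entry [a, l] = 1 if cluster l is in A).
focal_sets <- function(nclus, type = c("simple", "full")) {
  type <- match.arg(type)
  if (type == "simple") {
    # empty set, the nclus singletons, and Omega
    rbind(rep(0, nclus), diag(nclus), rep(1, nclus))
  } else {
    # all 2^nclus subsets of Omega; row 1 is the empty set, the last row is Omega
    unname(as.matrix(expand.grid(rep(list(0:1), nclus))))
  }
}

# Hypothetical helper: K[i, j] = sum over disjoint A, B of m_i(A) * m_j(B)
conflict_matrix <- function(M, Fmat) {
  C <- 1 * (tcrossprod(Fmat) == 0)  # C[a, b] = 1 when focal sets a and b are disjoint
  M %*% C %*% t(M)
}

# Usage: three objects, simple focal sets over 3 clusters
Fmat <- focal_sets(3, "simple")  # columns of M follow the rows of Fmat
M <- rbind(c(0, 1, 0, 0, 0),     # all mass on {omega_1}
           c(0, 0, 1, 0, 0),     # all mass on {omega_2}
           c(0, 0, 0, 0, 1))     # all mass on Omega (total ignorance)
K <- conflict_matrix(M, Fmat)
print(K)
```

Two objects that commit to different singleton clusters are in total conflict, while an object whose mass lies entirely on Omega conflicts with no one.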
Author(s)
Thierry Denoeux.
References
T. Denoeux and M.-H. Masson. EVCLUS: Evidential Clustering of Proximity Data. IEEE Transactions on Systems, Man and Cybernetics B, 34(1):95–109, 2004.
T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems, 106:179–195, 2016.
C. J. ter Braak, Y. Kourmpetis, H. A. Kiers, and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis, 53(8):3183–3193, 2009.
Examples
## Example with a non-metric dissimilarity matrix: the Protein dataset
## Not run:
data(protein)
clus <- kevclus(D = protein$D, c = 4, type = 'simple', d0 = max(protein$D))
z <- cmdscale(protein$D, k = 2)  # 2-D representation by classical multidimensional scaling
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)
## Example using only k = 30 randomly sampled distances per object
clus <- kevclus(D = protein$D, k = 30, c = 4, type = 'simple', d0 = max(protein$D))
plot(clus, X = z, mfrow = c(2, 2), ytrue = protein$y, Outliers = FALSE, Approx = 1)
## End(Not run)