kmeansCBI {fpc} | R Documentation |
Interface functions for clustering methods
Description
These functions provide an interface to several clustering methods
implemented in R, for use together with the cluster stability
assessment in clusterboot (as parameter clustermethod; "CBI" stands
for "clusterboot interface"). In some situations it could make sense
to use them to compute a clustering even if you don't want to run
clusterboot, because some of the functions contain some additional
features (e.g., normal mixture model based clustering of dissimilarity
matrices projected into the Euclidean space by MDS or partitioning
around medoids with estimated number of clusters, noise/outlier
identification in hierarchical clustering).
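The two uses described above can be sketched as follows. This is a minimal illustration, assuming the fpc package is installed; the toy data and the choices k = 2, runs = 5 and B = 20 are made up for the example.

```r
# Toy data: two well-separated groups of 30 points each (illustrative only).
set.seed(1)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 5), ncol = 2))

if (requireNamespace("fpc", quietly = TRUE)) {
  # Direct use: compute a 2-cluster k-means solution without clusterboot.
  km <- fpc::kmeansCBI(x, k = 2, runs = 5)
  print(km$nc)                # number of clusters
  print(table(km$partition))  # cluster sizes

  # Use as the clustermethod argument of clusterboot();
  # further arguments such as k are passed on to kmeansCBI.
  cb <- fpc::clusterboot(x, B = 20, clustermethod = fpc::kmeansCBI,
                         k = 2, seed = 1, count = FALSE)
  print(cb$bootmean)          # clusterwise mean Jaccard similarity
}
```

The requireNamespace guard only keeps the sketch from failing where fpc is not available; in ordinary use one would simply call library(fpc).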
Usage
kmeansCBI(data,krange,k,scaling=FALSE,runs=1,criterion="ch",...)
hclustCBI(data,k,cut="number",method,scaling=TRUE,noisecut=0,...)
hclusttreeCBI(data,minlevel=2,method,scaling=TRUE,...)
disthclustCBI(dmatrix,k,cut="number",method,noisecut=0,...)
noisemclustCBI(data,G,k,modelNames,nnk,hcmodel=NULL,Vinv=NULL,
               summary.out=FALSE,...)
distnoisemclustCBI(dmatrix,G,k,modelNames,nnk,
                   hcmodel=NULL,Vinv=NULL,mdsmethod="classical",
                   mdsdim=4, summary.out=FALSE, points.out=FALSE,...)
claraCBI(data,k,usepam=TRUE,diss=inherits(data,"dist"),...)
pamkCBI(data,krange=2:10,k=NULL,criterion="asw", usepam=TRUE,
        scaling=FALSE,diss=inherits(data,"dist"),...)
tclustCBI(data,k,trim=0.05,...)
dbscanCBI(data,eps,MinPts,diss=inherits(data,"dist"),...)
mahalCBI(data,clustercut=0.5,...)
mergenormCBI(data, G=NULL, k=NULL, modelNames=NULL, nnk=0,
             hcmodel = NULL,
             Vinv = NULL, mergemethod="bhat",
             cutoff=0.1,...)
speccCBI(data,k,...)
pdfclustCBI(data,...)
stupidkcentroidsCBI(dmatrix,k,distances=TRUE)
stupidknnCBI(dmatrix,k)
stupidkfnCBI(dmatrix,k)
stupidkavenCBI(dmatrix,k)
Arguments
data |
a numeric matrix. The data
matrix - usually a cases*variables-data matrix. |
dmatrix |
a square numerical dissimilarity matrix or a
|
k |
numeric, usually integer. In most cases, this is the number
of clusters for methods where this is fixed. For |
scaling |
either a logical value or a numeric vector of length
equal to the number of variables. If |
runs |
integer. Number of random initializations from which the k-means algorithm is started. |
criterion |
|
cut |
either "level" or "number". This determines how
|
method |
method for hierarchical clustering, see the
documentation of |
noisecut |
numeric. All clusters of size |
minlevel |
integer. |
G |
vector of integers. Number of clusters or numbers of clusters
used by
|
modelNames |
vector of strings. Models for covariance matrices,
see documentation of
|
nnk |
integer. Tuning constant for
|
hcmodel |
string or |
Vinv |
numeric. See documentation of
|
summary.out |
logical. If |
mdsmethod |
"classical", "kruskal" or "sammon". Determines the
multidimensional scaling method to compute Euclidean data from a
dissimilarity matrix. See |
mdsdim |
integer. Dimensionality of MDS solution. |
points.out |
logical. If |
usepam |
logical. If |
diss |
logical. If |
krange |
vector of integers. Numbers of clusters to be compared. |
trim |
numeric between 0 and 1. Proportion of data points
trimmed, i.e., assigned to noise. See |
eps |
numeric. The radius of the neighborhoods to be considered
by |
MinPts |
integer. How many points have to be in a neighborhood so
that a point is considered to be a cluster seed? See documentation
of |
clustercut |
numeric between 0 and 1. If |
mergemethod |
method for merging Gaussians, passed on as
|
cutoff |
numeric between 0 and 1, tuning constant for
|
distances |
logical (only for |
... |
further parameters to be transferred to the original clustering functions (not required). |
Details
All these functions call clustering methods implemented in R to
cluster data and to provide output in the format required by
clusterboot. Here is a brief overview. For further
details see the help pages of the involved clustering methods.
- kmeansCBI
an interface to the function kmeansruns calling kmeans for k-means clustering. (kmeansruns allows the specification of several random initializations of the k-means algorithm and estimation of k by the Calinski-Harabasz index or the average silhouette width.)
- hclustCBI
an interface to the function hclust for agglomerative hierarchical clustering with noise component (see parameter noisecut above). This function produces a partition and assumes a cases*variables matrix as input.
- hclusttreeCBI
an interface to the function hclust for agglomerative hierarchical clustering. This function gives out all clusters belonging to the hierarchy (upward from a certain level, see parameter minlevel above).
- disthclustCBI
an interface to the function hclust for agglomerative hierarchical clustering with noise component (see parameter noisecut above). This function produces a partition and assumes a dissimilarity matrix as input.
- noisemclustCBI
an interface to the function mclustBIC, for normal mixture model based clustering. Warning: mclustBIC often has problems with multiple points. In clusterboot, it is recommended to use this together with multipleboot=FALSE.
- distnoisemclustCBI
an interface to the function mclustBIC for normal mixture model based clustering. This assumes a dissimilarity matrix as input and generates a data matrix by multidimensional scaling first. Warning: mclustBIC often has problems with multiple points. In clusterboot, it is recommended to use this together with multipleboot=FALSE.
- claraCBI
an interface to the functions pam and clara for partitioning around medoids.
- pamkCBI
an interface to the function pamk calling pam for partitioning around medoids. The number of clusters is estimated by the Calinski-Harabasz index or by the average silhouette width.
- tclustCBI
an interface to the function tclust in the tclust package for trimmed Gaussian clustering. This assumes a cases*variables matrix as input.
- dbscanCBI
an interface to the function dbscan for density based clustering.
- mahalCBI
an interface to the function fixmahal for fixed point clustering. This assumes a cases*variables matrix as input.
- mergenormCBI
an interface to the function mergenormals for clustering by merging Gaussian mixture components. Unlike mergenormals, mergenormCBI includes the computation of the initial Gaussian mixture. This assumes a cases*variables matrix as input.
- speccCBI
an interface to the function specc for spectral clustering. See the specc help page for additional tuning parameters. This assumes a cases*variables matrix as input.
- pdfclustCBI
an interface to the function pdfCluster for density-based clustering. See the pdfCluster help page for additional tuning parameters. This assumes a cases*variables matrix as input.
- stupidkcentroidsCBI
an interface to the function stupidkcentroids for random centroid-based clustering. See the stupidkcentroids help page. This can have a distance matrix as well as a cases*variables matrix as input, see parameter distances.
- stupidknnCBI
an interface to the function stupidknn for random nearest neighbour clustering. See the stupidknn help page. This assumes a distance matrix as input.
- stupidkfnCBI
an interface to the function stupidkfn for random farthest neighbour clustering. See the stupidkfn help page. This assumes a distance matrix as input.
- stupidkavenCBI
an interface to the function stupidkaven for random average dissimilarity clustering. See the stupidkaven help page. This assumes a distance matrix as input.
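Several of the functions above differ mainly in whether they expect a cases*variables matrix or a dissimilarity matrix. The following sketch contrasts hclustCBI and disthclustCBI on the same toy data. It assumes the fpc package is installed; the data, k = 2 and method = "average" are illustrative choices. Scaling is switched off in hclustCBI so that both calls should work from the same Euclidean distances, in which case one would expect the two partitions to agree.

```r
# Toy data: two groups of 20 points each (illustrative only).
set.seed(2)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))

if (requireNamespace("fpc", quietly = TRUE)) {
  # Data-matrix interface; scaling is TRUE by default, so turn it off here.
  h1 <- fpc::hclustCBI(x, k = 2, method = "average", scaling = FALSE)

  # Dissimilarity interface on the Euclidean distances of the same data.
  h2 <- fpc::disthclustCBI(dist(x), k = 2, method = "average")

  # Cross-tabulate the two partitions.
  print(table(h1$partition, h2$partition))
}
```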
Value
All interface functions return a list with the following components
(there may be some more, see summary.out and points.out above):
result |
clustering result, usually a list with the full output of the clustering method (the precise format doesn't matter); whatever you want to use later. |
nc |
number of clusters. If some points don't belong to any
cluster, these are declared "noise". |
clusterlist |
this is a list consisting of logical vectors
of length of the number of data points ( |
partition |
an integer vector of length |
clustermethod |
a string indicating the clustering method. |
The output of some of the functions has further components:
nccl |
see |
nnk |
by |
initnoise |
logical vector, indicating initially estimated noise by
|
noise |
logical. |
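To make the return format concrete, here is a minimal sketch of a user-written interface function that builds the five main components listed above (result, nc, clusterlist, partition, clustermethod). The function name mykmeansCBI is hypothetical and not part of fpc; it uses only kmeans from base R. Whether clusterboot needs anything beyond these components for a given method is not covered here.

```r
# Hypothetical interface function (not part of fpc), using stats::kmeans.
mykmeansCBI <- function(data, k, ...) {
  data <- as.matrix(data)
  fit <- stats::kmeans(data, centers = k, ...)
  partition <- fit$cluster   # integer cluster labels 1..k
  nc <- k                    # number of clusters
  # One logical membership vector per cluster, each of length nrow(data).
  clusterlist <- lapply(seq_len(nc), function(j) partition == j)
  list(result = fit,
       nc = nc,
       clusterlist = clusterlist,
       partition = partition,
       clustermethod = "mykmeans")
}

# Toy data: two groups of 20 points each (illustrative only).
set.seed(3)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
out <- mykmeansCBI(x, k = 2, nstart = 5)
str(out[c("nc", "partition", "clustermethod")])
```

Because k-means assigns every point to a cluster, this sketch has no noise component; a method that leaves points unassigned would have to handle noise as described for nc above.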
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
See Also
clusterboot, dist, kmeans, kmeansruns, hclust, mclustBIC, pam, pamk,
clara, dbscan, fixmahal, tclust, pdfCluster
Examples
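library(fpc)  # provides rFace and the CBI functions
options(digits=3)
set.seed(20000)
# Simulated two-dimensional "face" benchmark data
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
# Density-based clustering versus average-linkage clustering with noise
dbs <- dbscanCBI(face,eps=1.5,MinPts=4)
dhc <- disthclustCBI(dist(face),method="average",k=1.5,noisecut=2)
table(dbs$partition,dhc$partition)
# Merged Gaussian mixture components versus trimmed clustering
dm <- mergenormCBI(face,G=10,modelNames="EEE",nnk=2)
dtc <- tclustCBI(face,6,trim=0.1,restr.fact=500)
table(dm$partition,dtc$partition)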