enhcHi {EnsCat}

R Documentation

Performs ensemble hierarchical clustering for high dimensional categorical data

Description

This function performs ensemble hierarchical clustering of high-dimensional categorical data, i.e., data with far more features than observations (p >> n).

Usage

enhcHi(data, En=100, len=c(2,10), type=2)

Arguments

`data`	An n x p data matrix or data frame; n is the number of observations and p is the number of features or dimensions.
`En`	Number of clusterings to include in the ensemble, i.e., cardinality of the ensemble.
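`len`	Range, given as c(min, max), for the number of clusters in each member clustering of the ensemble. As described in the example, the number of clusters for each clustering is drawn at random from a discrete uniform distribution on this range.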
`type`	Numeric indicator of single bootstrap (type=1) or double bootstrap (type=2) for selecting the subset of variables to include in each clustering within the ensemble. The default is type=2.
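
To make the roles of `En`, `len`, and `type` concrete, the following base R sketch builds a small ensemble of label vectors. It is an illustration only and not the EnsCat implementation: `sketch_ensemble` is a hypothetical helper, and the bootstrap scheme is an assumption (single bootstrap = resample column indices once with replacement; double bootstrap = resample again from that first draw). The per-feature mismatch rate used as the dissimilarity and the average linkage are likewise assumptions.

```r
# Hypothetical helper, NOT part of EnsCat: returns an n x En matrix of cluster labels.
sketch_ensemble <- function(data, En = 100, len = c(2, 10), type = 2) {
  data <- as.data.frame(data, stringsAsFactors = FALSE)
  n <- nrow(data)
  p <- ncol(data)
  labels <- matrix(NA_integer_, nrow = n, ncol = En)
  for (b in seq_len(En)) {
    # Assumed single bootstrap: resample column indices with replacement.
    cols <- sample.int(p, p, replace = TRUE)
    # Assumed double bootstrap: resample once more from the first draw.
    if (type == 2) {
      cols <- cols[sample.int(length(cols), length(cols), replace = TRUE)]
    }
    cols <- unique(cols)
    sub <- as.matrix(data[, cols, drop = FALSE])
    # Hamming-type dissimilarity: share of selected features on which two rows differ.
    d <- matrix(0, n, n)
    for (j in seq_len(ncol(sub))) {
      d <- d + outer(sub[, j], sub[, j], "!=")
    }
    d <- d / ncol(sub)
    # Number of clusters drawn from a discrete uniform on len[1], ..., len[2].
    k <- len[1] + sample.int(len[2] - len[1] + 1, 1) - 1
    hc <- hclust(as.dist(d), method = "average")
    labels[, b] <- cutree(hc, k = min(k, n))
  }
  labels
}

# Toy categorical data with p >> n: 12 observations, 40 features.
set.seed(1)
toy <- matrix(sample(c("A", "C", "G", "T"), 12 * 40, replace = TRUE), nrow = 12)
ens_sketch <- sketch_ensemble(toy, En = 5, len = c(2, 4), type = 2)
```

Each column of `ens_sketch` is one member clustering. Resampling the variables means that no single clustering depends on all p features, which is the motivation for ensembling when p is much larger than n.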

References

Amiri, S., Clarke, B., and Clarke, J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv preprint arXiv:1506.07930.

Examples

#data("rhabdodata")
### The following code generates the dissimilarity matrix of the sequence data stored in rhabdodata
### The ensemble has 100 member clusterings, and the number of clusters in each clustering
### is generated randomly from a discrete uniform on (2,10). A double bootstrap procedure is
### used to select a subset of variables for each clustering.
#ens<-enhcHi(rhabdodata$dat,En=100,len=c(2,10), type=2)
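### Calculate the Hamming distance from the ensemble
#dis0<-hammingD(ens)
### Convert to a dist object
#REDIST<-as.dist(dis0)
### Cluster the ensemble dissimilarities with average linkage and plot the dendrogram
#hc0 <- hclust(REDIST,method = "average")
#plot(hc0,labels=rhabdodata$lab,hang = -1)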
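
The final steps of the example (ensemble, dissimilarity, `as.dist`, `hclust`) can be tried without the package data using the base R sketch below. `ensemble_dissimilarity` is a hypothetical stand-in written for illustration, not EnsCat's `hammingD`; it assumes the ensemble is stored as a matrix with one row per observation and one column per member clustering, and it scores a pair of observations by the share of clusterings that place them in different clusters.

```r
# Hypothetical stand-in for the ensemble dissimilarity step (NOT EnsCat's hammingD):
# share of member clusterings in which two observations receive different labels.
ensemble_dissimilarity <- function(labels) {
  labels <- as.matrix(labels)
  d <- matrix(0, nrow(labels), nrow(labels))
  for (b in seq_len(ncol(labels))) {
    d <- d + outer(labels[, b], labels[, b], "!=")
  }
  d / ncol(labels)
}

# Toy ensemble: 6 observations (rows), 3 member clusterings (columns).
toy_labels <- cbind(c(1, 1, 1, 2, 2, 2),
                    c(1, 1, 2, 2, 3, 3),
                    c(1, 1, 1, 1, 2, 2))

dis <- ensemble_dissimilarity(toy_labels)

# Same downstream steps as in the example: dist object, average linkage, cut.
hc <- hclust(as.dist(dis), method = "average")
groups <- cutree(hc, k = 2)
```

On this toy input, observations 1 and 2 are never separated, so their dissimilarity is 0, and the two-cluster cut separates the last two observations from the first four.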

[Package EnsCat version 1.1 Index]