enhcHi {EnsCat} | R Documentation |
This function performs ensemble hierarchical clustering of high-dimensional categorical data (p >> n).
enhcHi(data, En=100, len=c(2,10), type=2)
data |
An n x p data matrix or data frame; n is the number of observations and p is the number of features (dimensions). |
En |
Number of clusterings to include in the ensemble, i.e., cardinality of the ensemble. |
len |
Range of clustering sizes, given as c(min, max); the number of clusters in each clustering of the ensemble is drawn at random from this range. |
type |
Numeric indicator of single bootstrap (type=1) or double bootstrap (type=2) for selecting the subset of variables to include in each clustering within the ensemble. The default is type=2. |
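As a rough illustration of how len and type could act, the base R sketch below draws one cluster count per ensemble member from a discrete uniform on len[1], ..., len[2] and selects variable subsets by a single or double bootstrap. The helper names draw_sizes and boot_vars are hypothetical, and the double-bootstrap step (resampling from the first bootstrap sample) is an assumed reading of the argument description, not necessarily what enhcHi does internally.

```r
# Hypothetical helpers for illustration; not part of EnsCat.

# One cluster count per ensemble member, uniform on len[1], ..., len[2].
draw_sizes <- function(En = 100, len = c(2, 10)) {
  sizes <- seq(len[1], len[2])
  # Index into `sizes` so that a single-value range is handled correctly.
  sizes[sample.int(length(sizes), En, replace = TRUE)]
}

# Column indices chosen by a single (type = 1) or double (type = 2) bootstrap.
boot_vars <- function(p, type = 2) {
  idx <- sample.int(p, p, replace = TRUE)   # first bootstrap of the columns
  if (type == 2) {
    # Assumed second stage: resample from the first bootstrap sample.
    idx <- idx[sample.int(length(idx), length(idx), replace = TRUE)]
  }
  sort(unique(idx))                         # distinct columns that are kept
}

set.seed(1)
k <- draw_sizes(En = 100, len = c(2, 10))
vars1 <- boot_vars(p = 500, type = 1)
vars2 <- boot_vars(p = 500, type = 2)
```

Under this reading, type=2 tends to retain fewer distinct variables than type=1, because the second resampling can only drop columns that the first one kept.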
Amiri, S., Clarke, B., and Clarke, J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv preprint arXiv:1506.07930.
#data("rhabdodata")
### The following code generates the dissimilarity matrix of the sequence data stored in rhabdodata.
### The ensemble has 100 member clusterings, and the number of clusters in each clustering
### is generated randomly from a discrete uniform on (2,10). A double bootstrap procedure is
### used to select a subset of variables for each clustering.
#ens<-enhcHi(rhabdodata$dat,En=100,len=c(2,10), type=2)
### Calculate the Hamming distance
#dis0<-hammingD(ens)
### Convert to a "dist" object
#REDIST<-as.dist(dis0)
#hc0 <- hclust(REDIST,method = "average")
#plot(hc0,labels=rhabdodata$lab,hang=-1)
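
The steps in the example above (an ensemble of clusterings, a Hamming distance between membership vectors, then average-linkage clustering) can be sketched with base R alone on synthetic data. This is a simplified stand-in under stated assumptions, not the EnsCat implementation: the data are simulated, hamming is a hypothetical helper, each member clusters a single-bootstrap subset of variables, and the ensemble is kept small for speed.

```r
set.seed(42)

# Simulated categorical data with p >> n: 20 observations, 200 features,
# two groups that favour different letters (purely illustrative).
n <- 20; p <- 200
grp <- rep(1:2, each = n / 2)
dat <- t(sapply(grp, function(g) {
  probs <- if (g == 1) c(0.7, 0.1, 0.1, 0.1) else c(0.1, 0.1, 0.1, 0.7)
  sample(c("A", "C", "G", "T"), p, replace = TRUE, prob = probs)
}))

# Hypothetical helper: proportion of columns in which two rows differ.
hamming <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- mean(m[i, ] != m[j, ])
  d
}

En <- 25; len <- c(2, 5)
ens <- sapply(seq_len(En), function(b) {
  vars <- unique(sample.int(p, p, replace = TRUE))  # bootstrap of variables
  k <- sample(len[1]:len[2], 1)                     # random number of clusters
  hc <- hclust(as.dist(hamming(dat[, vars])), method = "average")
  cutree(hc, k = k)                                 # membership vector
})

# Ensemble dissimilarity: share of clusterings that separate two observations.
dis0 <- hamming(ens)
hc0 <- hclust(as.dist(dis0), method = "average")
final <- cutree(hc0, k = 2)
```

Here ens is an n x En matrix of cluster labels, and plot(hc0, labels=grp, hang=-1) would draw the resulting dendrogram with the simulated group labels.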