enhcHi {EnsCat} | R Documentation
This function performs ensemble hierarchical clustering of high-dimensional categorical data (p >> n).
enhcHi(data, En=100, len=c(2,10), type=2)
data | An n x p data matrix or data frame; n is the number of observations and p is the number of features or dimensions.
En | Number of clusterings to include in the ensemble, i.e., the cardinality of the ensemble. The default is En=100.
len | Range (minimum, maximum) of the number of clusters used by the clusterings in the ensemble. The default is len=c(2,10).
type | Numeric indicator of single bootstrap (type=1) or double bootstrap (type=2) for selecting the subset of variables to include in each clustering within the ensemble. The default is type=2.
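The way En and len shape an ensemble can be sketched in base R. This is a minimal illustration, not the EnsCat implementation: the helper toy_ensemble, the toy data, the use of Hamming distance with average linkage inside each member, and the single bootstrap draw of columns are all assumptions made for the example.

```r
# Hypothetical helper (not part of EnsCat): returns an n x En matrix in which
# column b holds the cluster labels from the b-th clustering.
toy_ensemble <- function(data, En = 100, len = c(2, 10)) {
  data <- as.matrix(data)
  n <- nrow(data)
  p <- ncol(data)
  memberships <- matrix(NA_integer_, nrow = n, ncol = En)
  for (b in seq_len(En)) {
    # Number of clusters: discrete uniform on len[1], ..., len[2]
    k <- sample.int(len[2] - len[1] + 1, size = 1) + len[1] - 1
    # Assumed single bootstrap of variables: draw p column indices with
    # replacement and keep the distinct ones
    cols <- unique(sample.int(p, size = p, replace = TRUE))
    sub <- data[, cols, drop = FALSE]
    # Hamming distance: number of selected variables on which two rows differ
    d <- matrix(0, n, n)
    for (j in seq_len(ncol(sub))) {
      d <- d + outer(sub[, j], sub[, j], FUN = "!=")
    }
    hc <- hclust(as.dist(d), method = "average")
    memberships[, b] <- cutree(hc, k = k)
  }
  memberships
}

set.seed(1)
# Toy categorical data with p >> n: 20 observations, 200 variables
toy <- matrix(sample(c("A", "C", "G", "T"), 20 * 200, replace = TRUE), nrow = 20)
ens_toy <- toy_ensemble(toy, En = 10, len = c(2, 5))
dim(ens_toy)  # 20 10
```

Each column of ens_toy is one member clustering, so En sets the number of columns and len bounds the largest label that can appear in any column.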
Amiri, S., Clarke, B., and Clarke, J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv preprint arXiv:1506.07930.
#data("rhabdodata")
### The following code generates the dissimilarity matrix of the sequence data stored in rhabdodata.
### The ensemble has 100 member clusterings, and the number of clusters in each clustering
### is generated randomly from a discrete uniform on (2,10). A double bootstrap procedure is
### used to select a subset of variables for each clustering.
#ens<-enhcHi(rhabdodata$dat,En=100,len=c(2,10), type=2)
### Calculate the Hamming distance
#dis0<-hammingD(ens)
### Save as distance format
#REDIST<-as.dist(dis0)
#hc0 <- hclust(REDIST,method = "average")
#plot(hc0,labels=rhabdodata$lab,hang =-1)
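The step from ensemble to dissimilarity matrix can be illustrated without the package. The sketch below assumes the ensemble is a matrix with one row per observation and one column per clustering, and uses a hypothetical stand-in, toy_hamming, that counts the clusterings in which two observations receive different labels. The output format and any normalization used by hammingD itself are not assumed here.

```r
# Toy membership matrix: 4 observations (rows), 3 clusterings (columns).
# The labels are made up for illustration.
memb <- cbind(c(1, 1, 2, 2),
              c(1, 1, 1, 2),
              c(1, 2, 2, 2))

# Hypothetical stand-in for hammingD(): number of clusterings in which
# two observations are assigned different labels.
toy_hamming <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (b in seq_len(ncol(m))) {
    d <- d + outer(m[, b], m[, b], FUN = "!=")
  }
  d
}

dis_toy <- toy_hamming(memb)
dis_toy[1, 4]  # 3: observations 1 and 4 are separated in all three clusterings

# Same final steps as in the example above: convert to a dist object and
# cluster with average linkage.
hc_toy <- hclust(as.dist(dis_toy), method = "average")
```

Observations that are rarely split apart across the ensemble end up close together, which is what the final hierarchical clustering then exploits.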