enhcHi {EnsCat}    R Documentation

Performs ensemble hierarchical clustering for high-dimensional categorical data

Description

This function performs ensemble hierarchical clustering of high-dimensional categorical data, i.e., data with many more features than observations (p >> n).
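The ensembling idea can be illustrated with a minimal base-R sketch. It is not the EnsCat implementation: the helper names (`mismatch_dist`, `ensemble_labels_sketch`), the use of average-linkage `hclust` on a simple-matching (normalized Hamming) dissimilarity for each member clustering, and the single bootstrap of variables are all assumptions made for illustration. Each column of the returned n x En matrix holds the cluster labels from one member clustering.

```r
# Hypothetical helper: proportion of features on which each pair of rows differs
# (a normalized Hamming / simple-matching dissimilarity for categorical data).
mismatch_dist <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # t(x) is p x n; comparing it with row i recycles row i down every column
    d[i, ] <- colMeans(t(x) != x[i, ])
  }
  as.dist(d)
}

# Hypothetical sketch of building an ensemble of clusterings (not EnsCat's code).
ensemble_labels_sketch <- function(data, En = 100, len = c(2, 10)) {
  x <- as.matrix(data)
  n <- nrow(x)
  p <- ncol(x)
  sizes <- seq(len[1], len[2])
  labels <- matrix(NA_integer_, nrow = n, ncol = En)
  for (e in seq_len(En)) {
    k <- sizes[sample.int(length(sizes), 1)]          # number of clusters for this member
    vars <- unique(sample.int(p, p, replace = TRUE))   # assumed: single bootstrap of variables
    hc <- hclust(mismatch_dist(x[, vars, drop = FALSE]), method = "average")
    labels[, e] <- cutree(hc, k = k)                   # one member clustering per column
  }
  labels
}

set.seed(1)
toy <- matrix(sample(c("A", "C", "G", "T"), 12 * 50, replace = TRUE), nrow = 12)
ens <- ensemble_labels_sketch(toy, En = 20, len = c(2, 5))
dim(ens)  # 12 observations by 20 member clusterings
```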

Usage

enhcHi(data, En=100, len=c(2,10), type=2)

Arguments

data

An n x p data matrix or data frame; n is the number of observations and p is the number of features (dimensions).

En

Number of clusterings to include in the ensemble, i.e., the cardinality of the ensemble.

len

A vector of length two giving the range of clustering sizes (i.e., numbers of clusters); the number of clusters in each member clustering is drawn at random from this range.

type

Numeric indicator of single bootstrap (type=1) or double bootstrap (type=2) for selecting the subset of variables used in each clustering within the ensemble. The default is type=2.
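The roles of len and type can be sketched as follows. Drawing the number of clusters from a discrete uniform distribution on len follows the Examples section. The variable-selection step is an assumption: here a single bootstrap draws p variable indices with replacement and keeps the distinct ones (on average about 63% of the variables), and the double bootstrap is taken to repeat that resampling on the retained set (on average about 40%). The helper names are hypothetical, and EnsCat's actual procedure may differ.

```r
# Hypothetical helper: number of clusters for each of the En member clusterings,
# drawn from a discrete uniform distribution on len[1], ..., len[2].
draw_cluster_sizes <- function(En, len) {
  sizes <- seq(len[1], len[2])
  # Index into `sizes` rather than calling sample(sizes, ...): sample() treats a
  # single number n as 1:n, which would go wrong when len[1] == len[2].
  sizes[sample.int(length(sizes), En, replace = TRUE)]
}

# Hypothetical helper: indices of the variables used by one member clustering.
# type = 1: single bootstrap; type = 2: assumed double bootstrap.
select_vars <- function(p, type = 2) {
  v <- unique(sample.int(p, p, replace = TRUE))  # distinct indices from one bootstrap draw
  if (type == 2) {
    # Resample the retained indices with replacement and keep the distinct ones again
    v <- unique(v[sample.int(length(v), length(v), replace = TRUE)])
  }
  sort(v)
}

set.seed(2)
k <- draw_cluster_sizes(En = 100, len = c(2, 10))
range(k)
length(select_vars(1000, type = 1))  # on average about 63% of the 1000 variables
length(select_vars(1000, type = 2))  # on average about 40% of the 1000 variables
```

Indexing with sample.int keeps the helpers correct for edge cases such as len = c(4, 4), where every member clustering then has exactly four clusters.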

References

Amiri, S., Clarke, B., and Clarke, J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv preprint arXiv:1506.07930.

Examples

#data("rhabdodata")
### The following code generates the dissimilarity matrix of the sequence data stored in rhabdodata.
### The ensemble has 100 member clusterings, and the number of clusters in each clustering
### is drawn from a discrete uniform distribution on {2, ..., 10}. A double bootstrap procedure
### is used to select a subset of variables for each clustering.
#ens <- enhcHi(rhabdodata$dat, En = 100, len = c(2, 10), type = 2)
### Calculate the Hamming distance matrix from the ensemble
#dis0 <- hammingD(ens)
### Convert to a "dist" object and cluster with average linkage
#REDIST <- as.dist(dis0)
#hc0 <- hclust(REDIST, method = "average")
#plot(hc0, labels = rhabdodata$lab, hang = -1)
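The last steps of the example turn the ensemble into a dendrogram. Assuming, as the example suggests, that the ensemble is an n x En matrix of cluster labels, a Hamming distance between two observations can be read as the number of member clusterings that place them in different clusters. The base-R sketch below uses a hypothetical helper, hamming_rows (not EnsCat's hammingD), on a tiny hand-made ensemble and then follows the example's as.dist and average-linkage hclust steps.

```r
# Hypothetical helper: entry [i, j] counts the member clusterings (columns)
# in which observations i and j receive different cluster labels.
hamming_rows <- function(labels) {
  labels <- as.matrix(labels)
  n <- nrow(labels)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # t(labels) is En x n; row i of labels is recycled down each column
    d[i, ] <- colSums(t(labels) != labels[i, ])
  }
  d
}

# A hand-made ensemble: 4 observations (rows) clustered 3 times (columns)
ens <- cbind(c(1, 1, 2, 2),
             c(1, 1, 1, 2),
             c(2, 2, 1, 1))
dis0 <- hamming_rows(ens)
dis0
hc0 <- hclust(as.dist(dis0), method = "average")
cutree(hc0, k = 2)  # 1 1 2 2: observations 1-2 and 3-4 form the two groups
```

Observations 1 and 2 share a cluster in every member clustering (distance 0), while 1 and 4 never do (distance 3), so the consensus dendrogram first joins 1 with 2 and 3 with 4.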

[Package EnsCat version 1.1 Index]