enhcHi {EnsCat} | R Documentation |

## Performs ensemble hierarchical clustering for high dimensional categorical data

### Description

This function performs an ensemble hierarchical clustering of high dimensional categorical data (p >> n).

### Usage

```
enhcHi(data, En=100, len=c(2,10), type=2)
```

### Arguments

`data` |
An n x p data matrix or data frame; n is the number of observations and p is the number of features (dimensions). |

`En` |
Number of clusterings to include in the ensemble, i.e., cardinality of the ensemble. |

`len` |
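Range for the number of clusters in each clustering, given as a two-element vector of the minimum and maximum. The number of clusters for each ensemble member is drawn at random from this range. |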

`type` |
Numeric indicator selecting a single bootstrap (type=1) or a double bootstrap (type=2) for choosing the subset of variables used in each clustering within the ensemble. The default is type=2. |
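As a rough illustration of how `En` and `len` interact, the base R sketch below draws one cluster count per ensemble member from a discrete uniform over the `len` range, as the Examples section describes. It does not call EnsCat and is not the package's internal code; the names `ks` and `k` are illustrative assumptions.

```r
set.seed(1)

En  <- 100        # ensemble size, as in the default call
len <- c(2, 10)   # smallest and largest number of clusters

# Candidate cluster counts. Indexing with sample.int() avoids the
# sample() pitfall when the range collapses to a single value.
ks <- seq(len[1], len[2])

# One cluster count per ensemble member (assumed discrete uniform draw)
k <- ks[sample.int(length(ks), size = En, replace = TRUE)]

table(k)          # how often each cluster count was drawn
```

Each of the `En` clusterings would then be cut at its own value of `k`.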

### References

Amiri, S., Clarke, B., and Clarke, J. (2015). Clustering categorical data via ensembling dissimilarity matrices. arXiv preprint arXiv:1506.07930.

### Examples

```
#data("rhabdodata")
### The following code generates the dissimilarity matrix of the sequence data stored in rhabdodata
### The ensemble has 100 member clusterings, and the number of clusters in each clustering
### is generated randomly from a discrete uniform on (2,10). A double bootstrap procedure is
### used to select a subset of variables for each clustering.
#ens<-enhcHi(rhabdodata$dat,En=100,len=c(2,10), type=2)
### Calculate the Hamming distance
#dis0<-hammingD(ens)
### Save as distance format
#REDIST<-as.dist(dis0)
#hc0 <- hclust(REDIST,method = "average")
#plot(hc0,labels=rhabdodata$lab,hang =-1)
```
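For readers without the package data at hand, the same workflow can be approximated in base R on synthetic data. This is only a sketch under stated assumptions: the helper `hamming()`, the toy matrix `x`, and the plain bootstrap of columns are hypothetical stand-ins, not the implementation of `enhcHi` or `hammingD`, and the ensemble is assumed to be stored as an n x En matrix of cluster labels.

```r
set.seed(42)

n <- 12; p <- 200   # toy sizes with p >> n
# Synthetic categorical data (hypothetical stand-in for rhabdodata$dat)
x <- matrix(sample(c("A", "C", "G", "T"), n * p, replace = TRUE), nrow = n)

# Hamming distance between rows: number of positions where two rows differ
hamming <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) d[i, j] <- sum(m[i, ] != m[j, ])
  }
  d
}

En <- 20            # small ensemble to keep the sketch fast
ks <- 2:5           # candidate numbers of clusters

# Each member: resample variables (assumed simple bootstrap), cluster, cut
ens <- sapply(seq_len(En), function(b) {
  cols <- sample.int(p, p, replace = TRUE)
  hc <- hclust(as.dist(hamming(x[, cols, drop = FALSE])), method = "average")
  cutree(hc, k = ks[sample.int(length(ks), 1)])
})

# Ensemble dissimilarity: Hamming distance between rows of cluster labels
dis0 <- hamming(ens)
hc0 <- hclust(as.dist(dis0), method = "average")
plot(hc0, labels = paste0("obs", seq_len(n)), hang = -1)
```

Two observations end up close in `dis0` when they are assigned to the same cluster in most ensemble members, which is the idea behind clustering on the ensembled dissimilarity.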

[Package *EnsCat* version 1.1 Index]