R: Group Variables with Hierarchical Clustering

s.cluster.h.group {ldt}

R Documentation

Group Variables with Hierarchical Clustering

Description

This function groups the columns of a numeric matrix using hierarchical clustering.

Usage

s.cluster.h.group(
  data,
  nGroups = 2,
  threshold = 0,
  distance = "correlation",
  linkage = "single",
  correlation = "pearson"
)

Arguments

`data`	A numeric matrix with variables in the columns.
`nGroups`	Integer value specifying the required number of groups.
`threshold`	Numeric value specifying a threshold for omitting variables. If the distance between two variables in a group is less than this value, the second one is omitted. Because the later column is the one that is dropped, changing the order of the columns might change the results.
`distance`	Character string specifying how distances are calculated. It can be `correlation`, `absCorrelation`, `euclidean`, `manhattan`, or `maximum`. See `s.distance` function.
`linkage`	Character string specifying how the distance is calculated when a left and a right node are merged. It can be `single`, `complete`, `uAverage`, `wAverage`, or `ward`. See `s.cluster.h` function.
`correlation`	Character string specifying the type of correlation if `distance` is `correlation`. It can be `pearson` or `spearman`. See `s.distance` function.
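
As an illustration of why column order matters for `threshold`, the following base R sketch drops the later of two close variables within a group. The helper `drop_close` is hypothetical and not part of ldt, the distance `1 - cor(data)` is an assumption that may differ from what `s.distance` computes, and comparing each column only against the columns already kept is one possible reading of the rule above.

```r
# Hypothetical helper (not part of ldt): walk through the members of one
# group in the given order and drop a column when it lies closer than
# `threshold` to a column that was already kept.
drop_close <- function(dist_mat, members, threshold) {
  kept <- integer(0)
  removed <- integer(0)
  for (j in members) {
    too_close <- length(kept) > 0 && any(dist_mat[j, kept] < threshold)
    if (too_close) {
      removed <- c(removed, j)
    } else {
      kept <- c(kept, j)
    }
  }
  list(kept = kept, removed = removed)
}

set.seed(1)
x <- rnorm(50)
# Columns a and b are nearly identical; column c is unrelated.
data <- cbind(a = x, b = x + rnorm(50, sd = 0.01), c = rnorm(50))

# Assumed distance for this sketch; ldt's definition may differ.
dist_mat <- 1 - cor(data)

# Same group, two different column orders.
res1 <- drop_close(dist_mat, members = c(1, 2, 3), threshold = 0.05)
res2 <- drop_close(dist_mat, members = c(2, 1, 3), threshold = 0.05)

res1$removed  # column 2 is dropped when column 1 comes first
res2$removed  # column 1 is dropped when column 2 comes first
```

With the default `threshold = 0`, no distance in this sketch is below the threshold, so nothing is dropped.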

Details

The results might differ from those of R's `cutree` function; no equivalence with `cutree` is claimed. This function iterates over the nodes of the tree and, whenever a split occurs, adds a group until the required number of groups is reached.
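
For comparison, a similar grouping can be sketched with base R's `hclust` and `cutree`. This is not the procedure used by `s.cluster.h.group`: the distance `1 - cor(data)` is an assumption, and the groups returned by ldt may differ.

```r
set.seed(42)
base <- matrix(rnorm(200), ncol = 2)

# Two pairs of strongly correlated variables in the columns.
data <- cbind(
  v1 = base[, 1],
  v2 = base[, 1] + rnorm(100, sd = 0.1),
  v3 = base[, 2],
  v4 = base[, 2] + rnorm(100, sd = 0.1)
)

# Assumed correlation distance; ldt's s.distance may define it differently.
d <- as.dist(1 - cor(data, method = "pearson"))

# Single linkage, matching the default `linkage` argument above.
tree <- hclust(d, method = "single")

# Cut the tree into two groups and collect the column indexes per group.
labels <- cutree(tree, k = 2)
groups <- unname(split(seq_along(labels), labels))
groups
```

Here `groups` has the same shape as the `groups` item described under Value: a list of integer vectors holding column indexes.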

Value

A list with the following items:

`groups`	A list of integer vectors representing the indexes of variables in each group.
`removed`	An integer vector representing the indexes of removed variables.
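
Examples

A minimal usage sketch based on the signature shown under Usage. The simulated data are an assumption made for illustration, and the call is guarded because the function might not be exported by other versions of ldt.

```r
set.seed(123)
n <- 100
z1 <- rnorm(n)
z2 <- rnorm(n)

# Four variables in the columns: two noisy copies of z1 and two of z2.
data <- cbind(z1, z1 + rnorm(n, sd = 0.1), z2, z2 + rnorm(n, sd = 0.1))

# Run the call only if ldt is installed and exports this function.
has_fun <- requireNamespace("ldt", quietly = TRUE) &&
  "s.cluster.h.group" %in% getNamespaceExports("ldt")

if (has_fun) {
  res <- ldt::s.cluster.h.group(
    data,
    nGroups = 2,
    threshold = 0,
    distance = "correlation",
    linkage = "single",
    correlation = "pearson"
  )
  print(res$groups)   # indexes of the variables in each group
  print(res$removed)  # indexes of the omitted variables
}
```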

[Package ldt version 0.5.3 Index]