s.cluster.h.group {ldt}    R Documentation

Group Variables with Hierarchical Clustering

Description

This function groups the columns of a numeric matrix using hierarchical clustering.

Usage

s.cluster.h.group(
  data,
  nGroups = 2,
  threshold = 0,
  distance = "correlation",
  linkage = "single",
  correlation = "pearson"
)

Arguments

data

A numeric matrix with variables in the columns.

nGroups

Integer value specifying the required number of groups.

threshold

Numeric value specifying a threshold for omitting variables. If the distance between two variables in a group is less than this value, the second one is omitted. Because of this, changing the order of the columns might change the results.
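
The omission rule can be illustrated with a small base R sketch. It is not the package's implementation: the helper name omit_close is hypothetical, and the use of 1 minus the Pearson correlation as the distance is an assumption made only for this example. It shows why column order matters, since each variable is compared only with the variables already kept.

```r
# Hypothetical helper (not part of ldt): within one group, drop the later
# variable of any pair whose distance is below `threshold`.
omit_close <- function(dist_mat, group, threshold) {
  kept <- integer(0)
  removed <- integer(0)
  for (j in group) {
    # Compare j only with variables already kept, so the order matters.
    too_close <- any(dist_mat[j, kept] < threshold)
    if (too_close) removed <- c(removed, j) else kept <- c(kept, j)
  }
  list(kept = kept, removed = removed)
}

set.seed(1)
x <- rnorm(50)
# Column b is almost a copy of column a; column c is unrelated noise.
data <- cbind(a = x, b = x + rnorm(50, sd = 0.01), c = rnorm(50))
d <- 1 - cor(data)  # assumed distance: 1 - Pearson correlation

res <- omit_close(d, group = 1:3, threshold = 0.05)
res_reversed <- omit_close(d, group = 3:1, threshold = 0.05)
```

In this sketch, passing the group as 1:3 drops column b, while passing it as 3:1 drops column a instead.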

distance

Character string specifying how distances are calculated. It can be "correlation", "absCorrelation", "euclidean", "manhattan", or "maximum". See the s.distance function.

linkage

Character string specifying how distances are calculated in a left-right node merge. It can be "single", "complete", "uAverage", "wAverage", or "ward". See the s.cluster.h function.

correlation

Character string specifying the type of correlation if distance is "correlation". It can be "pearson" or "spearman". See the s.distance function.

Details

The results might differ from those of R's 'cutree' function; equivalence with 'cutree' has not been checked. This function iterates over the nodes of the tree, and whenever a split occurs it adds a group, until the required number of groups is reached.
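
One possible reading of this procedure is sketched below in base R. It is an illustration under stated assumptions, not the code used by the package: the tree comes from stats::hclust rather than s.cluster.h, the helper name split_from_root is hypothetical, the distance is assumed to be 1 minus the Pearson correlation, and the nodes are assumed to be visited from the root downwards.

```r
# Hypothetical helper (not part of ldt): start with one group at the root and
# split the most recently merged node until `n_groups` groups exist.
split_from_root <- function(hc, n_groups) {
  # In hc$merge, negative ids are leaves and positive ids are earlier merges.
  leaves <- function(node) {
    if (node < 0) return(-node)
    c(leaves(hc$merge[node, 1]), leaves(hc$merge[node, 2]))
  }
  nodes <- nrow(hc$merge)  # the root: a single group holding every variable
  while (length(nodes) < n_groups) {
    internal <- nodes[nodes > 0]
    if (length(internal) == 0) break  # only single variables remain
    top <- max(internal)              # hclust stores merges in order, so this is the latest
    # Each split replaces one node by its two children, adding one group.
    nodes <- c(setdiff(nodes, top), hc$merge[top, 1], hc$merge[top, 2])
  }
  lapply(nodes, function(n) sort(leaves(n)))
}

set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
data <- cbind(a = x, b = x + rnorm(50, sd = 0.1),
              c = y, d = y + rnorm(50, sd = 0.1))
hc <- hclust(as.dist(1 - cor(data)), method = "single")
groups <- split_from_root(hc, 2)
```

Here columns a and b form one group and columns c and d form the other; the order of the groups in the returned list is not significant.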

Value

A list with the following items:

groups

A list of integer vectors representing the indexes of variables in each group.

removed

An integer vector representing the indexes of removed variables.
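
A minimal sketch of how a result of this shape could be used to select columns follows. The list res is written by hand for illustration and is not output from the function, and the indexes are assumed to be 1-based, as is usual in R.

```r
# Made-up result with the documented shape; not produced by ldt.
res <- list(groups = list(c(1L, 3L), 4L), removed = 2L)

data <- matrix(rnorm(40), ncol = 4,
               dimnames = list(NULL, c("a", "b", "c", "d")))

# One sub-matrix per group (assumes 1-based column indexes).
group_data <- lapply(res$groups, function(idx) data[, idx, drop = FALSE])

# Columns that were not removed. setdiff() is used instead of negative
# indexing so that an empty `removed` vector keeps every column.
kept_idx <- setdiff(seq_len(ncol(data)), res$removed)
kept <- data[, kept_idx, drop = FALSE]
```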


[Package ldt version 0.5.3 Index]