hclu_hierarclust {bioregion}  R Documentation 
This function generates a hierarchical tree from a dissimilarity
(beta-diversity) data.frame, calculates the cophenetic correlation
coefficient, and can extract clusters from the tree if requested by the user.
The function implements randomization of the dissimilarity matrix to
generate the tree, with a selection method based on the optimal cophenetic
correlation coefficient. Typically, the dissimilarity data.frame is a
bioregion.pairwise.metric object obtained by running dissimilarity(), or
similarity() followed by similarity_to_dissimilarity().
hclu_hierarclust(
dissimilarity,
index = names(dissimilarity)[3],
method = "average",
randomize = TRUE,
n_runs = 30,
keep_trials = FALSE,
optimal_tree_method = "best",
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0
)
dissimilarity 
the output object from dissimilarity() or similarity_to_dissimilarity().
index 
name or number of the dissimilarity column to use. By default,
the third column name of dissimilarity is used.
method 
name of the hierarchical classification method, as in
fastcluster::hclust(). The default is "average".
randomize 
a boolean indicating if the dissimilarity matrix should be randomized, to account for the order of sites in the dissimilarity matrix. 
n_runs 
number of trials to randomize the dissimilarity matrix. 
keep_trials 
a boolean indicating if all random trial results
should be stored in the output object (set to FALSE to save space).
optimal_tree_method 
a character vector indicating how the final tree
should be obtained from all trials. The only option currently is
"best", which selects the tree with the best cophenetic correlation
coefficient.
n_clust 
an integer or a vector of integers indicating the number of
clusters to be obtained from the hierarchical tree, or the output from
partition_metrics(). Should not be used at the same time as
cut_height.
cut_height 
a numeric vector indicating the height(s) at which the
tree should be cut. Should not be used at the same time as n_clust.
find_h 
a boolean indicating if the height of cut should be found for
the requested n_clust.
h_max 
a numeric indicating the maximum possible tree height for
the chosen index.
h_min 
a numeric indicating the minimum possible height in the tree
for the chosen index.
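The interplay between find_h, h_min and h_max can be pictured as a search for a cut height that yields the requested number of clusters. The sketch below is only an illustration under assumptions: it uses base R's stats::hclust() and stats::cutree(), the helper find_height() is hypothetical, and bisection is one plausible search strategy, not necessarily the one the package uses.

```r
# Hypothetical helper (not part of bioregion): bisection on the cut height
# between h_min and h_max until the tree is split into k clusters.
find_height <- function(tree, k, h_min = 0, h_max = 1, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    n <- length(unique(cutree(tree, h = h)))
    if (n == k) return(h)
    # Too many clusters: the cut is too low, so raise the lower bound.
    # Too few clusters: the cut is too high, so lower the upper bound.
    if (n > k) h_min <- h else h_max <- h
  }
  NA_real_  # no suitable height found within the bounds
}

set.seed(7)
x <- matrix(runif(40), nrow = 10)
d <- dist(x)
d <- d / max(d)                      # rescale so tree heights lie in [0, 1]
tree <- hclust(d, method = "average")

h <- find_height(tree, k = 4)
length(unique(cutree(tree, h = h)))  # number of clusters at the height found
```

The bounds matter because the search can only succeed if a height producing k clusters lies between h_min and h_max, which is why they should match the range of the chosen dissimilarity index.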
The default method for the hierarchical tree is "average", i.e.
UPGMA, as it has been recommended as the best method to generate a tree
from beta diversity dissimilarity (Kreft and Jetz 2010).
Clusters can be obtained by two methods:
Specifying a desired number of clusters in n_clust
Specifying one or several heights of cut in cut_height
To find an optimal number of clusters, see partition_metrics()
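The two ways of obtaining clusters can be illustrated with base R alone. This is a minimal sketch, assuming a toy Euclidean distance matrix and stats::hclust() in place of the package's own tree; it only shows the difference between requesting a number of clusters and requesting a cut height.

```r
# Base R illustration (not the package internals): cut a UPGMA tree two ways.
set.seed(1)
x <- matrix(runif(40), nrow = 10,
            dimnames = list(paste0("Site", 1:10), NULL))
d <- dist(x)                           # toy dissimilarities, for illustration only
tree <- hclust(d, method = "average")  # "average" corresponds to UPGMA

by_k <- cutree(tree, k = 3)                    # request a number of clusters
by_h <- cutree(tree, h = median(tree$height))  # cut at a chosen height

table(by_k)  # cluster sizes when asking for 3 clusters
table(by_h)  # cluster sizes when cutting at the median merge height
```

Requesting a number of clusters fixes how many groups come out, whereas a height cut fixes the dissimilarity threshold and lets the number of groups follow from the tree.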
A list of class bioregion.clusters with five slots:
name: character string containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results
In the algorithm slot, users can find the following elements:
trials: a list containing all randomization trials. Each trial contains the dissimilarity matrix, with site order randomized, the associated tree and the cophenetic correlation coefficient (Spearman) for that tree
final.tree: an hclust object containing the final hierarchical tree to be used
final.tree.coph.cor: the cophenetic correlation coefficient between the initial dissimilarity matrix and final.tree
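The randomization trials and the selection of a final tree can be sketched in base R. This is a rough illustration under assumptions, not the package's implementation: it uses stats::hclust() instead of fastcluster::hclust(), toy Euclidean distances, and the names trials, cors and best are chosen here for the example.

```r
# Base R sketch: shuffle site order, build a tree per trial, and keep the
# tree whose cophenetic distances best match the original dissimilarities.
set.seed(42)
x <- matrix(runif(60), nrow = 12,
            dimnames = list(paste0("Site", 1:12), NULL))
d <- as.matrix(dist(x))              # toy dissimilarity matrix

n_runs <- 10
trials <- lapply(seq_len(n_runs), function(i) {
  ord  <- sample(rownames(d))        # random site order for this trial
  tree <- hclust(as.dist(d[ord, ord]), method = "average")
  # Cophenetic distances, put back in the original site order
  coph <- as.matrix(cophenetic(tree))[rownames(d), colnames(d)]
  list(tree = tree,
       cor  = cor(d[lower.tri(d)], coph[lower.tri(coph)],
                  method = "spearman"))
})

cors <- vapply(trials, function(t) t$cor, numeric(1))
best <- trials[[which.max(cors)]]    # trial with the highest correlation
best$cor
```

With continuous toy data such as this, the trials will usually agree with each other; site order is most likely to influence the tree when the dissimilarity matrix contains tied values.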
Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)
Kreft H, Jetz W (2010). “A framework for delineating biogeographical regions based on species distributions.” Journal of Biogeography, 37, 2029–2053.
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim, n_clust = 5)
tree1
plot(tree1)
str(tree1)
tree1$clusters
# User-defined height cut
# Only one height
tree2 <- hclu_hierarclust(dissim, cut_height = .05)
tree2
tree2$clusters
# Multiple heights
tree3 <- hclu_hierarclust(dissim, cut_height = c(.05, .15, .25))
tree3$clusters # Mind the order of height cuts: from deep to shallow cuts
# Info on each partition can be found in table cluster_info
tree3$cluster_info
plot(tree3)
# Recut the tree afterwards
tree3.1 <- cut_tree(tree3, n = 5)
tree4 <- hclu_hierarclust(dissim, n_clust = 1:19)