hclu_hierarclust {bioregion} | R Documentation |
This function generates a hierarchical tree from a dissimilarity
(beta-diversity) data.frame, calculates the cophenetic correlation
coefficient, and can extract clusters from the tree if requested by the user.
The function implements randomization of the dissimilarity matrix to
generate the tree, with a selection method based on the optimal cophenetic
correlation coefficient. Typically, the dissimilarity data.frame is a
bioregion.pairwise.metric object obtained by running dissimilarity, or
similarity and then similarity_to_dissimilarity.
hclu_hierarclust(
dissimilarity,
index = names(dissimilarity)[3],
method = "average",
randomize = TRUE,
n_runs = 30,
keep_trials = FALSE,
optimal_tree_method = "best",
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0
)
dissimilarity |
the output object from |
index |
name or number of the dissimilarity column to use. By default,
the third column name of |
method |
name of the hierarchical classification method, as in
fastcluster::hclust(). Should be one of |
randomize |
a boolean indicating if the dissimilarity matrix should be randomized, to account for the order of sites in the dissimilarity matrix. |
n_runs |
number of trials to randomize the dissimilarity matrix. |
keep_trials |
a boolean indicating if all random trial results
should be stored in the output object (set to FALSE to save space if your
|
optimal_tree_method |
a character vector indicating how the final tree
should be obtained from all trials. The only option currently is
|
n_clust |
an integer or a vector of integers indicating the number of
clusters to be obtained from the hierarchical tree, or the output from
partition_metrics. Should not be used at the same time as
|
cut_height |
a numeric vector indicating the height(s) at which the
tree should be cut. Should not be used at the same time as |
find_h |
a boolean indicating if the height of cut should be found for
the requested |
h_max |
a numeric indicating the maximum possible tree height for
the chosen |
h_min |
a numeric indicating the minimum possible height in the tree
for the chosen |
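As a rough illustration of how find_h, h_min and h_max can work together, the sketch below searches by bisection for a cut height that yields a requested number of clusters. It relies only on base R (stats::hclust() and stats::cutree()). The helper find_height() and the toy data are hypothetical, and this is not claimed to be the search that bioregion performs internally.

```r
# Toy data (made up for illustration): 12 sites described by 5 variables
set.seed(123)
x <- matrix(rnorm(60), nrow = 12,
            dimnames = list(paste0("Site", 1:12), NULL))

# Rescale distances to [0, 1] so that h_min = 0 and h_max = 1 bracket the tree
d <- dist(x)
d <- d / max(d)
tree <- hclust(d, method = "average")

# Hypothetical helper: bisection between h_min and h_max
find_height <- function(tree, n_clust, h_min = 0, h_max = 1, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    k <- length(unique(cutree(tree, h = h)))
    if (k == n_clust) return(h)
    if (k > n_clust) {
      h_min <- h  # too many clusters: cut higher in the tree
    } else {
      h_max <- h  # too few clusters: cut lower in the tree
    }
  }
  NA_real_  # no suitable height found within max_iter iterations
}

h <- find_height(tree, n_clust = 4)
table(cutree(tree, h = h))
```

Narrowing the interval between h_min and h_max reduces the number of iterations needed, which is presumably why both bounds are exposed as arguments.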
The default method for the hierarchical tree is "average", i.e. UPGMA,
as it has been recommended as the best method to generate a tree
from beta-diversity dissimilarity (Kreft and Jetz 2010).
Clusters can be obtained by two methods:
Specifying a desired number of clusters in n_clust
Specifying one or several heights of cut in cut_height
To find an optimal number of clusters, see partition_metrics()
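The two ways of obtaining clusters mirror the k and h arguments of base R's stats::cutree(). The sketch below illustrates the idea on a toy tree built with stats::hclust(). It does not call hclu_hierarclust(), and the data and the chosen heights are made up for illustration.

```r
# Toy data (made up): 10 sites x 6 species abundances
set.seed(1)
comat_toy <- matrix(rpois(60, lambda = 3), nrow = 10,
                    dimnames = list(paste0("Site", 1:10),
                                    paste0("Species", 1:6)))

# Euclidean distance is used here only as a stand-in dissimilarity
d <- dist(comat_toy)
tree <- hclust(d, method = "average")  # "average" corresponds to UPGMA

# 1. Request a number of clusters
by_n <- cutree(tree, k = 3)

# 2. Request one or several cut heights (here two arbitrary quantiles
#    of the merge heights)
heights <- unname(quantile(tree$height, c(0.5, 0.8)))
by_h <- cutree(tree, h = heights)

table(by_n)  # cluster sizes for the 3-cluster partition
head(by_h)   # one column of cluster memberships per cut height
```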
A list of class bioregion.clusters with five slots:
name: character string containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results
In the algorithm slot, users can find the following elements:
trials: a list containing all randomization trials. Each trial contains the dissimilarity matrix, with site order randomized, the associated tree and the cophenetic correlation coefficient (Spearman) for that tree
final.tree: a hclust object containing the final hierarchical tree to be used
final.tree.coph.cor: the cophenetic correlation coefficient between the initial dissimilarity matrix and final.tree
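To make the contents of the trials element more concrete, here is a minimal base R sketch of the general idea: shuffle the site order, build a tree for each shuffle, score each tree by its Spearman cophenetic correlation with the original dissimilarities, and keep the best-scoring tree. The toy data, the object names and the use of Euclidean distance are assumptions made for illustration; this is not the package's internal code.

```r
# Toy data (made up): 15 sites x 8 species abundances
set.seed(42)
comat_toy <- matrix(rpois(120, lambda = 3), nrow = 15,
                    dimnames = list(paste0("Site", 1:15), NULL))

d <- dist(comat_toy)   # stand-in dissimilarity
d_mat <- as.matrix(d)
sites <- rownames(d_mat)

n_runs <- 30
trials <- lapply(seq_len(n_runs), function(i) {
  ord <- sample(sites)                       # randomize site order
  d_rand <- as.dist(d_mat[ord, ord])
  tree <- hclust(d_rand, method = "average")
  # Put cophenetic distances back in the original site order
  coph <- as.matrix(cophenetic(tree))[sites, sites]
  coph_cor <- cor(as.vector(as.dist(coph)), as.vector(d),
                  method = "spearman")
  list(tree = tree, coph_cor = coph_cor)
})

# Keep the tree with the highest cophenetic correlation
coph_cors <- vapply(trials, function(x) x$coph_cor, numeric(1))
best <- trials[[which.max(coph_cors)]]
range(coph_cors)
```

If all trials return the same correlation, site order had no influence on the tree for that dataset.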
Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)
Kreft H, Jetz W (2010). “A framework for delineating biogeographical regions based on species distributions.” Journal of Biogeography, 37, 2029–2053.
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim, n_clust = 5)
tree1
plot(tree1)
str(tree1)
tree1$clusters
# User-defined height cut
# Only one height
tree2 <- hclu_hierarclust(dissim, cut_height = .05)
tree2
tree2$clusters
# Multiple heights
tree3 <- hclu_hierarclust(dissim, cut_height = c(.05, .15, .25))
tree3$clusters # Mind the order of height cuts: from deep to shallow cuts
# Info on each partition can be found in table cluster_info
tree3$cluster_info
plot(tree3)
# Recut the tree afterwards
tree3.1 <- cut_tree(tree3, n_clust = 5)
# Request several numbers of clusters at once
tree4 <- hclu_hierarclust(dissim, n_clust = 1:19)