get_clade_list {castor}R Documentation

Get a representation of a tree as a table listing tips/nodes.

Description

Given a tree in standard "phylo" format, calculate an alternative representation of the tree structure as a list of tips/nodes with basic information on parents, children and incoming edge lengths. This function is analogous to the function read.tree.nodes in the R package phybase.

Usage

get_clade_list(tree, postorder=FALSE, missing_value=NA)

Arguments

tree

A tree of class "phylo". If postorder==TRUE, then the tree must be rooted.

postorder

Logical, specifying whether nodes should be ordered and indexed in postorder traversal, i.e. with the root node listed last. Note that regardless of the value of postorder, tips will always be listed first and indexed in the order in which they are listed in the input tree.

missing_value

Value to be used to denote missing information in the returned arrays, for example to denote the (non-existing) parent of the root node.

Details

This function is analogous to the function read.tree.nodes in the R package phybase v1.4, but becomes multiple orders of magnitude faster than the latter for large trees (i.e. with 1000-1000,000 tips). Specifically, calling get_clade_list with postorder=TRUE and missing_value=-9 on a bifurcating tree yields a similar behavior as calling read.tree.nodes with the argument “name” set to the tree's tip labels.

The input tree can include monofurcations, bifurcations and multifurcations. The asymptotic average time complexity of this function is O(Nedges), where Nedges is the number of edges in the tree.

Value

A named list with the following elements:

success

Logical, indicating whether model fitting succeeded. If FALSE, the returned list will include an additional “error” element (character) providing a description of the error; in that case all other return variables may be undefined.

Nsplits

The maximum number of children of any node in the tree. For strictly bifurcating trees this will be 2.

clades

2D integer matrix of size Nclades x (Nsplits+1), with every row representing a specific tip/node in the tree. If postorder==FALSE, then rows are in the same order as tips/nodes in the original tree, otherwise nodes (but not tips) will be re-ordered and re-indexed in postorder fashion, with the root being the last row. The first column lists the parent node index, the remaining columns list the child tip/node indices. For the root, the parent index will be set to missing_value; for the tips, the child indices will be set to missing_value. For nodes with fewer than Nsplits children, superfluous column entries will also be missing_value.

lengths

Numeric vector of size Nclades, listing the lengths of the incoming edges at each tip/node in clades. For the root, the value will be missing_value. If the tree's edge_length was NULL, then lengths will be NULL as well.

old2new_clade

Integer vector of size Nclades, mapping old tip/node indices to tip/node indices in the returned clades and lengths arrays. If postorder==FALSE, this will simply be c(1:Nclades).

Author(s)

Stilianos Louca

Examples

# generate a random bifurcating tree
tree = generate_random_tree(list(birth_rate_intercept=1),
                            max_tips=100)$tree

# get tree structure as clade list
# then convert into a similar format as would be
# returned by phybase::read.tree.nodes v1.4
results = get_clade_list(tree,postorder=TRUE,missing_value=-9)
nodematrix = cbind( results$clades, 
                    results$lengths, 
                    matrix(-9,nrow=nrow(results$clades),ncol=3))
phybaseformat = list(   nodes = nodematrix, 
                        names = tree$tip.label, 
                        root  = TRUE)

[Package castor version 1.7.0 Index]