pick_random_tips {castor} | R Documentation
Pick random subsets of tips on a tree.
Description
Given a rooted phylogenetic tree, this function picks random subsets of tips by traversing the tree from root to tips, choosing a random child at each node until reaching a tip. Multiple random independent subsets can be generated if needed.
Usage
pick_random_tips( tree,
size = 1,
Nsubsets = 1,
with_replacement = TRUE,
drop_dims = TRUE)
Arguments
tree | A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge.
size | Integer. The size of each random subset of tips.
Nsubsets | Integer. Number of independent subsets to pick.
with_replacement | Logical. If TRUE, each tip can be included multiple times in a subset. If FALSE, each tip is included at most once per subset.
drop_dims | Logical, specifying whether to return a vector (instead of a matrix) if Nsubsets==1.
Details
If with_replacement==TRUE, then each child of a node is equally likely to be traversed, and each tip can be included multiple times in a subset. If with_replacement==FALSE, then only children with at least one descending tip not yet included in the subset remain available for traversal; each available child of a node is equally likely to be traversed. In either case, separate subsets can include the same tips.
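The traversal described above can be sketched in base R. This is only an illustrative reimplementation under stated assumptions, not the package's actual code: the helper name pick_tips_sketch is hypothetical, and the tree is a hand-written edge matrix following the "phylo" convention (tips numbered 1..Ntips, internal nodes numbered above Ntips).

```r
# Toy rooted tree ((t1,t2),t3) as a "phylo"-style edge matrix (parent, child).
Ntips <- 3
edge <- rbind(c(4, 5),   # root -> internal node 5
              c(4, 3),   # root -> tip 3
              c(5, 1),   # node 5 -> tip 1
              c(5, 2))   # node 5 -> tip 2
root <- 4

# Hypothetical helper: pick one subset of tips by repeated root-to-tip walks.
pick_tips_sketch <- function(edge, Ntips, root, size, with_replacement = TRUE) {
  if (!with_replacement && size > Ntips) stop("size exceeds the number of tips")
  Nclades <- max(edge)
  parent <- integer(Nclades)
  parent[edge[, 2]] <- edge[, 1]
  # available[c] = number of not-yet-picked tips descending from clade c
  available <- integer(Nclades)
  available[seq_len(Ntips)] <- 1L
  for (tip in seq_len(Ntips)) {
    node <- tip
    while (node != root) {
      node <- parent[node]
      available[node] <- available[node] + 1L
    }
  }
  picked <- integer(size)
  for (i in seq_len(size)) {
    node <- root
    while (node > Ntips) {  # internal nodes have indices above Ntips
      children <- edge[edge[, 1] == node, 2]
      if (!with_replacement) children <- children[available[children] > 0]
      # choose one of the available children with equal probability
      node <- children[sample.int(length(children), 1)]
    }
    picked[i] <- node
    if (!with_replacement) {
      # mark the tip as used and update the counts along its path to the root
      available[node] <- available[node] - 1L
      while (node != root) {
        node <- parent[node]
        available[node] <- available[node] - 1L
      }
    }
  }
  picked
}

set.seed(42)
draws <- pick_tips_sketch(edge, Ntips, root, size = 10000)
print(table(draws) / length(draws))  # tip 3 near 0.5, tips 1 and 2 near 0.25
print(pick_tips_sketch(edge, Ntips, root, size = 3, with_replacement = FALSE))
```

In this sketch, sample.int is used on the number of children rather than calling sample on the child indices directly, since sample(x, 1) treats a single number x as the range 1..x.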
This algorithm does not sample tips uniformly (i.e., with equal probabilities); instead, it ensures that sister clades are equally likely to be picked (if with_replacement==TRUE or if size << Ntips).
The time required by this function per random subset decreases with the number of subsets requested.
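To make the difference from uniform tip sampling concrete, the following base R sketch computes the exact probability of reaching each tip in a single root-to-tip walk with replacement, on a hand-written toy tree ((t1,t2),t3). The edge matrix and variable names are assumptions made for illustration; the probability of a tip is taken to be the product of 1/(number of children) over the nodes on its path from the root, as implied by the description above.

```r
# Toy rooted tree ((t1,t2),t3) as a "phylo"-style edge matrix (parent, child).
Ntips <- 3
edge <- rbind(c(4, 5), c(4, 3), c(5, 1), c(5, 2))
root <- 4

Nclades <- max(edge)
parent <- integer(Nclades)
parent[edge[, 2]] <- edge[, 1]
# number of children of each clade (zero for tips)
Nchildren <- tabulate(edge[, 1], nbins = Nclades)

# probability of ending a single walk at each tip
tip_prob <- sapply(seq_len(Ntips), function(tip) {
  p <- 1
  node <- tip
  while (node != root) {
    node <- parent[node]
    p <- p / Nchildren[node]
  }
  p
})

print(tip_prob)               # 0.25 0.25 0.50
print(rep(1 / Ntips, Ntips))  # uniform sampling would give 1/3 for each tip
```

Here the sister clades {t1,t2} and {t3} each receive probability 0.5, so the lone tip t3 is picked twice as often as t1 or t2.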
Value
A 2D integer matrix of size Nsubsets x size, with each row containing indices of randomly picked tips (i.e. in 1,..,Ntips) within a specific subset. If drop_dims==TRUE and Nsubsets==1, then a vector is returned instead of a matrix.
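The return shapes described above can be checked with a short script. This is a sketch that assumes the castor package is installed (the code is skipped otherwise) and relies only on the signature and return value documented on this page; the tree size of 20 tips is an arbitrary choice for illustration.

```r
if (requireNamespace("castor", quietly = TRUE)) {
  library(castor)
  tree <- generate_random_tree(list(birth_rate_intercept = 1), max_tips = 20)$tree

  # several subsets: a matrix with one row per subset
  m <- pick_random_tips(tree, size = 5, Nsubsets = 3)
  print(dim(m))

  # a single subset with the default drop_dims=TRUE: a plain vector
  v <- pick_random_tips(tree, size = 5, Nsubsets = 1)
  print(length(v))

  # a single subset with drop_dims=FALSE: still a matrix
  m1 <- pick_random_tips(tree, size = 5, Nsubsets = 1, drop_dims = FALSE)
  print(dim(m1))
}
```

Setting drop_dims=FALSE may be convenient in code that loops over rows and should behave the same for any value of Nsubsets.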
Author(s)
Stilianos Louca
Examples
# load the package and generate a random tree
library(castor)
Ntips = 1000
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree
# pick random tip subsets
Nsubsets = 100
size = 50
subsets = pick_random_tips(tree, size, Nsubsets, with_replacement=FALSE)
# count the number of times each tip was picked in a subset ("popularity")
popularities = table(subsets)
# plot histogram of tip popularities
hist(popularities,breaks=20,xlab="popularity",ylab="# tips",main="tip popularities")