pick_random_tips {castor} | R Documentation |
Given a rooted phylogenetic tree, this function picks random subsets of tips by traversing the tree from root to tips, choosing a random child at each node until reaching a tip. Multiple random independent subsets can be generated if needed.
pick_random_tips( tree, size = 1, Nsubsets = 1, with_replacement = TRUE, drop_dims = TRUE)
tree |
A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge. |
size |
Integer. The size of each random subset of tips. |
Nsubsets |
Integer. Number of independent subsets to pick. |
with_replacement |
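Logical. If TRUE, each tip can be picked multiple times within a subset. If FALSE, each tip is picked at most once per subset, so size must not exceed the number of tips in the tree. |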
drop_dims |
Logical, specifying whether to return a vector (instead of a matrix) if Nsubsets==1. |
If with_replacement==TRUE, then each child of a node is equally probable to be traversed and each tip can be included multiple times in a subset. If with_replacement==FALSE, then only children with at least one descending tip not yet included in the subset remain available for traversal; each available child of a node has equal probability to be traversed. In either case, separate subsets can include the same tips.
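The traversal described above can be sketched in base R. This is only an illustration under stated assumptions, not castor's actual implementation: the helper sketch_pick_tips is hypothetical, and the tree is assumed to be given as an ape-style edge matrix (column 1 = parent, column 2 = child, tips numbered 1..Ntips).

```r
# Hypothetical helper (not part of castor): pick `size` tips by random
# root-to-tip traversal of a tree given as an ape-style edge matrix.
sketch_pick_tips <- function(edge, Ntips, size, with_replacement = TRUE) {
  if (!with_replacement && size > Ntips) stop("size must not exceed Ntips")
  root     <- setdiff(edge[, 1], edge[, 2])  # unique node with no incoming edge
  children <- split(edge[, 2], edge[, 1])    # children of each internal node
  parent   <- integer(max(edge))
  parent[edge[, 2]] <- edge[, 1]

  # remaining[n] = number of not-yet-picked tips descending from n
  # (a tip counts itself); only consulted when sampling without replacement
  remaining <- integer(max(edge))
  for (tip in seq_len(Ntips)) {
    node <- tip
    while (node != root) {
      remaining[node] <- remaining[node] + 1L
      node <- parent[node]
    }
    remaining[root] <- remaining[root] + 1L
  }

  picked <- integer(size)
  for (i in seq_len(size)) {
    node <- root
    while (node > Ntips) {  # descend until a tip is reached
      kids <- children[[as.character(node)]]
      # without replacement: only children that still lead to an unpicked tip
      if (!with_replacement) kids <- kids[remaining[kids] > 0L]
      node <- kids[sample.int(length(kids), 1L)]  # each available child equally likely
    }
    picked[i] <- node
    if (!with_replacement) {
      # mark the tip as used along the whole path back to the root
      repeat {
        remaining[node] <- remaining[node] - 1L
        if (node == root) break
        node <- parent[node]
      }
    }
  }
  picked
}

# Toy tree with 4 tips: root 5 -> (6, tip 4); 6 -> (7, tip 3); 7 -> (tip 1, tip 2)
edge <- matrix(c(5L,6L, 5L,4L, 6L,7L, 6L,3L, 7L,1L, 7L,2L), ncol = 2, byrow = TRUE)
set.seed(1)
with_repl    <- sketch_pick_tips(edge, Ntips = 4L, size = 10L, with_replacement = TRUE)
without_repl <- sketch_pick_tips(edge, Ntips = 4L, size = 4L,  with_replacement = FALSE)
```

With with_replacement=FALSE and size equal to the number of tips, every tip appears exactly once in the result, in random order.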
This random sampling algorithm differs from uniform sampling of tips, in which every tip is equally probable; instead, it ensures that sister clades have equal probabilities of being picked (if with_replacement==TRUE or if size<<Ntips, i.e. size is much smaller than the number of tips).
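To see how this differs from uniform sampling, the per-tip probabilities of a single pick with replacement can be computed exactly on a small unbalanced tree. The sketch below uses base R only; the toy edge matrix and the variable names are assumptions made for illustration. Each node passes its probability on in equal shares to its children.

```r
# Toy tree with 4 tips: root 5 -> (6, tip 4); 6 -> (7, tip 3); 7 -> (tip 1, tip 2)
edge  <- matrix(c(5L,6L, 5L,4L, 6L,7L, 6L,3L, 7L,1L, 7L,2L), ncol = 2, byrow = TRUE)
Ntips <- 4L
root  <- setdiff(edge[, 1], edge[, 2])

prob <- numeric(max(edge))  # probability that a traversal passes through each node
prob[root] <- 1
queue <- root
while (length(queue) > 0) {
  node  <- queue[1]
  queue <- queue[-1]
  kids  <- edge[edge[, 1] == node, 2]
  prob[kids] <- prob[node] / length(kids)  # equal share for each child
  queue <- c(queue, kids[kids > Ntips])    # only internal nodes are expanded further
}

tip_prob <- prob[seq_len(Ntips)]
print(tip_prob)  # 0.125 0.125 0.250 0.500, whereas uniform sampling gives 0.25 each
```

Tip 4 and its sister clade (tips 1 to 3) each receive probability 1/2, so tips in small clades near the root are picked more often than tips in large, deeply nested clades.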
The time required by this function per random subset decreases with the number of subsets requested.
A 2D integer matrix of size Nsubsets x size, with each row containing indices of randomly picked tips (i.e. in 1,..,Ntips) within a specific subset. If drop_dims==TRUE and Nsubsets==1, then a vector is returned instead of a matrix.
Stilianos Louca
# generate random tree
Ntips = 1000
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree

# pick random tip subsets
Nsubsets = 100
size = 50
subsets = pick_random_tips(tree, size, Nsubsets, with_replacement=FALSE)

# count the number of times each tip was picked in a subset ("popularity")
popularities = table(subsets)

# plot histogram of tip popularities
hist(popularities,breaks=20,xlab="popularity",ylab="# tips",main="tip popularities")