pick_random_tips {castor}R Documentation

Pick random subsets of tips on a tree.

Description

Given a rooted phylogenetic tree, this function picks random subsets of tips by traversing the tree from root to tips, choosing a random child at each node until reaching a tip. Multiple random independent subsets can be generated if needed.

Usage

pick_random_tips( tree, 
                  size              = 1, 
                  Nsubsets          = 1, 
                  with_replacement  = TRUE, 
                  drop_dims         = TRUE)

Arguments

tree

A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge.

size

Integer. The size of each random subset of tips.

Nsubsets

Integer. Number of independent subsets to pick.

with_replacement

Logical. If TRUE, each tip can be picked multiple times within a subset (i.e. are "replaced" in the urn). If FALSE, tips are picked without replacement in each subset. In that case, size must not be greater than the number of tips in the tree.

drop_dims

Logical, specifying whether to return a vector (instead of a matrix) if Nsubsets==1.

Details

If with_replacement==TRUE, then each child of a node is equally probable to be traversed and each tip can be included multiple times in a subset. If with_replacement==FALSE, then only children with at least one descending tip not included in the subset remain available for traversal; each available child of a node has equal probability to be traversed. In any case, it is always possible for separate subsets to include the same tips.

This random sampling algorithm differs from a uniform sampling of tips at equal probabilities; instead, this algorithm ensures that sister clades have equal probabilities to be picked (if with_replacement==TRUE or if size<<Ntips).

The time required by this function per random subset decreases with the number of subsets requested.

Value

A 2D integer matrix of size Nsubsets x size, with each row containing indices of randomly picked tips (i.e. in 1,..,Ntips) within a specific subset. If drop_dims==TRUE and Nsubsets==1, then a vector is returned instead of a matrix.

Author(s)

Stilianos Louca

Examples

# generate random tree
Ntips = 1000
tree  = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree

# pick random tip subsets
Nsubsets = 100
size     = 50
subsets = pick_random_tips(tree, size, Nsubsets, with_replacement=FALSE)

# count the number of times each tip was picked in a subset ("popularity")
popularities = table(subsets)

# plot histogram of tip popularities
hist(popularities,breaks=20,xlab="popularity",ylab="# tips",main="tip popularities")

[Package castor version 1.8.0 Index]