R: Pick random subsets of tips on a tree.

pick_random_tips {castor}

R Documentation

Description

Given a rooted phylogenetic tree, this function picks random subsets of tips by traversing the tree from the root towards the tips, choosing a random child at each node until a tip is reached. Multiple independent random subsets can be generated if needed.

Usage

pick_random_tips( tree, 
                  size              = 1, 
                  Nsubsets          = 1, 
                  with_replacement  = TRUE, 
                  drop_dims         = TRUE)

Arguments

`tree`	A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge.
`size`	Integer. The size of each random subset of tips.
`Nsubsets`	Integer. Number of independent subsets to pick.
`with_replacement`	Logical. If `TRUE`, each tip can be picked multiple times within a subset (i.e., tips are "replaced" in the urn). If `FALSE`, tips are picked without replacement within each subset; in that case, `size` must not exceed the number of tips in the tree.
`drop_dims`	Logical, specifying whether to return a vector (instead of a matrix) if `Nsubsets==1`.

Details

If with_replacement==TRUE, then each child of a node is equally likely to be traversed, and each tip can be included multiple times in a subset. If with_replacement==FALSE, then only children with at least one descending tip not yet included in the subset remain available for traversal; each available child of a node is equally likely to be traversed. In either case, separate subsets may include the same tips.
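The traversal described above can be sketched in base R on a small toy tree. This is only an illustration under stated assumptions, not castor's actual implementation: the helper names `count_tips` and `sketch_pick_tips` are hypothetical, and the tree is hard-coded as a "phylo"-style edge matrix (tips numbered 1..Ntips, internal nodes numbered afterwards).

```r
# Toy rooted tree with topology ((t1,t2),(t3,(t4,t5))), stored as a
# "phylo"-style edge matrix with columns (parent, child).
Ntips <- 5
edge  <- rbind(c(6,7), c(6,8), c(7,1), c(7,2), c(8,3), c(8,9), c(9,4), c(9,5))
children <- split(edge[,2], edge[,1])    # children of each internal node, keyed by node index
root     <- setdiff(edge[,1], edge[,2])  # the unique node with no incoming edge

# hypothetical helper: number of tips descending from a clade (a tip counts itself)
count_tips <- function(clade){
  if(clade <= Ntips) return(1L)
  sum(sapply(children[[as.character(clade)]], count_tips))
}
tips_per_clade <- sapply(seq_len(max(edge)), count_tips)

# hypothetical helper: pick one subset of tips by repeated root-to-tip traversal
sketch_pick_tips <- function(size, with_replacement = TRUE){
  if(!with_replacement && size > Ntips) stop("size must not exceed the number of tips")
  remaining <- tips_per_clade  # still-available tips below each clade
  picked    <- numeric(size)
  for(i in seq_len(size)){
    clade <- root
    path  <- clade
    while(clade > Ntips){
      kids  <- children[[as.character(clade)]]
      kids  <- kids[remaining[kids] > 0]          # keeps all children when sampling with replacement
      clade <- kids[sample.int(length(kids), 1)]  # each available child is equally likely
      path  <- c(path, clade)
    }
    picked[i] <- clade
    # without replacement, mark the tip as used in every clade along the path
    if(!with_replacement) remaining[path] <- remaining[path] - 1L
  }
  picked
}

set.seed(1)
sketch_pick_tips(3, with_replacement = TRUE)   # tips may repeat
sketch_pick_tips(5, with_replacement = FALSE)  # a random ordering of all 5 tips
```

The sketch uses `sample.int(length(kids), 1)` rather than `sample(kids, 1)`, because `sample()` on a single number n draws from 1..n instead of returning that number.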

This random sampling algorithm differs from uniform sampling of tips, in which every tip is equally likely to be picked; instead, it ensures that sister clades are equally likely to be picked (provided that with_replacement==TRUE or size<<Ntips, i.e. size is much smaller than the number of tips).
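To make the difference from uniform sampling concrete, the following base-R sketch computes the probability that a single root-to-tip traversal ends at each tip of a small unbalanced toy tree, assuming every child of a node is equally likely (the with_replacement==TRUE case). The tree and the variable names are illustrative assumptions and are not part of castor.

```r
# Toy rooted tree with topology ((t1,t2),(t3,(t4,t5))), stored as a
# "phylo"-style edge matrix with columns (parent, child); tips are 1..5.
Ntips <- 5
edge  <- rbind(c(6,7), c(6,8), c(7,1), c(7,2), c(8,3), c(8,9), c(9,4), c(9,5))
root  <- setdiff(edge[,1], edge[,2])  # the unique node with no incoming edge
Nchildren <- tabulate(edge[,1])       # number of children per node, indexed by node number

# Probability of reaching a clade = probability of reaching its parent,
# divided by the parent's number of children.
# This loop relies on edges being listed parent-before-child, as they are here.
prob <- numeric(max(edge))
prob[root] <- 1
for(e in seq_len(nrow(edge))){
  parent <- edge[e,1]
  child  <- edge[e,2]
  prob[child] <- prob[parent] / Nchildren[parent]
}

tip_probs <- prob[1:Ntips]
print(tip_probs)        # 0.250 0.250 0.250 0.125 0.125
print(rep(1/Ntips, 5))  # uniform sampling would give 0.2 for every tip
```

In this toy tree the two sister clades below the root, (t1,t2) and (t3,(t4,t5)), are each reached with probability 1/2, even though they contain different numbers of tips.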

The time required per random subset decreases as the number of requested subsets increases.

Value

A 2D integer matrix of size Nsubsets x size, with each row containing the indices of the randomly picked tips (i.e. in 1,...,Ntips) of one subset. If drop_dims==TRUE and Nsubsets==1, then a vector is returned instead of a matrix.

Author(s)

Stilianos Louca

Examples

# load the package
library(castor)

# generate random tree
Ntips = 1000
tree  = generate_random_tree(list(birth_rate_intercept=1),max_tips=Ntips)$tree

# pick random tip subsets
Nsubsets = 100
size     = 50
subsets = pick_random_tips(tree, size, Nsubsets, with_replacement=FALSE)

# count the number of times each tip was picked across all subsets ("popularity")
# note that tips never picked do not appear in the table
popularities = table(subsets)

# plot histogram of tip popularities
hist(popularities,breaks=20,xlab="popularity",ylab="# tips",main="tip popularities")

[Package castor version 1.8.2 Index]