heaps {micropan} | R Documentation
Heaps law estimate
Description
Estimating if a pan-genome is open or closed based on a Heaps law model.
Usage
heaps(pan.matrix, n.perm = 100)
Arguments
pan.matrix |
A pan-matrix, see |
n.perm |
The number of random permutations of genome ordering. |
Details
An open pan-genome means that new gene clusters will keep being observed for as long as new genomes are sequenced. This may sound controversial, but from a pragmatic point of view an open pan-genome indicates that the number of new gene clusters to be observed in future genomes is ‘large’ (but not literally infinite). Conversely, a closed pan-genome indicates that few new gene clusters remain to be discovered.
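This function is based on a Heaps law approach suggested by Tettelin et al. (2008). The Heaps law model is fitted to the number of new gene clusters observed when genomes are ordered in a random way. The model has two parameters, an intercept and a decay parameter called ‘alpha’, and describes the number of new gene clusters found in the N-th genome as a power-law decay in N. If ‘alpha>1.0’ the pan-genome is closed, if ‘alpha<1.0’ it is open. The threshold at 1.0 follows from the power law: summed over ever more genomes, the expected number of new clusters stays bounded only when the decay is faster than 1/N.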
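The fitting idea described above can be sketched in base R. This is a minimal illustration, not the micropan source: it assumes the power-law form n(N) = Intercept * N^(-alpha), a plain least-squares criterion, and a simulated presence/absence matrix. The names toy.panmat, newClusters and toy.est are hypothetical and exist only for this example.

```r
# Sketch only: a hypothetical illustration of the idea, NOT the micropan implementation.
set.seed(1)

# Simulated presence/absence matrix: 10 genomes (rows) x 200 gene clusters (columns).
n.genomes <- 10
n.clusters <- 200
toy.panmat <- matrix(rbinom(n.genomes * n.clusters, size = 1, prob = 0.3),
                     nrow = n.genomes)

# For one random genome ordering, count the clusters first seen in genome 2, 3, ..., n.
newClusters <- function(pm) {
  pm <- pm[sample(nrow(pm)), , drop = FALSE]    # random genome ordering
  seen <- apply(pm, 2, cumsum) > 0               # TRUE once a cluster has been observed
  first.seen <- seen[-1, , drop = FALSE] & !seen[-nrow(seen), , drop = FALSE]
  rowSums(first.seen)
}

# Repeat over several permutations and stack the counts into one response vector.
n.perm <- 50
y <- as.numeric(replicate(n.perm, newClusters(toy.panmat)))
x <- rep(2:n.genomes, times = n.perm)            # genome number matching each count

# Assumed model: n(N) = Intercept * N^(-alpha), fitted by bounded least squares.
sse <- function(p) sum((y - p[1] * x^(-p[2]))^2)
fit <- optim(c(mean(y[x == 2]), 1), sse, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1e4, 2))
toy.est <- setNames(fit$par, c("Intercept", "alpha"))
print(toy.est)
```

Because the toy matrix is simulated with independent, equally likely clusters, the estimate says nothing about any real pan-genome; the sketch only shows how permuted orderings feed a two-parameter fit.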
The number of permutations, ‘n.perm’, should be as large as possible, limited by computation time. The default value of 100 should be regarded as a minimum.
Word of caution: The Heaps law assumes independent sampling. If some of the genomes in the data set form distinct sub-groups in the population, this may affect the results of this analysis severely.
Value
A vector of two estimated parameters: The ‘Intercept’ and the decay parameter ‘alpha’. If ‘alpha<1.0’ the pan-genome is open, if ‘alpha>1.0’ it is closed.
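As a rough illustration of how the returned vector might be read, the sketch below applies the open/closed rule to a named vector of the same shape. The helper names classifyHeaps and expectedNew are hypothetical and not part of micropan, the parameter values are made up for illustration, and the power-law form Intercept * N^(-alpha) is an assumption about the model. The documentation does not say how to treat ‘alpha’ exactly equal to 1.0, so that case is reported separately here.

```r
# Hypothetical helper, not part of micropan: label a fitted result as open or closed.
classifyHeaps <- function(est) {
  alpha <- unname(est["alpha"])
  if (alpha < 1) {
    "open"
  } else if (alpha > 1) {
    "closed"
  } else {
    "undetermined"   # alpha exactly 1 is not covered by the documented rule
  }
}

# Expected number of new gene clusters in the N-th genome,
# assuming the power-law form Intercept * N^(-alpha).
expectedNew <- function(est, N) {
  unname(est["Intercept"]) * N^(-unname(est["alpha"]))
}

# Illustrative values only, NOT the output of a real analysis.
example.est <- c(Intercept = 150, alpha = 0.8)
print(classifyHeaps(example.est))
print(expectedNew(example.est, N = c(10, 100, 1000)))
```

With an actual result, the vector returned by heaps() could be passed in place of example.est, since it carries the same two names.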
Author(s)
Lars Snipen and Kristian Hovde Liland.
References
Tettelin, H., Riley, D., Cattuto, C., Medini, D. (2008). Comparative genomics: the bacterial pan-genome. Current Opinion in Microbiology, 11:472-477.
See Also
binomixEstimate, chao, rarefaction.
Examples
# Loading a pan-matrix in this package
data(xmpl.panmat)
# Estimating population openness
h.est <- heaps(xmpl.panmat, n.perm = 500)
print(h.est)
# If alpha < 1 it indicates an open pan-genome