R: Computing genomic fluidity for a pan-genome

fluidity {micropan}

R Documentation

Computing genomic fluidity for a pan-genome

Description

Computes the genomic fluidity, which is a measure of population diversity.

Usage

fluidity(pan.matrix, n.sim = 10)

Arguments

`pan.matrix`	A pan-matrix, see `panMatrix` for details.
`n.sim`	An integer specifying the number of random genome pairs to sample in the computations.

Details

The genomic fluidity between two genomes is defined as the number of unique gene families (those found in only one of the two genomes) divided by the total number of gene families, summed over the two genomes (Kislyuk et al., 2011). This is averaged over ‘n.sim’ random pairs of genomes to obtain a population estimate.
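The definition above can be sketched in base R. This is a minimal illustration and not the package's own code: `pair_fluidity` and the `toy` matrix are hypothetical names introduced here, and the sketch assumes that "unique" means present in exactly one of the two genomes and that the denominator is the sum of the gene family counts of the two genomes.

```r
# Hypothetical helper, not part of micropan: fluidity of one genome pair.
# g1 and g2 are rows of a pan-matrix (counts per gene cluster).
pair_fluidity <- function(g1, g2) {
  p1 <- g1 > 0                           # presence/absence in genome 1
  p2 <- g2 > 0                           # presence/absence in genome 2
  unique_clusters <- sum(xor(p1, p2))    # clusters found in exactly one genome
  total_clusters  <- sum(p1) + sum(p2)   # clusters in genome 1 plus clusters in genome 2
  unique_clusters / total_clusters
}

# Toy pan-matrix (made-up data): 3 genomes (rows) by 5 gene clusters (columns).
toy <- matrix(c(1, 1, 1, 0, 0,
                1, 2, 0, 1, 0,
                0, 0, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("genome", 1:3), paste0("cluster", 1:5)))

f12 <- pair_fluidity(toy[1, ], toy[2, ])  # 2 unique / (3 + 3) = 1/3
f13 <- pair_fluidity(toy[1, ], toy[3, ])  # no shared clusters: 4 / (3 + 1) = 1

# Population estimate: average over n.sim randomly drawn genome pairs.
set.seed(1)
n.sim <- 10
vals <- replicate(n.sim, {
  idx <- sample(nrow(toy), 2)             # two distinct genomes
  pair_fluidity(toy[idx[1], ], toy[idx[2], ])
})
estimate <- c(mean(vals), sd(vals))       # mean and sample standard deviation
```

With only three genomes the random pairs repeat often, so the spread in `vals` mostly reflects how different the three possible pairs are from each other.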

The genomic fluidity between two genomes describes their degree of overlap with respect to gene cluster content. If the fluidity is 0.0, the two genomes contain identical gene clusters. If it is 1.0, the two genomes are non-overlapping. Genomic fluidity differs only slightly from a Jaccard distance (see `distJaccard`): both measure overlap between genomes, but fluidity is computed for the population by averaging over many pairs, while Jaccard distances are computed for every pair. Note that only the presence or absence of gene clusters is considered, not multiple occurrences.
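To make the comparison concrete, the sketch below computes both quantities for a single pair of made-up genomes. It assumes the standard Jaccard distance (one minus shared clusters over clusters present in either genome) and the pairwise fluidity as unique clusters over the summed cluster counts; the vectors `g1` and `g2` are illustrative only.

```r
# Two made-up genomes as rows of a pan-matrix (counts per gene cluster).
g1 <- c(1, 1, 1, 0, 0)
g2 <- c(1, 2, 0, 1, 0)   # the count 2 is treated as "present", nothing more

p1 <- g1 > 0
p2 <- g2 > 0

shared <- sum(p1 & p2)        # clusters present in both genomes
uniq   <- sum(xor(p1, p2))    # clusters present in exactly one genome

# Standard Jaccard distance: 1 - |intersection| / |union|
jaccard_dist <- 1 - shared / sum(p1 | p2)

# Pairwise fluidity: unique clusters over the summed cluster counts
fluidity_pair <- uniq / (sum(p1) + sum(p2))

# Both measures are 0 for identical content and 1 for disjoint content.
same_j <- 1 - sum(p1 & p1) / sum(p1 | p1)
same_f <- sum(xor(p1, p1)) / (sum(p1) + sum(p1))
```

Both measures share the same numerator of unique clusters. They differ in the denominator, where fluidity counts the shared clusters twice, so for partially overlapping genomes the fluidity is the smaller of the two values.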

The input ‘pan.matrix’ is typically constructed by `panMatrix`.

Value

A vector with two elements: the mean fluidity and its sample standard deviation over the ‘n.sim’ computed values.

Author(s)

Lars Snipen and Kristian Hovde Liland.

References

Kislyuk, A.O., Haegeman, B., Bergman, N.H., Weitz, J.S. (2011). Genomic fluidity: an integrative view of gene diversity within microbial populations. BMC Genomics, 12:32.

Examples

# Loading a pan-matrix in this package
data(xmpl.panmat)

# Fluidity based on this pan-matrix
fluid <- fluidity(xmpl.panmat)

[Package micropan version 2.1 Index]