R: Gene cluster weighting

geneWeights {micropan}

R Documentation

Gene cluster weighting

Description

This function computes weights for gene clusters according to their distribution in a pan-genome.

Usage

geneWeights(pan.matrix, type = c("shell", "cloud"))

Arguments

`pan.matrix`	A pan-matrix, see `panMatrix` for details.
`type`	A text indicating the weighting strategy, either "shell" (the default) or "cloud".

Details

When computing distances between genomes, or when running a PCA, it is possible to give weights to the different gene clusters, emphasizing certain aspects.
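To illustrate what weighting gene clusters means for a distance, here is a minimal base-R sketch. The toy pan-matrix `pm` and the weights `w` are made up for illustration, and scaling each column by its weight before taking Manhattan distances is an assumption about how weights could be applied, not necessarily what `distManhattan` does internally.

```r
# Toy pan-matrix: 3 genomes (rows) x 4 gene clusters (columns).
pm <- matrix(c(1, 1, 0, 2,
               1, 0, 0, 1,
               1, 1, 1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("genome", 1:3), paste0("cluster", 1:4)))

# Hypothetical weights, one per gene cluster (column).
w <- c(0.1, 0.9, 0.5, 1.0)

# Scale each column by its weight, then compute Manhattan distances
# between the genomes (rows).
weighted <- sweep(pm, 2, w, `*`)
d <- dist(weighted, method = "manhattan")
print(d)
```

Clusters with weights near 0 contribute almost nothing to the distances, while clusters with weights near 1 contribute their full difference in copy number.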

As proposed by Snipen & Ussery (2010), two types of weighting are implemented. The default "shell" type gives gene families occurring frequently in the genomes, denoted shell-genes, a large weight (close to 1), while those occurring rarely get a small weight (close to 0). The "cloud" type is the opposite: genes observed in a minority of the genomes, referred to as cloud-genes, get the large weights. Presumably, the "shell" weighting gives distances/PCA reflecting a more long-term evolution, since emphasis is put on genes that have only just diverged away from the core. The "cloud" weighting emphasizes gene clusters seen rarely; genomes with similar patterns among these genes may share a common recent history. A "cloud" weighting typically gives a more erratic or 'noisy' picture than the "shell" weighting.
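The shell/cloud idea can be sketched in base R as follows. The helper `sketchWeights` is hypothetical and is not the micropan implementation. Its logistic curve, centring and scale are assumptions, chosen only to give weights close to 0 for rare clusters and close to 1 for frequent ones (or the reverse for "cloud").

```r
# Hypothetical helper for illustration only; NOT the micropan implementation.
sketchWeights <- function(pan.matrix, type = c("shell", "cloud")) {
  type <- match.arg(type)
  n.genomes <- nrow(pan.matrix)
  # Number of genomes in which each gene cluster is present.
  n.present <- colSums(pan.matrix > 0)
  # Assumed logistic curve centred at half the genomes:
  # near 0 for rarely seen clusters, near 1 for frequently seen ones.
  centre <- (n.genomes + 1) / 2
  spread <- n.genomes / 10
  shell <- 1 / (1 + exp(-(n.present - centre) / spread))
  if (type == "shell") shell else 1 - shell
}

# Toy pan-matrix: 10 genomes, 3 gene clusters present in 1, 5 and 10 genomes.
pm <- matrix(0, nrow = 10, ncol = 3)
pm[1, 1] <- 1
pm[1:5, 2] <- 1
pm[, 3] <- 2   # copy numbers above 1 still count as 'present'

w.shell <- sketchWeights(pm, type = "shell")
w.cloud <- sketchWeights(pm, type = "cloud")
print(rbind(shell = w.shell, cloud = w.cloud))
```

In this sketch the two weightings are complementary: for every gene cluster the "shell" and "cloud" weights sum to 1.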

Value

A vector of weights, one for each column in `pan.matrix`.

Author(s)

Lars Snipen and Kristian Hovde Liland.

References

Snipen, L., Ussery, D.W. (2010). Standard operating procedure for computing pangenome trees. Standards in Genomic Sciences, 2:135-141.

Examples

# See examples for distManhattan

[Package micropan version 2.1 Index]