weight {aphid}    R Documentation
Sequence weighting.
Description
Weighting schemes for DNA and amino acid sequences.
Usage
weight(x, ...)
## S3 method for class 'DNAbin'
weight(x, method = "Henikoff", k = 5, ...)
## S3 method for class 'AAbin'
weight(x, method = "Henikoff", k = 5, ...)
## S3 method for class 'list'
weight(x, method = "Henikoff", k = 5, residues = NULL, gap = "-", ...)
## S3 method for class 'dendrogram'
weight(x, method = "Gerstein", ...)
## Default S3 method:
weight(x, method = "Henikoff", k = 5, residues = NULL, gap = "-", ...)
Arguments
x
a list or matrix of sequences (usually a "DNAbin" or "AAbin" object).
Alternatively, x can be an object of class "dendrogram".
...
additional arguments to be passed between methods.
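method
a character string indicating the weighting method to be used.
Currently the only methods available are a modified version of the
maximum entropy weighting scheme proposed by
Henikoff and Henikoff (1994) ("Henikoff", the default for sequence input)
and the agglomerative tree-based method of Gerstein et al. (1994) ("Gerstein").
See Details.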
k
an integer giving the k-mer size to be used. Defaults to 5. Note that larger values of k can be slow to compute and use excessive memory because of the large number of calculations required.
residues
either NULL (default; emitted residues are automatically
detected from the sequences), a case-sensitive character vector
specifying the residue alphabet, or one of the character strings
"RNA", "DNA", "AA" or "AMINO". Note that the default option can be slow for
large lists of character vectors. Furthermore, the default setting
gap
the character used to represent gaps in the alignment matrix
(if applicable). Ignored for "DNAbin" and "AAbin" objects.
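The interplay between the residue alphabet and k can be illustrated with a rough count. This is a sketch under stated assumptions, not aphid code: detect_residues() and n_possible_kmers() are hypothetical helpers, automatic residue detection is assumed to mean collecting the distinct non-gap symbols, and the count assumes one k-mer matrix column per possible k-mer over an alphabet of size a, giving a^k columns.

```r
## Hypothetical helpers for illustration only; not part of aphid.

## Collect the distinct residues in a list of character vectors,
## dropping the gap symbol (case-sensitive, like the residues argument).
detect_residues <- function(seqs, gap = "-") {
  res <- sort(unique(unlist(seqs, use.names = FALSE)))
  res[res != gap]
}

## Number of possible k-mers over an alphabet of the given size
## (assumes one column per possible k-mer).
n_possible_kmers <- function(alphabet_size, k) alphabet_size^k

seqs <- list(s1 = c("A", "C", "G", "-", "T"),
             s2 = c("A", "C", "G", "T", "T"))
res <- detect_residues(seqs)
res                                   # "A" "C" "G" "T"
n_possible_kmers(length(res), k = 5)  # 4^5 = 1024
n_possible_kmers(20, k = 5)           # 20 amino acids: 3.2 million
n_possible_kmers(20, k = 6)           # 64 million
```

Increasing k by one multiplies the count by the alphabet size, so the cost climbs much faster for amino acid sequences than for DNA.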
Details
This is a generic function.
If method = "Henikoff", the sequences are weighted using a modified
version of the maximum entropy method proposed by Henikoff and Henikoff (1994).
In this case the maximum entropy weights are calculated from a k-mer
presence/absence matrix rather than from a multiple sequence alignment,
as in the original description by Henikoff and Henikoff (1994).
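The k-mer variant of the Henikoff scheme can be sketched in base R. This is a minimal illustration under stated assumptions, not the code aphid uses internally: kmer_presence() and henikoff_weights() are hypothetical helpers, each k-mer column is treated as an alignment position with two possible states (present or absent), and the position-based rule of Henikoff and Henikoff (1994) is applied column by column, so that a sequence receives 1/(r * n) from a column with r distinct states in which n sequences share its state.

```r
## Hypothetical helpers for illustration only; not aphid functions.

## Binary k-mer presence/absence matrix: one row per sequence,
## one column per distinct k-mer observed in any sequence.
kmer_presence <- function(seqs, k) {
  kmers <- lapply(seqs, function(s) {
    n_windows <- max(0, nchar(s) - k + 1)
    unique(vapply(seq_len(n_windows),
                  function(i) substr(s, i, i + k - 1), character(1)))
  })
  all_kmers <- sort(unique(unlist(kmers)))
  pa <- matrix(0L, nrow = length(seqs), ncol = length(all_kmers),
               dimnames = list(names(seqs), all_kmers))
  for (i in seq_along(kmers)) pa[i, all_kmers %in% kmers[[i]]] <- 1L
  pa
}

## Henikoff-style position-based weights over the k-mer columns.
henikoff_weights <- function(pa) {
  stopifnot(is.matrix(pa), nrow(pa) > 0, ncol(pa) > 0)
  n <- nrow(pa)
  w <- numeric(n)
  for (j in seq_len(ncol(pa))) {
    states <- as.character(pa[, j])
    counts <- table(states)   # number of sequences in each state
    r <- length(counts)       # number of distinct states in this column
    w <- w + 1 / (r * as.numeric(counts[states]))
  }
  w <- w * n / sum(w)         # rescale so the weights sum to n (mean 1)
  names(w) <- rownames(pa)
  w
}

seqs <- c(s1 = "ACGTACGT", s2 = "ACGTACGA", s3 = "TTTTGGGG")
pa <- kmer_presence(seqs, k = 3)
w <- henikoff_weights(pa)
round(w, 3)
# s1 = 0.750, s2 = 0.833, s3 = 1.417
```

The two closely related sequences s1 and s2 share most of their k-mers and are down-weighted, while the dissimilar s3 receives the largest weight; as in the Value section, the weights sum to the number of sequences.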
If method = "Gerstein", the agglomerative method of Gerstein et al. (1994)
is used to weight sequences according to their relatedness, as derived
from a phylogenetic tree. When x is not already a dendrogram, one is
first derived using the cluster function in the kmer package.
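The tree-based scheme can be sketched for an existing "dendrogram" object, following the description in Durbin et al. (1998, section 5.8): each leaf starts with the length of its own branch, and moving towards the root, the length of each internal branch is shared among the leaves below it in proportion to their current weights. This is an illustration only; gerstein_weights() is a hypothetical helper rather than aphid's implementation, and it assumes that each branch length equals the parent's height minus the child's height.

```r
## Hypothetical helper for illustration only; not aphid's implementation.
## Gerstein-style tree weights for the leaves of a stats "dendrogram".
gerstein_weights <- function(dend) {
  stopifnot(inherits(dend, "dendrogram"))
  visit <- function(node, parent_height) {
    h <- attr(node, "height")
    t <- parent_height - h                 # length of the branch above this node
    if (isTRUE(attr(node, "leaf"))) {
      return(stats::setNames(t, attr(node, "label")))
    }
    # weights of the leaves below, built from branches inside this subtree
    w <- unlist(lapply(seq_along(node), function(i) visit(node[[i]], h)))
    # share the branch above this node among those leaves in proportion to w
    if (sum(w) > 0) w + t * w / sum(w) else w + t / length(w)
  }
  w <- visit(dend, attr(dend, "height"))   # the root has no branch above it
  n <- length(w)
  if (sum(w) > 0) w * n / sum(w) else stats::setNames(rep(1, n), names(w))
}

## a and b are close (distance 2); c is distant from both (distance 6)
d <- matrix(c(0, 2, 6,
              2, 0, 6,
              6, 6, 0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
dend <- as.dendrogram(hclust(as.dist(d), method = "average"))
w <- gerstein_weights(dend)
round(w[c("a", "b", "c")], 3)
# a = 0.857, b = 0.857, c = 1.286
```

Here a and b each start at 2 and split the internal branch of length 4 equally, giving 4 each, while c keeps its branch of 6; rescaling to a mean of 1 gives 6/7, 6/7 and 9/7.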
Methods are available for "dendrogram" objects, "DNAbin" and "AAbin"
sequence objects (as lists or matrices), and sequences in standard
character format provided as either lists or matrices.
For further details on sequence weighting schemes see Durbin et al. (1998), section 5.8.
Value
a named vector of weights, the sum of which is equal to the total number of sequences (average weight = 1).
Author(s)
Shaun Wilkinson
References
Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.
Gerstein M, Sonnhammer ELL, Chothia C (1994) Volume changes in protein evolution. Journal of Molecular Biology, 236, 1067-1078.
Henikoff S, Henikoff JG (1994) Position-based sequence weights. Journal of Molecular Biology, 243, 574-578.
Examples
## weight the sequences in the woodmouse dataset from the ape package
library(ape)
library(aphid)
data(woodmouse)
woodmouse.weights <- weight(woodmouse)
woodmouse.weights