encoding {insect}    R Documentation
Encode and decode profile HMMs in raw byte format.
Description
These functions are used to compress and decompress profile hidden Markov models for DNA to improve memory efficiency.
Usage
encodePHMM(x)
decodePHMM(z)
Arguments
x: an object of class "PHMM".
z: a raw vector in the encodePHMM format, such as the value returned by encodePHMM.
Details
Profile HMMs used in tree-based classification usually contain many parameters, so large trees with many PHMMs can occupy a lot of memory. To address this, a basic encoding system was devised that stores the emission and transition probabilities in raw-byte format to three (nearly four) decimal places. This does not appear to significantly affect the accuracy of likelihood scoring and has only a moderate impact on classification speed, but it can reduce the memory required for large trees by up to 95 percent.
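The general idea of trading a little precision for memory can be illustrated with a minimal base R sketch. The helper names quantize_probs and dequantize_probs, and the choice of 2-byte integers scaled by 10000, are hypothetical assumptions made for illustration; this is not the encoding schema that encodePHMM actually uses.

```r
## Hypothetical illustration only: this is NOT the schema used by encodePHMM.
## Probabilities in [0, 1] are rounded to the nearest 1/scale and stored
## as 2-byte integers inside a raw vector.
quantize_probs <- function(p, scale = 10000L) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  ## round each probability to the nearest multiple of 1/scale
  ints <- as.integer(round(p * scale))
  ## write each integer as 2 bytes, so each value costs 2 bytes
  writeBin(ints, raw(), size = 2L, endian = "little")
}

dequantize_probs <- function(z, scale = 10000L) {
  ## read the 2-byte unsigned integers back and rescale to probabilities
  ints <- readBin(z, what = "integer", n = length(z) %/% 2L,
                  size = 2L, signed = FALSE, endian = "little")
  ints / scale
}

set.seed(1)
p <- runif(1000)    # stand-in for a vector of PHMM probabilities
z <- quantize_probs(p)
p2 <- dequantize_probs(z)

length(z)           # 2000: 2 bytes per value instead of 8 for a double
max(abs(p - p2))    # rounding error, at most 0.5 / scale
object.size(p)      # memory used by the original double vector
object.size(z)      # memory used by the encoded raw vector
```

In this sketch the 2-byte storage alone accounts for roughly a fourfold saving per stored value; the round trip is lossy, so decoded values only approximate the originals.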
Value
encodePHMM returns a raw vector. decodePHMM returns an object of class "PHMM" (see Durbin et al. (1998) and the aphid package for more details on profile hidden Markov models).
Author(s)
Shaun Wilkinson
References
Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.
Examples
library(insect)
## generate a simple classification tree with two child nodes
data(whales)
data(whale_taxonomy)
tree <- learn(whales, db = whale_taxonomy, recursive = FALSE)
## extract the omnibus profile HMM from the root node
PHMM0 <- decodePHMM(attr(tree, "model"))
## extract the profile HMM from the first child node
PHMM1 <- decodePHMM(attr(tree[[1]], "model"))