forward {aphid} | R Documentation |
The forward algorithm.
Description
This function calculates the full (log) probability or odds of a sequence given a hidden Markov model or profile HMM using the forward dynamic programming algorithm.
Usage
forward(x, y, ...)
## S3 method for class 'PHMM'
forward(
x,
y,
qe = NULL,
logspace = "autodetect",
odds = TRUE,
windowspace = "all",
DI = FALSE,
ID = FALSE,
cpp = TRUE,
...
)
## S3 method for class 'HMM'
forward(x, y, logspace = "autodetect", cpp = TRUE, ...)
Arguments
x |
an object of class |
y |
a vector of mode "character" or "raw" (a "DNAbin" or "AAbin"
object) representing a single sequence hypothetically emitted by
the model in |
... |
additional arguments to be passed between methods. |
qe |
an optional named vector of background residue frequencies (only
applicable if x is a PHMM). If |
logspace |
logical indicating whether the emission and transition
probabilities of x are logged. If |
odds |
logical, indicates whether the returned scores should be odds ratios (TRUE) or full logged probabilities (FALSE). |
windowspace |
a two-element integer vector providing the search space for dynamic programming (see Wilbur & Lipman 1983 for details). The first element should be negative, and represent the lowermost diagonal of the dynammic programming array, and the second element should be positive, representing the leftmost diagonal. Alternatively, if the the character string "all" is passed (the default setting) the entire dynamic programming array will be computed. |
DI |
logical indicating whether delete-insert transitions should be allowed in the profile hidden Markov model (if applicable). Defaults to FALSE. |
ID |
logical indicating whether insert-delete transitions should be allowed in the profile hidden Markov model (if applicable). Defaults to FALSE. |
cpp |
logical, indicates whether the dynamic programming matrix should be filled using compiled C++ functions (default; many times faster). The FALSE option is primarily retained for bug fixing and experimentation. |
Details
This function is a wrapper for a compiled C++ function that recursively fills a dynamic programming matrix with logged probabilities, and calculates the full (logged) probability of a sequence given a HMM or PHMM.
For a thorough explanation of the backward, forward and Viterbi algorithms, see Durbin et al (1998) chapters 3.2 (HMMs) and 5.4 (PHMMs).
Value
an object of class "DPA"
, which is a list
containing the score and dynamic programming array.
Author(s)
Shaun Wilkinson
References
Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.
Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA, 80, 726-730.
See Also
Examples
## Forward algorithm for standard HMMs:
## The dishonest casino example from Durbin et al (1998) chapter 3.2
states <- c("Begin", "Fair", "Loaded")
residues <- paste(1:6)
### Define the transition probability matrix
A <- matrix(c(0, 0, 0, 0.99, 0.95, 0.1, 0.01, 0.05, 0.9), nrow = 3)
dimnames(A) <- list(from = states, to = states)
### Define the emission probability matrix
E <- matrix(c(rep(1/6, 6), rep(1/10, 5), 1/2), nrow = 2, byrow = TRUE)
dimnames(E) <- list(states = states[-1], residues = residues)
### Build and plot the HMM object
x <- structure(list(A = A, E = E), class = "HMM")
plot(x, main = "Dishonest casino HMM")
### Find full probability of the sequence given the model
data(casino)
forward(x, casino)
###
## Forward algorithm for profile HMMs:
## Small globin alignment data from Durbin et al (1998) Figure 5.3
data(globins)
### Derive a profile HMM from the alignment
globins.PHMM <- derivePHMM(globins, residues = "AMINO", seqweights = NULL)
plot(globins.PHMM, main = "Profile hidden Markov model for globins")
### Simulate a random sequence from the model
suppressWarnings(RNGversion("3.5.0"))
set.seed(999)
simulation <- generate(globins.PHMM, size = 20)
simulation ## "F" "S" "A" "N" "N" "D" "W" "E"
### Calculate the full (log) probability of the sequence given the model
x <- forward(globins.PHMM, simulation, odds = FALSE)
x # -23.0586
### Show the dynammic programming array
x$array