R: Generate random sequences from a model.

generate {aphid}

R Documentation

Generate random sequences from a model.

Description

The generate function outputs a random sequence from a HMM or PHMM.

Usage

generate(x, size, ...)

## S3 method for class 'HMM'
generate(x, size, logspace = "autodetect", random = TRUE, ...)

## S3 method for class 'PHMM'
generate(
  x,
  size,
  logspace = "autodetect",
  gap = "-",
  random = TRUE,
  DNA = FALSE,
  AA = FALSE,
  ...
)

Arguments

`x`	an object of class `'HMM'` or `'PHMM'`.
`size`	a non-negative integer representing the length of the output sequence if x is a `"HMM"` object with zero probability of transitioning to the begin/end state, or the maximum length of the output sequence otherwise (this acts as a safeguard against overflow).
`...`	additional arguments to be passed between methods.
`logspace`	logical indicating whether the emission and transition probabilities of x are logged. If `logspace = "autodetect"` (the default setting), the function will automatically detect if the probabilities are logged, returning an error if inconsistencies are found. Note that choosing the latter option increases the computational overhead; therefore specifying `TRUE` or `FALSE` can reduce the running time.
`random`	logical indicating whether residues should be emitted randomly with probabilities defined by the emission probabilities in the model (TRUE; default), or deterministically, whereby each residue is emitted and each transition taken based on the maximum emission/transition probability in the current state.
`gap`	the character used to represent gaps (delete states) in the output sequence (only applicable for `PHMM` objects).
`DNA`	logical indicating whether the returned sequence should be a `"DNAbin"` object. Only applicable if the matrix of emission probabilities in the model has four residues corresponding to the nucleotide alphabet (A, T, G, and C).
`AA`	logical indicating whether the returned sequence should be a `"AAbin"` object. Only applicable if the matrix of emission probabilities in the model has 20 residues corresponding to the amino acid alphabet.

Details

This simple function generates a single sequence from a HMM or profile HMM by recursively simulating a path through the model. The function is fairly slow in its current state, but a faster C++ function may be made available in a future version depending on demand.

Value

a named vector giving the sequence of residues emitted by the model, with the "names" attribute representing the hidden states.

Author(s)

Shaun Wilkinson

References

Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.

Examples

  ## Generate a random sequence from a standard HMM
  ## The dishonest casino example from Durbin et al (1998) chapter 3.2
  states <- c("Begin", "Fair", "Loaded")
  residues <- paste(1:6)
  ### Define the transition probability matrix
  A <- matrix(c(0, 0, 0, 0.99, 0.95, 0.1, 0.01, 0.05, 0.9), nrow = 3)
  dimnames(A) <- list(from = states, to = states)
  ### Define the emission probability matrix
  E <- matrix(c(rep(1/6, 6), rep(1/10, 5), 1/2), nrow = 2, byrow = TRUE)
  dimnames(E) <- list(states = states[-1], residues = residues)
  ### Build and plot the HMM object
  x <- structure(list(A = A, E = E), class = "HMM")
  plot(x, main = "Dishonest casino HMM")
  ### Generate a random sequence from the model
  generate(x, size = 300)
  ##
  ## Generate a random sequence from a profile HMM:
  ## Small globin alignment data from Durbin et al (1998) Figure 5.3
  data(globins)
  ### Derive a profile hidden Markov model from the alignment
  globins.PHMM <- derivePHMM(globins, residues = "AMINO", seqweights = NULL)
  plot(globins.PHMM, main = "Profile hidden Markov model for globins")
  ### Simulate a random sequence from the model
  suppressWarnings(RNGversion("3.5.0"))
  set.seed(999)
  simulation <- generate(globins.PHMM, size = 20)
  simulation ## "F" "S" "A" "N" "N" "D" "W" "E"
  ### Names attribute indicates that all residues came from "match" states

[Package aphid version 1.3.5 Index]