coi5p_pipe {coil}    R Documentation

Run the entire coi5p pipeline for an input sequence.

Description

This function takes a raw DNA sequence string and runs each of the coi5p methods in turn (coi5p, frame, translate, indel_check). If you are not interested in all components of the output (e.g. you only want the sequence set in the common reading frame, or only its translation), the coi5p analysis functions can be called individually to avoid unnecessary computation.
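The step-by-step design can be pictured with the toy sketch below. The functions toy_coi5p(), toy_frame(), toy_translate(), toy_indel_check() and toy_pipe() are hypothetical placeholders, not coil code. They only show how each step takes the object built so far and returns it with one more component attached, so stopping after an early step skips the later ones.

```r
# Toy sketch (hypothetical): each step adds one component to the object.
# The bodies are placeholders, NOT the coil implementations.
toy_coi5p <- function(x, name = character()) {
  # Store the lower-cased raw input and the optional name
  structure(list(raw = tolower(x), name = name), class = "toy_coi5p")
}

toy_frame <- function(obj) {
  obj$framed <- obj$raw          # placeholder: real framing aligns to a PHMM
  obj
}

toy_translate <- function(obj) {
  obj$aaSeq <- "placeholder"     # placeholder: real step translates codons
  obj
}

toy_indel_check <- function(obj) {
  obj$indel_likely <- FALSE      # placeholder: real step scores the translation
  obj
}

# The full pipeline is just the steps applied in order
toy_pipe <- function(x, name = character()) {
  obj <- toy_coi5p(x, name = name)
  for (step in list(toy_frame, toy_translate, toy_indel_check)) {
    obj <- step(obj)
  }
  obj
}

res <- toy_pipe("ACTG", name = "seq1")
names(res)  # "raw" "name" "framed" "aaSeq" "indel_likely"

# Stopping early: only the framing step is run, nothing is translated
framed_only <- toy_frame(toy_coi5p("ACTG"))
```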

Usage

coi5p_pipe(
  x,
  ...,
  name = character(),
  trans_table = 0,
  frame_offset = 0,
  triple_translate = FALSE,
  nt_PHMM = coil::nt_coi_PHMM,
  aa_PHMM = coil::aa_coi_PHMM,
  indel_threshold = -358.88
)

Arguments

x

A nucleotide string. Valid characters within the nucleotide string are: 'a', 't', 'g', 'c', '-', and 'n'. The nucleotide string can be input in upper case, but will be automatically converted to lower case.
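A minimal sketch of the input rule described above, using a hypothetical helper check_nt_string() that is not part of coil: the string is lower-cased and then checked against the allowed characters.

```r
# Hypothetical helper (not part of coil): lower-case a nucleotide string and
# check that it only contains 'a', 't', 'g', 'c', 'n' and '-'.
check_nt_string <- function(x) {
  x <- tolower(x)
  # '-' placed last inside the brackets is matched literally
  if (!grepl("^[atgcn-]*$", x)) {
    stop("invalid character(s) in nucleotide string")
  }
  x
}

check_nt_string("ACTG-NNa")  # "actg-nna"
```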

...

Additional arguments to be passed between methods.

name

An optional character string. Identifier for the sequence.

trans_table

The translation table to use for translating from nucleotides to amino acids. Default is 0, which indicates that censored translation should be performed. If the taxonomy of the sample is known, use the function which_trans_table() to determine the translation table to use.
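The choice of trans_table matters because the same codon can encode different amino acids under different genetic codes. The hand-written partial lookups below (illustration only, not coil code) compare the standard code (table 1) with the invertebrate mitochondrial code (table 5) for a few such codons.

```r
# Partial, hand-written codon lookups for illustration only.
# Table 1 = standard code; table 5 = invertebrate mitochondrial code.
table1 <- c(atg = "M", aga = "R", agg = "R", ata = "I", tga = "*")
table5 <- c(atg = "M", aga = "S", agg = "S", ata = "M", tga = "W")

codons <- c("atg", "aga", "tga", "ata")
# Side-by-side translation of the same codons under both tables
rbind(table1 = table1[codons], table5 = table5[codons])
```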

frame_offset

The offset to the reading frame to be applied for translation. By default the offset is zero, so the first character in the framed sequence is considered the first nucleotide of the first codon. Passing frame_offset = 1 would make the second character in the framed sequence the first nucleotide of the first codon.
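The effect of frame_offset on codon boundaries can be sketched with a hypothetical helper, split_codons() (not coil's implementation), which skips frame_offset leading characters and then cuts the rest into complete codons.

```r
# Illustrative only: split a framed sequence into codons after skipping
# `frame_offset` leading characters; incomplete trailing codons are dropped.
split_codons <- function(x, frame_offset = 0) {
  s <- substr(x, frame_offset + 1, nchar(x))
  n <- nchar(s) %/% 3                    # number of complete codons
  if (n == 0) return(character())
  starts <- seq(1, by = 3, length.out = n)
  substring(s, starts, starts + 2)
}

split_codons("atgaaacc", 0)  # "atg" "aaa"
split_codons("atgaaacc", 1)  # "tga" "aac"
```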

triple_translate

Optional argument indicating whether the translation of sequences should be tested in all three forward reading frames. The reading frame whose translation receives the best (most likely) amino acid PHMM score is returned. This decreases the rate of sequence framing errors, at the cost of increased processing time. Note that this argument overrides any passed frame_offset value (all offsets are tried). Default is FALSE.
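The "try all three frames and keep the best" idea can be sketched as follows. This is a hypothetical illustration, not coil's code: coil scores each translation against the amino acid PHMM, whereas here a toy score (fewer standard-code stop codons is better) stands in for that likelihood, and split_codons() is a hypothetical helper.

```r
# Hypothetical sketch; a toy score replaces the amino acid PHMM score.
split_codons <- function(x, frame_offset = 0) {
  s <- substr(x, frame_offset + 1, nchar(x))
  n <- nchar(s) %/% 3
  if (n == 0) return(character())
  starts <- seq(1, by = 3, length.out = n)
  substring(s, starts, starts + 2)
}

# Toy score: minus the number of standard-code stop codons
toy_score <- function(codons) {
  -sum(codons %in% c("taa", "tag", "tga"))
}

# Score offsets 0, 1 and 2 and return the best one
best_offset <- function(x, score = toy_score) {
  offsets <- 0:2
  scores <- vapply(offsets,
                   function(off) as.numeric(score(split_codons(x, off))),
                   numeric(1))
  offsets[which.max(scores)]  # ties go to the lowest offset
}

best_offset("taaatgccc")  # 1: offset 0 starts with the stop codon "taa"
```

Because every offset is scored, the work per sequence is roughly tripled, which matches the trade-off described above.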

nt_PHMM

The profile hidden Markov model against which the raw sequence should be compared in the framing step. Default is the full COI-5P nucleotide PHMM (nt_coi_PHMM).

aa_PHMM

The profile hidden Markov model against which the translated amino acid sequence should be compared in the indel_check step. Default is the full COI-5P amino acid PHMM (aa_coi_PHMM).

indel_threshold

The log likelihood threshold used to assess whether or not sequences are likely to contain an indel. Default is -358.88. Sequences scoring below this value are classified as likely to contain an indel; sequences scoring above it are classified as unlikely to contain an indel. For recommendations on selecting an indel_threshold value, consult: Nugent et al. 2019 (doi: https://doi.org/10.1101/2019.12.12.865014).
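The threshold rule amounts to a single comparison, sketched below with a hypothetical function classify_indel() (not coil's code). The treatment of a score exactly equal to the threshold is an assumption here (strict "less than"); the text above does not specify it.

```r
# Minimal sketch of the threshold rule; not coil's implementation.
# Assumption: a score equal to the threshold is NOT flagged.
classify_indel <- function(score, indel_threshold = -358.88) {
  score < indel_threshold
}

classify_indel(c(-400, -358.88, -300))  # TRUE FALSE FALSE

# Raising the threshold flags more sequences as likely indels
classify_indel(-350, indel_threshold = -345)  # TRUE
```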

Value

An object of class "coi5p"

See Also

coi5p

frame

translate

indel_check

which_trans_table

subsetPHMM

Examples

dat = coi5p_pipe(example_nt_string)
#full coi5p object can then be printed
dat
#components of output coi5p object can be called individually:
dat$raw    #raw input sequence
dat$name   #name that was passed
dat$framed #sequence in common reading frame
dat$aaSeq  #sequence translated to amino acids (censored)
dat$indel_likely #whether an insertion or deletion likely exists in the sequence
dat$stop_codons #whether or not there are stop codons in the amino acid sequence
dat = coi5p_pipe(example_nt_string, trans_table = 5)
dat$aaSeq #sequence translated to amino acids using designated translation table

[Package coil version 1.2.4 Index]