frame {debar} | R Documentation |
Take a DNAseq object and isolate the COI-5P region.
Description
Raw DNA sequence is taken and compared against the nucleotide profile hidden Markov model (PHMM), generating the statistical information later used to apply corrections to the sequence read. The speed of this function and how it operates are dependent on the paramaters chosen. When dir_check == TRUE the function will compare the forward and reverse compliment against the PHMM. Since the PHMM comparison is a computationally intensive process, this option should be set to FALSE if you're data are all forward reads.
Usage
frame(x, ...)
## S3 method for class 'DNAseq'
frame(x, ..., dir_check = TRUE, double_pass = TRUE,
min_match = 100, max_inserts = 400, terminate_rejects = TRUE)
Arguments
x |
a DNAseq class object. |
... |
additional arguments to be passed between methods. |
dir_check |
A boolean indicating if both the forward and reverse compliments of a sequence should be checked against the PHMM. Default is TRUE. |
double_pass |
A boolean indicating if a second pass through the Viterbi algorithm should be conducted for sequences that had leading nucleotides not matching the PHMM. This improves the accurate establishment of reading frame and will reduce false rejections by the amino acid check, but this comes at a cost of additional processing time. Default is TRUE. |
min_match |
The minimum number of sequential matches to the PHMM for a sequence to be denoised. Otherwise flag the sequence as a reject. |
max_inserts |
The maximum number of sequention insert states occuring in a sequence (including the flanking regions). If this number is exceeded than the entire read will be labelled for rejection. Default is 400. |
terminate_rejects |
Should a check be made to enusre minimum homology of the input sequence to the PHMM. Makes sure the match conditions are met prior to continuing with the framing of the sequence. If conditions not met then the function is stopped and the sequence labelled for rejection. |
Details
The min match and max inserts provide the minimal amount of matching to the PHMM requried for a sequence to not be flagged for rejection. If you are dealing with sequence fragments much shorter than the entire COI-5P region you should lower these values.
Value
a class object of code "DNAseq"
See Also
Examples
#previously called
hi_phred = paste0(rep("~~~", 217), collapse="")
dat1 = DNAseq(example_nt_string_errors, name = "err_seq1", phred = hi_phred)
dat1= frame(dat1)
dat1$frame_dat # a labelled list with framing information is produced
dat1$reject == FALSE # the match criteria were met, not labelled for rejection
dat1$data$path #one can call and view the path diagram produced by the PHMM comparison.