qfilter {insect} | R Documentation
Quality filtering for amplicon sequences.
Description
This function performs several quality checks on sequences read from FASTQ files, removing any sequences that do not meet the specified quality standards. The checks are an average quality score threshold, size selection, singleton removal (or an alternative minimum count) and filtering on ambiguous base calls.
Usage
qfilter(
x,
minqual = 30,
maxambigs = 0,
mincount = 2,
minlength = 50,
maxlength = 500
)
Arguments
x |
a vector of concatenated strings representing DNA sequences (in upper case) or a "DNAbin" list object with quality attributes. This argument will usually be produced by readFASTQ. |
minqual |
integer, the minimum average quality score for a sequence to pass the filter. Defaults to 30. |
maxambigs |
integer, the maximum number of ambiguities for a sequence to pass the filter. Defaults to 0. |
mincount |
integer, the minimum acceptable number of occurrences of a sequence for it to pass the filter. Defaults to 2 (removes singletons). |
minlength |
integer, the minimum acceptable sequence length. Defaults to 50. |
maxlength |
integer, the maximum acceptable sequence length. Defaults to 500. |
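The thresholds above can be pictured with a small base R sketch on toy data. This is not the package's implementation: `toy_filter`, the toy reads and their quality scores are all hypothetical, and the sketch assumes that any character other than A, C, G or T counts as an ambiguity and that occurrences are counted among the reads that pass the other checks.

```r
# Toy reads with per-base Phred scores (all values are made up)
seqs <- c("ACGTACGT", "ACGTACGT", "ACGTNCGT", "ACG", "ACGTACGA")
quals <- list(rep(35, 8), rep(32, 8), rep(36, 8), rep(38, 3), rep(20, 8))

# Hypothetical helper illustrating the checks; not part of insect
toy_filter <- function(seqs, quals, minqual = 30, maxambigs = 0,
                       mincount = 2, minlength = 50, maxlength = 500) {
  # average quality score per read
  meanq <- vapply(quals, mean, numeric(1))
  # assumption: anything other than A, C, G, T is an ambiguity
  nambig <- nchar(gsub("[ACGT]", "", seqs))
  len <- nchar(seqs)
  keep <- meanq >= minqual & nambig <= maxambigs &
    len >= minlength & len <= maxlength
  seqs <- seqs[keep]
  # assumption: occurrences are counted among the surviving reads,
  # and every copy of a sufficiently common sequence is retained
  counts <- table(seqs)
  seqs[as.vector(counts[seqs]) >= mincount]
}

res <- toy_filter(seqs, quals, minlength = 5, maxlength = 10)
print(res)
```

With these toy values, the third read fails on ambiguities, the fourth on length and the fifth on average quality, so only the two identical reads remain and pass the minimum count of 2.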
Value
an object of the same type as the primary input argument (i.e. a "DNAbin" object if x is a "DNAbin" object, or a vector of concatenated character strings otherwise).
Author(s)
Shaun Wilkinson
Examples
## download and extract example FASTQ file to temporary directory
td <- tempdir()
URL <- "https://www.dropbox.com/s/71ixehy8e51etdd/insect_tutorial1_files.zip?dl=1"
dest <- paste0(td, "/insect_tutorial1_files.zip")
download.file(URL, destfile = dest, mode = "wb")
unzip(dest, exdir = td)
x <- readFASTQ(paste0(td, "/COI_sample2.fastq"))
## trim primers from sequences
mlCOIintF <- "GGWACWGGWTGAACWGTWTAYCCYCC"
jgHCO2198 <- "TAIACYTCIGGRTGICCRAARAAYCA"
x <- trim(x, up = mlCOIintF, down = jgHCO2198)
## filter sequences to remove singletons, low quality & short/long reads
x <- qfilter(x, minlength = 250, maxlength = 350)