dada_implement {DBTC}    R Documentation

Dada Implement

Description

This function requires a main directory containing one or more folders, each representing a sequencing run, which in turn contain fastq files (the location of one of the fastq files in one of the sequencing run folders is used as an input argument). All sequencing run folders in the main directory must represent data from sequencing runs that used the same primers and protocols. Output from this function includes all processing files and the final main output files in the form of fasta files and amplicon sequence variant (ASV) tables.

Usage

dada_implement(
  runFolderLoc = NULL,
  primerFile = NULL,
  fwdIdent = "_R1_001",
  revIdent = "_R2_001",
  unidirectional = FALSE,
  bidirectional = TRUE,
  printQualityPdf = TRUE,
  maxPrimeMis = 2,
  fwdTrimLen = 0,
  revTrimLen = 0,
  maxEEVal = 2,
  truncQValue = 2,
  truncLenValueF = 0,
  truncLenValueR = 0,
  error = 0.1,
  nbases = 1e+80,
  maxMismatchValue = 0,
  minOverlapValue = 12,
  trimOverhang = FALSE,
  minFinalSeqLen = 100,
  verbose = TRUE
)

Arguments

runFolderLoc

Select a file in one of the run folders with the fastq files of interest (Default NULL).

primerFile

Select a file with the primers for this analysis (Default NULL).

fwdIdent

Forward identifier naming string (Default '_R1_001').

revIdent

Reverse identifier naming string (Default '_R2_001').

unidirectional

Selection to process files independently (Default FALSE).

bidirectional

Selection to process paired forward and reverse sequence for analysis (Default TRUE).

printQualityPdf

Selection to save image files showing quality metrics (Default TRUE).

maxPrimeMis

Maximum number of mismatches allowed when pattern matching to trim the primers from the ends of the reads with the ShortRead trimLRPatterns() function (Default 2).

fwdTrimLen

Select a forward trim length for the Dada filterAndTrim() function (Default 0).

revTrimLen

Select a reverse trim length for the Dada filterAndTrim() function (Default 0).

maxEEVal

Maximum number of expected errors allowed in a read for the Dada filterAndTrim() function (Default 2).

truncQValue

Quality value used to truncate the ends of reads for the Dada filterAndTrim() function. Each read is truncated at the first nucleotide with a quality value less than or equal to this value (Default 2).

truncLenValueF

Dada forward length trim value for the Dada filterAndTrim() function. This value is set to 0 when the pattern matching trim function is enabled (Default 0).

truncLenValueR

Dada reverse length trim value for the Dada filterAndTrim() function. This value is set to 0 when the pattern matching trim function is enabled (Default 0).

error

Percent of fastq files used to assess error rates for the Dada learnErrors() function (Default 0.1).

nbases

The total number of bases used to assess errors for the Dada learnErrors() function (Default 1e80). NOTE: this value is set very high so that all nucleotides in the subset of files selected by the error argument are used. If errors are to be assessed using a total number of reads rather than specific fastq files, set error to 1 and set this value to the desired number of reads.

maxMismatchValue

Maximum number of mismatches allowed when merging two reads for the Dada mergePairs() function (Default 0).

minOverlapValue

Minimum number of overlapping nucleotides for the forward and reverse reads for the Dada mergePairs() function (Default 12).

trimOverhang

Trim merged reads past the start of the complementary primer regions for the Dada mergePairs() function (Default FALSE).

minFinalSeqLen

The minimum final desired length of the read (Default 100).

verbose

If set to TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE).
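
As a rough illustration of what maxEEVal acts on, the sketch below computes the expected number of errors in a read from its quality string, using the standard definition EE = sum(10^(-Q/10)). It assumes Phred+33 quality encoding. The helper name expected_errors and the example quality string are hypothetical and are not part of DBTC or dada2.

```r
# Hypothetical helper (not part of DBTC or dada2): expected errors of one read,
# assuming Phred+33 encoded quality characters.
expected_errors <- function(qual_string) {
  q <- utf8ToInt(qual_string) - 33L   # convert ASCII characters to Phred scores
  sum(10^(-q / 10))                   # sum of per-base error probabilities
}

# Illustrative quality string: 'I' is Q40, '5' is Q20, '+' is Q10.
qual <- "IIIIIIII55++"
ee <- expected_errors(qual)

# A read passes a maxEE-style filter when its expected errors
# do not exceed the limit (here the default maxEEVal of 2).
passes <- ee <= 2
```

Because low-quality bases dominate the sum, a handful of Q10 positions contributes far more to the expected errors than many Q40 positions.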

Details

Two file types are required as input for the dada_implement() function. The first is the set of fastq files in the appropriate folder structure (see below) and the second is a file containing the primers used for the amplification of the sequence reads.

Fastq File Folder Structure

        Parent Directory
               |
               |
      -------------------
      |                 |
      |                 |
Run1 Directory    Run2 Directory
  -Fastq            -Fastq
  -Fastq            -Fastq
  ...               ...
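
A minimal sketch of this layout, built in a temporary directory so it can be run safely. The folder names Run1 and Run2, the sample names, the .fastq extension, and the empty placeholder files are assumptions for illustration only; the file names simply use the default fwdIdent and revIdent strings.

```r
# Build a throwaway parent directory with two run folders (illustrative names).
parent_dir <- file.path(tempdir(), "dada-example")
run_dirs <- file.path(parent_dir, c("Run1", "Run2"))
for (d in run_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Create empty placeholder fastq files named with the default identifiers.
samples <- c("SampleA", "SampleB")
for (d in run_dirs) {
  file.create(file.path(d, paste0(samples, "_R1_001.fastq")))
  file.create(file.path(d, paste0(samples, "_R2_001.fastq")))
}

# List forward and reverse files across run folders by their identifier strings.
fwd_files <- list.files(parent_dir, pattern = "_R1_001", recursive = TRUE)
rev_files <- list.files(parent_dir, pattern = "_R2_001", recursive = TRUE)

# Any one of these files could then be supplied as runFolderLoc.
example_input <- file.path(parent_dir, fwd_files[1])
```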

Format of the primer file

| Forward         | Reverse                     |
| AGTGTGTAGTGATTG | CGCATCGCTCAGACTGACTGC       |
| GAGCCCTCGATCGCT | GGTCGATAGCTACGCGCGCATACGACT |
|                 | GGTTCACATCGCATTCAT          |
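
One way such a file could be produced from R is sketched below. The tab-delimited layout, the file name primers.tsv, and the use of an empty string for the missing forward primer are assumptions; consult the DBTC tutorial for the exact format the function expects.

```r
# Primer sequences taken from the example table above; the unequal column
# lengths are padded with an empty string (an assumption about the file format).
primers <- data.frame(
  Forward = c("AGTGTGTAGTGATTG", "GAGCCCTCGATCGCT", ""),
  Reverse = c("CGCATCGCTCAGACTGACTGC", "GGTCGATAGCTACGCGCGCATACGACT",
              "GGTTCACATCGCATTCAT"),
  stringsAsFactors = FALSE
)

# Write to a temporary, tab-delimited file (hypothetical file name).
primer_path <- file.path(tempdir(), "primers.tsv")
write.table(primers, primer_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the two expected column headers are present.
check <- read.delim(primer_path, stringsAsFactors = FALSE)
```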

The examples are present to display the syntax for the function. These examples are not run because files are required to run the functions; in some cases multiple files are necessary, and some of these are quite large. For specific examples please see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md

Value

The output from this function includes four folders.

A_Qual - Contains quality pdf files for the input fastq files (if printQualityPdf set to TRUE).

B_Filt - Contains dada filtered fastq files and a folder with the end trimmed fastq files before quality filtering.

C_FiltQual - Contains quality pdf files for the filtered fastq files (if printQualityPdf set to TRUE).

D_Output - This folder contains output files including an analysis summary, an analysis summary table of processing values, forward and reverse error assessments, and finally the output ASV and fasta files of obtained sequences.
-TotalTable.tsv

Note

WARNING - NO WHITESPACE!

When running DBTC functions, the paths for the files selected cannot contain whitespace! File folder locations should be as short as possible (close to the root), as some functions do not process long naming conventions.

Also, special characters should be avoided (including question mark, number sign, exclamation mark). It is recommended that dashes be used for separations in naming conventions while retaining underscores for use as information delimiters (this is how DBTC functions use underscore).

There are several key character strings used in the DBTC pipeline; the presence of these strings in file or folder names will cause errors when running DBTC functions.

The following strings are those used in DBTC and should not be used in file or folder naming:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
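
These naming rules can be checked before a run. The sketch below is a hypothetical helper, not a DBTC function; the special characters tested are limited to the three named in this note, and the reserved strings are those listed above. The example paths are invented for illustration.

```r
# Hypothetical pre-flight check (not part of DBTC): returns a character vector
# describing any naming problems found in a path, or character(0) if none.
check_dbtc_path <- function(path) {
  reserved <- c("_BLAST", "_combinedDada", "_taxaAssign",
                "_taxaAssignCombined", "_taxaReduced", "_CombineTaxaReduced")
  problems <- character(0)
  if (grepl("[[:space:]]", path)) {
    problems <- c(problems, "contains whitespace")
  }
  if (grepl("[?#!]", path)) {
    problems <- c(problems, "contains a special character (?, # or !)")
  }
  # Literal (non-regex) search for each reserved string.
  hits <- reserved[vapply(reserved, grepl, logical(1), x = path, fixed = TRUE)]
  if (length(hits) > 0) {
    problems <- c(problems, paste("contains reserved string:", hits))
  }
  problems
}

# Invented example paths: one acceptable, one breaking three rules.
ok_path  <- "/data/run-01/SampleA_R1_001.fastq"
bad_path <- "/my data/run_BLAST/Sample#1_R1_001.fastq"
ok_result  <- check_dbtc_path(ok_path)
bad_result <- check_dbtc_path(bad_path)
```

Returning a vector of messages rather than stopping at the first problem lets all naming issues in a path be fixed in one pass.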

Author(s)

Robert G. Young

References

<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.

See Also

combine_dada_output() make_BLAST_DB() seq_BLAST() taxon_assign() combine_assign_output() reduce_taxa() combine_reduced_output()

Examples

## Not run: 
dada_implement()
dada_implement(
  runFolderLoc = NULL, primerFile = NULL,
  fwdIdent = "_R1_001", revIdent = "_R2_001",
  unidirectional = FALSE, bidirectional = TRUE, printQualityPdf = TRUE,
  maxPrimeMis = 2, fwdTrimLen = 0, revTrimLen = 0,
  maxEEVal = 2, truncQValue = 2,
  truncLenValueF = 0, truncLenValueR = 0,
  error = 0.1, nbases = 1e80,
  maxMismatchValue = 0, minOverlapValue = 12, trimOverhang = FALSE,
  minFinalSeqLen = 100
)

## End(Not run)


[Package DBTC version 0.1.0 Index]